
This next image shows a blown-up view of the controls that appear when the mouse hovers over the video. I've also hovered over the volume control here to reveal the volume slider.

Are you a keyboard user?
We have you covered. You can tab to the video element. The controls are not shown, but you can manipulate the video using some intuitive keystrokes, such as left and right arrows to go back and forward, space to toggle play and pause, and up and down arrows to control volume. Sighted keyboard users can enjoy uncluttered interaction with the video, while screen reader users can of course enjoy the same interaction regardless of visual clutter.
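To make the keystroke mapping concrete, here is a minimal sketch of how it could be modelled as a pure function over the playback state. The names `VideoState` and `applyKey`, and the step sizes, are assumptions made up for illustration. They are not taken from the actual implementation.

```typescript
// Hypothetical model of the direct keyboard interaction described above.
// All names and step sizes are illustrative assumptions.
type VideoState = {
  currentTime: number; // seconds
  duration: number;    // seconds
  volume: number;      // 0.0 to 1.0
  paused: boolean;
};

const SEEK_STEP = 15;    // seconds per arrow press (assumed value)
const VOLUME_STEP = 0.1; // volume change per arrow press (assumed value)

// Keep a value inside the closed range [lo, hi].
function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// Return the new state after a key press; unknown keys leave the state untouched.
function applyKey(state: VideoState, key: string): VideoState {
  switch (key) {
    case "ArrowLeft":
      return { ...state, currentTime: clamp(state.currentTime - SEEK_STEP, 0, state.duration) };
    case "ArrowRight":
      return { ...state, currentTime: clamp(state.currentTime + SEEK_STEP, 0, state.duration) };
    case "ArrowUp":
      return { ...state, volume: clamp(state.volume + VOLUME_STEP, 0, 1) };
    case "ArrowDown":
      return { ...state, volume: clamp(state.volume - VOLUME_STEP, 0, 1) };
    case " ":
      return { ...state, paused: !state.paused };
    default:
      return state;
  }
}
```

In a real page, a `keydown` listener on the focused video element would call something like `applyKey` with `event.key` and write the result back to the element.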
We still have some concerns:
1. Discoverability. Once a user has tabbed to a video, it is difficult to tell that the video has focus and there is nothing indicating that the video is keyboard controllable.
2. Feedback. The feedback after a user action is not as rich as the feedback when using the controls. For example, pressing the right arrow to advance the video doesn't tell you how far ahead you went, or where you are in the overall length of the video.
An Idea...
Keep the current functionality but add a secondary keyboard interaction model. Once a user has tabbed to the video element, the video is directly controlled via the existing keystrokes. If the user hits tab again, the controls appear, and the first control is focused. A regular keyboard interaction model ensues for the controls (tab navigation, and per-control keyboard manipulation). Tabbing past the last control leaves the video element entirely, moving to the next element in the document tab order.
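The proposed tab behaviour can be sketched as a small state machine. This is only an illustration of the idea: the `Focus` type and the `onTab` and `controlsVisible` functions are hypothetical names, and backward navigation with Shift+Tab is left out because the proposal does not describe it.

```typescript
// Hypothetical model of where keyboard focus sits under the proposed scheme.
type Focus =
  | { kind: "video" }                  // video focused, direct keystrokes apply
  | { kind: "control"; index: number } // one of the visible controls is focused
  | { kind: "outside" };               // focus has left the video entirely

// Advance focus on a Tab press, given how many controls the video exposes.
function onTab(focus: Focus, controlCount: number): Focus {
  switch (focus.kind) {
    case "video":
      // Second tab: reveal the controls and focus the first one.
      return controlCount > 0 ? { kind: "control", index: 0 } : { kind: "outside" };
    case "control":
      // Walk the controls in order; tabbing past the last one leaves the video.
      return focus.index + 1 < controlCount
        ? { kind: "control", index: focus.index + 1 }
        : { kind: "outside" };
    case "outside":
      return focus;
  }
}

// Under this model the controls are shown only while one of them has focus.
function controlsVisible(focus: Focus): boolean {
  return focus.kind === "control";
}
```

With three controls, a user who tabs to the video needs four more tab presses to move past it, which is the tab-order cost listed under the cons.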
Pros: Discoverability is solved. Feedback is solved.
Cons: It increases the number of items in the overall document tab order. Additional source code is required.
Thoughts?