Friday, November 5, 2010

Asynchronous Accessibility?

Our current Gecko accessibility engine (GAE) happily serves chrome and content information through synchronous desktop accessibility APIs such as MSAA/IA2. For example, a screen reader makes a synchronous IPC call over COM via MSAA/IA2 into our process, where we (GAE) grab the requested information and return it.

Firefox will almost certainly be moving web content rendering out of the chrome process. Communication between chrome and content processes will be done asynchronously because users like their browsers to be responsive. Desktop accessibility APIs, however, are implicitly synchronous. Like the chrome (browser UI) process, content processes contain a lot of juicy information (e.g. the DOM) that needs to be served through those APIs.

What to do?

Hypothesis 1:
Forward messages between the chrome and content processes.

H1 conclusion:
Fail. Google tried this and it is too slow.

Hypothesis 2:
Cache all the chrome and content information in one process.

H2 conclusion:

Google is now doing this with some success. If a screen reader is detected, they cache the accessible DOM tree(s) in the main browser process. Cache latency can lead to screen readers getting stale information sometimes, but it is expected that firing desktop update events will mitigate this.
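A minimal sketch of the H2 idea, under stated assumptions: the class name `AccessibleCache`, its methods, and the message shape are all hypothetical and do not reflect Chromium's or Gecko's actual code. Content processes push updates asynchronously, synchronous AT queries are answered from the cache without waiting on content, and each update also queues a desktop event so the AT knows its view may be stale.

```python
import threading


class AccessibleCache:
    """Hypothetical main-process cache of accessible nodes (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}    # node_id -> dict of cached properties
        self._events = []   # desktop update events waiting to be fired at the AT

    def on_content_update(self, node_id, props):
        """Handle an asynchronous update message from a content process."""
        with self._lock:
            self._nodes.setdefault(node_id, {}).update(props)
            # Queue an event so the AT can re-query and drop stale data.
            self._events.append(("update", node_id))

    def get_property(self, node_id, name):
        """Synchronous read for the AT; served from the cache only."""
        with self._lock:
            return self._nodes.get(node_id, {}).get(name)

    def drain_events(self):
        """Return and clear the queued desktop events."""
        with self._lock:
            events, self._events = self._events, []
            return events


if __name__ == "__main__":
    cache = AccessibleCache()
    cache.on_content_update(1, {"role": "button", "name": "Submit"})
    print(cache.get_property(1, "name"))
    cache.on_content_update(1, {"name": "Send"})
    print(cache.get_property(1, "name"))
    print(cache.drain_events())
```

Between the moment content changes and the moment `on_content_update` runs, a synchronous read still returns the old value; that window is the cache latency mentioned above.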

Hypothesis 3:

Create an asynchronous desktop API.

H3 conclusion:
Yeah, this might be a pipe dream.

I am currently thinking about something like H2, but that would allow assistive technology to transition to H3. In terms of pictures, here is my thinking:

Figure: accessibility process diagram. This minimalist figure shows boxes for the chrome and content processes, with asynchronous connections to a box that represents an accessibility process called "Async Accessibility" (needs a better name). This box butts up against another box that essentially represents our current Gecko accessibility engine, which in turn has a synchronous connection to assistive technology (over desktop APIs).

If we then begin providing desktop access to "Async Accessibility", my hope is that progressive AT could begin using this API when they are ready. As I write this I admit I have serious doubts this would actually happen, but I want to get this thinking out there.

Figure: AT talking async to a11y server. This figure shows AT using an asynchronous API to access "Async Accessibility" through some as-yet-undefined API and usage pattern.

In some ways H2 and H3 are similar, with H2 being more straightforward. I haven't attempted to estimate the engineering effort for either approach, but there is a good chance we'll end up with something closer to H2. It all depends on resourcing and time, on how quickly desktop Firefox moves to multi-process content, and on how long we might support a single-process mode.

I'd like to thank Josh Matthews (jdm) for sanity checking this before I posted it. Feedback is welcome. I'm hoping to iterate on this and to ultimately develop a solid implementation plan by mid December.

Thanks for reading.

Comments:

Kyle Huey said...

Idk much about a11y, but why can't the external software talk to both processes directly? I'd be interested to know how IE solved this problem.

David Bolter said...

@Kyle, they certainly could talk to each process directly. I have lots of diagrams I haven't posted ;)

Can you elaborate?

dmazzoni said...

@Kyle, having screen readers talk to both processes would be nice, but it'd require all new APIs, which could take years for all tools to adopt.

@David, the problem I see with H3 is that most Windows screen readers like to load the entire page contents into a virtual buffer, and doing this via an asynchronous API would be too slow.

We'd be happy to talk with you about how we're implementing this in Chromium. Here are our email addresses:

dmazzoni@chromium.org
ctguil@chromium.org
dtseng@chromium.org

David Bolter said...

@dmazzoni, it makes a lot of sense to align our implementations. Email coming soon!

Mike Gorse said...

Thanks for posting this, David.

As you probably remember, atk now has a plug/socket API that can be used to embed out-of-process accessibles, so, on the Linux side, if both processes expose information over atk and the container process properly embeds the root accessible of its child process, then an AT will talk to both processes if AT-SPI2 is used. I'm not really sure how things work on the Windows side, though, so of course you'll have to decide what makes sense to work well with all of your platforms.

Round-trip calls over dbus are slow, and the dbus developers recommend that they be kept to a minimum. AT-SPI2 tries to do this to some extent by caching data on the AT side, relying on the application to send signals when children are added, states change, etc. There can still be a lot of round-trip calls, however. I was talking with Joanie Diggs about this at one point when we were in Spain; she was pointing out that, if AT-SPI events contained the data that Orca needed, then Orca would not need to respond to events by making calls to query for information. I think that it would be good to have some kind of API by which an AT could describe the type of information that it wants from applications, and so the type of information that applications proactively send could become dynamic and dependent on what the AT needs. So we could have some kind of hybrid, with hopefully most of the information being pushed asynchronously and synchronous calls being available to fill the gaps but kept to a minimum.
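One way to picture that kind of interest registration is the sketch below. It is purely illustrative: `Application`, `register_interest`, and the event and field names are hypothetical and are not part of atk, AT-SPI2, or IA2. The AT declares which fields it wants with each event type, and the application attaches only those fields to the events it pushes, so the AT does not need a round trip to fetch them.

```python
class Application:
    """Hypothetical application side of an interest-registration API."""

    def __init__(self):
        self._interests = {}   # event type -> set of field names the AT asked for
        self._listeners = []   # callbacks standing in for the AT

    def register_interest(self, event_type, fields):
        """The AT describes which fields it wants pushed with an event type."""
        self._interests.setdefault(event_type, set()).update(fields)

    def add_listener(self, callback):
        self._listeners.append(callback)

    def emit(self, event_type, accessible):
        """Push an event carrying only the fields the AT registered for."""
        wanted = self._interests.get(event_type, set())
        payload = {name: accessible[name] for name in wanted if name in accessible}
        for callback in self._listeners:
            callback(event_type, payload)


if __name__ == "__main__":
    app = Application()
    app.register_interest("focus", ["name", "role"])
    app.add_listener(lambda kind, data: print(kind, sorted(data.items())))
    # "description" was not requested, so it is not sent with the event.
    app.emit("focus", {"name": "OK", "role": "button", "description": "Confirms"})
```

Anything the AT did not register for would still need a synchronous call, which is the gap-filling half of the hybrid described above.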

It would really be good to have a discussion about improving the a11y APIs; I think that the Linux Foundation a11y working group would be a good vehicle for this, particularly when we want to keep ATK/AT-SPI in sync with IA2.

Jim Chen said...

I worked on Fennec during my internship this past summer, and we ran into the exact problem with Input Method Editors (IME) - the system API was synchronous but we needed to access content text fields asynchronously.

Right now we are using the caching approach and fortunately the only thing we need to cache is the text content of the focused text field, but in the future we need to cache screen coordinates (and possibly other things).

I'd love to follow a11y's discussion on this. I think our problems have a lot in common. Thanks!

David Bolter said...

@Mike, I couldn't agree more with respect to some kind of registration for a11y information, so that it wasn't all or nothing.

@Jim, I will loop you in, thanks! I expect discussion to ramp up in a week or two. Did you encounter bugs or other pain with the cache approach?

Jim Chen said...

@David, sorry for the late reply.

We had to deal with the basic stuff such as cache efficiency and dealing with cache misses - since there's no way to block chrome and fetch from content, we fail when we miss the cache.

Another big set of problems had to do with synchronization. Because the chrome cache is updated regularly by messages from content, it is possible for content and chrome to be out of sync briefly, while the update message is on its way. The way IME is set up, it kind of assumes chrome is always up to date, and that resulted in bugs such as 599550. Our solution made sure that stale update messages are discarded on the chrome side (see discussion in bug 599550).
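The details of the fix live in the bug discussion and are not reproduced here. As a generic illustration only, one common way to discard stale updates is to tag each message with a monotonically increasing sequence number and ignore anything older than what the cache already holds. The class and method names below are hypothetical, and sequence numbers are an assumption, not necessarily what the Fennec code does.

```python
class FocusedFieldCache:
    """Hypothetical chrome-side cache of the focused text field's content."""

    def __init__(self):
        self._text = None
        self._last_seq = -1   # sequence number of the newest update applied

    def apply_update(self, seq, text):
        """Apply an update from content; return False if it was stale."""
        if seq <= self._last_seq:
            return False      # older than what we already have: discard
        self._last_seq = seq
        self._text = text
        return True

    def read_text(self):
        """Synchronous read; chrome cannot block on content, so a miss fails."""
        if self._text is None:
            raise LookupError("cache miss: no text cached for the focused field")
        return self._text


if __name__ == "__main__":
    cache = FocusedFieldCache()
    cache.apply_update(2, "hello world")
    # A delayed message with an older sequence number arrives late.
    print(cache.apply_update(1, "hello"))
    print(cache.read_text())
```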

Feel free to ping me (jchen) on #mobile. Thanks!

David Bolter said...

Just a quick note that discussion on this topic got delayed... probably until the new year.

James Teh said...

Hi David,

I missed this post. :)

I do agree in principle to some extent. In fact, async would also solve freezing problems in ATs. I've even been trying to think of ways to make parts of NVDA asynchronous to avoid said freezing. However, there are fundamental issues with an async a11y API. The issue with virtual buffers has already been mentioned. Let's even move beyond that and pretend that we've moved to an approach where the objects are accessed on demand, as done by VoiceOver and Orca. (It's worth noting that VoiceOver uses object navigation while Orca tries to provide a flat representation similar to that afforded by virtual buffers. We're considering the former for NVDA for various reasons beyond the scope of this discussion.)

The problem is that the interaction between a user and an AT is rather synchronous. The user requests information, the AT responds to those requests in order. Also, we often need to know about related objects, so providing all of the information about a single object isn't enough. This is further complicated by the fact that an AT is in some ways stateful, not stateless. For example, we track the current focus and review objects, as well as caching some information about those objects for a short period of time.

That said, there may be ways to make it synchronous to the user and AT core while still having the technical advantages of an async API. An AT could wait on an async request like it was synchronous, while still being able to abort that wait due to a timeout or more important information coming in such as a focus change. Add Python greenlets or stackless Python (which allow multiple execution stacks and the ability to switch between them) and I can come up with all sorts of scary/cool/wonderful ideas.
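A rough sketch of "wait on an async request like it was synchronous, but stay abortable", using only the Python standard library rather than greenlets. The names `wait_for` and `Aborted` are hypothetical and this is not NVDA code: the caller blocks on a `Future` but gives up on a timeout or as soon as an abort event (for example, a focus change) is set.

```python
import threading
import time
from concurrent.futures import Future


class Aborted(Exception):
    """Raised when something more important arrived while waiting."""


def wait_for(future, abort_event, timeout, poll=0.01):
    """Block until the future completes, the abort event fires, or time runs out."""
    deadline = time.monotonic() + timeout
    while not future.done():
        # Event.wait returns True as soon as the event is set.
        if abort_event.wait(poll):
            raise Aborted("wait abandoned, e.g. because focus changed")
        if time.monotonic() >= deadline:
            raise TimeoutError("no reply from the application in time")
    return future.result()


if __name__ == "__main__":
    focus_changed = threading.Event()
    reply = Future()
    # Simulate the application answering asynchronously after a short delay.
    threading.Timer(0.05, reply.set_result, args=("Submit button",)).start()
    print(wait_for(reply, focus_changed, timeout=1.0))
```

To the AT core the call looks synchronous, while the request itself can travel over an asynchronous channel.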

James Teh said...

Oh, and one thing to watch for with H2:
Chrome now caches the tree and updates it when it changes, as you said. However, instead of updating the existing accessible objects, it seems to create new ones with the same unique IDs. The result is that if you are holding a reference to an accessible object (which NVDA must for the focus, focus ancestors and current review object) and it gets updated, you can't use your reference anymore; that object is now dead/invalid. The practical upshot is that whenever a dynamic update occurs in Chrome, the focus and the whole buffer dies. Please don't make this mistake in Firefox. :)
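To illustrate the difference, here is a minimal sketch with hypothetical names (`Accessible`, `CachedTree`); it is not Chrome's or Firefox's implementation. When an update arrives for a known unique ID, the existing object is mutated in place instead of being replaced, so a reference the AT already holds (such as the focus) stays valid and sees the new data.

```python
class Accessible:
    """Hypothetical cached accessible object handed out to the AT."""

    def __init__(self, uid, props):
        self.uid = uid
        self.props = dict(props)
        self.alive = True


class CachedTree:
    """Hypothetical cache that keeps object identity stable across updates."""

    def __init__(self):
        self._by_id = {}

    def apply_update(self, uid, props):
        node = self._by_id.get(uid)
        if node is None:
            # First time this ID is seen: create the object.
            node = Accessible(uid, props)
            self._by_id[uid] = node
        else:
            # Known ID: update in place so existing references stay usable.
            node.props.update(props)
        return node

    def remove(self, uid):
        """Only a real removal from the document invalidates the object."""
        node = self._by_id.pop(uid, None)
        if node is not None:
            node.alive = False


if __name__ == "__main__":
    tree = CachedTree()
    focus = tree.apply_update(42, {"name": "Inbox (3)"})
    tree.apply_update(42, {"name": "Inbox (4)"})   # dynamic update
    print(focus.alive, focus.props["name"])
```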
