Accessibility Changes in GeckoView 64

Eitan Isaacson Wed, 28 Nov 2018 10:15:28 -0800

*I originally sent this to some team lists. Cross-posting this to
mobile-firefox-dev, so it is fully public.*

Hello! We have some major changes coming to our accessibility API
support in GV 64/65. Since this stuff straddles two modules, I figured it
would be good to write what may be the start of a design doc, or at the very
least a big-picture brain dump. Sorry if it’s a wall of text; maybe if I
recorded this as a podcast you would listen? I hope this is helpful to
someone.
tl;dr

64 introduces a major overhaul in our accessibility support. The goal is to
provide a WebView-comparable API and support multiple use cases. Because
Android’s accessibility API is synchronous on the UI thread, we put in some
interesting caching measures to make sure we never block.
------------------------------

In the last few months, I have been redoing a lot of our accessibility
support in Fennec and GeckoView. Most of this got fleshed out in Bugzilla.
Since only a few reviewers saw a piecemeal picture of these changes, I
wanted to lay out here what has changed, why, and what has improved. The
changes are split among two modules: GeckoView (mobile/) and accessibility
(accessible/), and so were the reviewers. My aim here is to give a better
picture of what’s up.
The time before

When Fennec was first released as a native application (after the rewrite
from the XUL app), Gingerbread was the largest portion of our install base.
Gingerbread had very little accessibility support to begin with, and had a
minimal API with no concept of node hierarchies and only simple
accessibility events, with AccessibilityEvent.TYPE_VIEW_FOCUSED being the
most used primitive. In Fennec, we decided to use the platform’s a11y API,
and tool it in a way that would optimize the screen reader user’s
experience. We did this mostly through the addition of a JS module that
would pass screen reader tailored events back to the platform. By “screen
reader tailored”, I mean that they would account for things that screen
reader users cared about, like announcing roles (eg. ‘read more, link’),
and context (eg. ‘first item in list with 10 items’). On a typical platform
it is the screen reader’s job to deduce the context of the user’s cursor
position and synthesize a helpful utterance; in Fennec we were essentially
spoon-feeding the screen reader with helpful utterances.

In addition to the tailored output, we implemented other features that are
traditionally left to the assistive tech. We brewed our own “accessibility
cursor” that followed different navigation rules and allowed the user to
explore the page with a keyboard (slider keyboards were common), and later
with gestures. We also rendered a rectangle around the current
accessibility cursor’s position.

The JS layer acted as an intermediary that encapsulated the Gecko
accessibility tree structure and hid it from the platform, while at the
same time providing useful output for screen reader interactions. This
allowed us to offer advanced features that were not available in
Gingerbread, and were in their infancy in Ice Cream Sandwich, things like
explore by touch, advanced gestures, and navigating pages by granularity.

When Mozilla pivoted to Firefox OS, we took our Android JS layer and
expanded it to be a standalone screen reader. Since we already assumed many
roles of a screen reader in Android (utterance generation, gesture
detection, tree navigation, etc.), we already had what we needed; we just
sent our utterances directly to speech synthesis instead of an Android
‘focus’ accessibility event.
The Now

A jury-rigged solution for TalkBack has served its purpose, and we are in
need of a generalized accessibility API that can handle any number of use
cases. Since 2011 some new realities have emerged:

   1. Android’s accessibility API has evolved and is fully featured,
   including accessible object hierarchy support, many more event types,
   native support for tables and collections, advanced text support, etc.
   2. TalkBack has matured and gained many more features and advanced
   gestures.
   3. Google’s accessibility suite has expanded and includes other
   accessibility services besides the TalkBack screen reader, like switch
   access, and tap to speak.
   4. Other OEMs package their own specialized screen reader experiences,
   like Amazon’s VoiceView or Samsung’s Voice Assistant.
   5. Developers who use GeckoView expect to have UIAutomator support. This
   is a UI test automation framework that leverages the a11y API.
   6. Password manager apps have grown to rely on accessibility APIs for
   form autofill (while Google addressed this specific use case with a new
   autofill API introduced in API 26, many older devices still use the a11y
   API).

Instead of the JS intermediary, we need to expose the accessibility
tree and its functionality to the platform and allow a diverse set of
accessibility services to interface with it, just like we do on desktop
platforms.

In the first set of patches in bug 1479037, I did just that. The changes in
bug 1479037 removed a lot of the JS bits we used for AccessibilityEvent
generation and replaced them with proper proxying of Gecko accessibility
events and exposure of our full Gecko accessibility tree. This gives
accessibility services like TalkBack, and form autofill apps, enough
information to be useful. This change was also a first naive pass, and it
introduced some major performance regressions.
Blockage

The Android API call for retrieving an accessible node object from a custom
view (createAccessibilityNodeInfo) is synchronous, and is called on the UI
thread. In Gecko accessibility, we rely heavily on the DOM, and many of the
calls require us to be on the main Gecko thread. So in order to fully
populate an Android AccessibilityNodeInfo object, we need to synchronously
call into the Gecko thread from the UI thread. That’s not good. What makes
it really bad is e10s. If e10s is in play, we need to perform a handful of
synchronous IPDL calls into a content process to get all the info we need
to populate a node. The UI thread remains blocked until a full accessible
node is returned.

With all of that horribleness, I still decided to start with a naive
implementation to see how bad it actually was from a performance perception
standpoint. It’s bad. Not only because of the issues above, but because
accessibility services (and often core Android components) assume
traversing accessible trees is cheap, so they do it very often. This makes
things like scrolling janky - an autofill service will poll the entire tree
for input fields on each scroll event.

Early on, when I looked into how Chrome was managing this, it became clear
that they had a full, thread-safe cache of all accessibles in all open
tabs in the top-level process. So this brings up the solution, and loaded
issue, of caching.
Caching

The design debates around e10s and accessibility are long-running. In the
run-up to e10s, we needed to make some choices about how we were going to
support it in a performant way on desktop. Assistive technologies
(accessibility services in Android, or clients in Windows) see each
application as a unified tree of accessible nodes. For us to represent a
unified tree, we would need to query the content processes synchronously
for content accessibility nodes. Chrome has solved this issue by caching
the entire accessible tree of every tab in the top-level process (some more
background: it took them several years to develop a robust caching scheme,
it is hard and still buggy, and in the meantime they had no accessibility
API support). On Windows, we decided to use COM redirection tricks. They
essentially allow assistive technologies to query information directly from
the content processes while still maintaining the illusion of a unified
tree.

So what were we to do? Do like Chrome and Cache All the Things? Yes and no.
I have experimented with two cache types that give us just enough
information to be useful, but are lightweight enough that they can be
frequently invalidated and not become stale. By default, GeckoView’s
accessibility cache is *masked*, meaning a node will exclude non-cached
child nodes from its child list, so an accessibility service would only
know about the cached ones, and will not query us for nodes outside the
cache. We currently serve up two *unified* caches, meaning we query both
caches and get the most recent attributes from each.
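
To make the masking and unified lookup concrete, here is a minimal sketch
in plain Java. Everything in it is an assumption made for illustration:
the class names, the integer node IDs, the `stamp` counter used to decide
which entry is more recent, and the plain maps standing in for the real
attribute storage. It is not GeckoView's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cached node; a plain map stands in for the real attributes. */
final class CachedNode {
    final int id;
    final long stamp;      // assumed recency counter, higher means newer
    final int[] childIds;  // every child Gecko reported, cached or not
    final Map<String, Object> attrs = new HashMap<>();

    CachedNode(int id, long stamp, int... childIds) {
        this.id = id;
        this.stamp = stamp;
        this.childIds = childIds;
    }

    CachedNode with(String key, Object value) {
        attrs.put(key, value);
        return this;
    }
}

/** Hypothetical pair of caches queried as one. */
final class MaskedCacheSketch {
    final Map<Integer, CachedNode> viewport = new HashMap<>();
    final Map<Integer, CachedNode> focusPath = new HashMap<>();

    boolean isCached(int id) {
        return viewport.containsKey(id) || focusPath.containsKey(id);
    }

    /** Masking: report only the children that are themselves cached. */
    List<Integer> maskedChildren(int id) {
        CachedNode node = viewport.containsKey(id) ? viewport.get(id) : focusPath.get(id);
        List<Integer> children = new ArrayList<>();
        if (node == null) {
            return children;
        }
        for (int childId : node.childIds) {
            if (isCached(childId)) {
                children.add(childId);
            }
        }
        return children;
    }

    /** Unified lookup: merge both entries, the newer one winning per key. */
    Map<String, Object> unifiedAttrs(int id) {
        CachedNode a = viewport.get(id);
        CachedNode b = focusPath.get(id);
        Map<String, Object> merged = new HashMap<>();
        if (a == null && b == null) {
            return merged;
        }
        if (a == null || b == null) {
            merged.putAll((a != null ? a : b).attrs);
            return merged;
        }
        CachedNode older = a.stamp <= b.stamp ? a : b;
        CachedNode newer = older == a ? b : a;
        merged.putAll(older.attrs);
        merged.putAll(newer.attrs); // newer values overwrite older ones
        return merged;
    }
}
```

In this sketch an accessibility service walking the tree from a cached
node can only ever reach other cached nodes, which is the point of
masking.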

With these two caches we hope to get the best of both worlds: immediate
access to accessible nodes with just enough information to be
useful, while keeping the cache current and uncorrupted. This will allow us
to avoid the pitfalls of maintaining a long-term cache that would need to
stay in sync via tree deltas and can potentially diverge. In short, our
cache might not always reflect the true accessibility state 100% of the
time, but at least it is short-lived and gets updated often enough that
users won’t perceive it.

Here are the two caches:
1. Viewport Cache

The viewport cache consists of all accessible nodes that are visible in the
GeckoView viewport. It is fully repopulated on every scroll and on each
tree mutation. Repopulation is throttled so it won’t happen more than twice
a second. The information on each node includes its screen bounds, its role
(AccessibilityNodeInfo.getClassName()), and its state
(AccessibilityNodeInfo.is*()). The information in this cache has proven to
be enough for form autofill services and switch access, neither of which
requires low latency cache updates or full text contents support.
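
The "no more than twice a second" rule can be sketched as a simple
throttle with a 500 ms minimum interval. The class, its method names, the
deferred `pending` flag, and the caller-supplied timestamps are all
assumptions made for this illustration; the real scheduling in GeckoView
may work differently.

```java
/** Hypothetical throttle for viewport cache rebuilds. */
final class ViewportRefreshThrottle {
    // "No more than twice a second" means at least 500 ms between rebuilds.
    static final long MIN_INTERVAL_MS = 500;

    // Assumes non-negative timestamps, so the first request always passes.
    private long lastRefreshMs = -MIN_INTERVAL_MS;
    private boolean pending;

    /** Call on every scroll or tree mutation. True means "rebuild now". */
    boolean onInvalidate(long nowMs) {
        if (nowMs - lastRefreshMs >= MIN_INTERVAL_MS) {
            lastRefreshMs = nowMs;
            pending = false;
            return true;
        }
        pending = true; // too soon: remember that the cache is stale
        return false;
    }

    /** Call from a timer. True means a deferred rebuild is now due. */
    boolean flushIfDue(long nowMs) {
        if (pending && nowMs - lastRefreshMs >= MIN_INTERVAL_MS) {
            lastRefreshMs = nowMs;
            pending = false;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ViewportRefreshThrottle throttle = new ViewportRefreshThrottle();
        long[] scrollTimesMs = {0, 100, 499, 1000};
        for (long t : scrollTimesMs) {
            System.out.println(t + " ms: rebuild=" + throttle.onInvalidate(t));
        }
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real
implementation would more likely read a monotonic clock itself.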
2. Focus Path Cache

The focus path cache gets replaced on each focus or accessibility focus
change. It consists of the focused node and its parentage. It has a higher
fidelity of information on each node compared to the viewport cache, and
contains text, localized role descriptions, as well as range/collection
info. This cache primarily serves screen readers like TalkBack that need a
lot of information about a single focused node, and are primarily
focus-driven.
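
As a rough illustration of "the focused node and its parentage", the
sketch below collects a focus path by walking parent links up to the root.
The integer IDs, the `parentOf` map, and the `NO_PARENT` sentinel are
hypothetical stand-ins; the real cache is built from Gecko's accessibility
tree, not from a map like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that lists the nodes a focus path cache would hold. */
final class FocusPathSketch {
    static final int NO_PARENT = -1;

    /** Returns the focused node followed by its ancestors, root last. */
    static List<Integer> focusPath(int focusedId, Map<Integer, Integer> parentOf) {
        List<Integer> path = new ArrayList<>();
        int current = focusedId;
        // Stop at the root, or if a malformed map would make us loop.
        while (current != NO_PARENT && !path.contains(current)) {
            path.add(current);
            current = parentOf.getOrDefault(current, NO_PARENT);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(4, 2); // link inside a list item
        parentOf.put(2, 1); // list item inside the document root
        System.out.println(focusPath(4, parentOf));
    }
}
```

Because the path is only as long as the tree is deep, replacing it on
every focus change stays cheap even when each entry carries text and
range/collection info.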
Cache updates

Generally, the caches are fully replaced each time a cache-relevant event
happens (eg. focus invalidates the focus cache, scroll invalidates the
viewport cache). But there are several events that may mutate a cached
node. Selected or checked events will change the selected or checked state
of a cached node. Focus or accessibility focused events will update the
stored ID of the focus/accessibility focused node.
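
The two kinds of update can be sketched as follows: a full replacement
for cache-relevant events, and an in-place patch for state changes. The
class, its method names, and the string attribute keys are assumptions
for illustration only, not the actual GeckoView implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache showing full replacement versus in-place mutation. */
final class CacheUpdateSketch {
    static final int NO_NODE = -1;

    // id -> attributes; nested maps stand in for the real node data.
    private final Map<Integer, Map<String, Object>> nodes = new HashMap<>();
    private int focusedId = NO_NODE;

    /** Cache-relevant event (scroll, focus): drop everything, store a fresh copy. */
    void replaceAll(Map<Integer, Map<String, Object>> snapshot) {
        nodes.clear();
        for (Map.Entry<Integer, Map<String, Object>> e : snapshot.entrySet()) {
            nodes.put(e.getKey(), new HashMap<>(e.getValue()));
        }
    }

    /** Selected or checked event: patch one cached node, ignore uncached ones. */
    boolean onStateChanged(int id, String state, boolean value) {
        Map<String, Object> node = nodes.get(id);
        if (node == null) {
            return false;
        }
        node.put(state, value);
        return true;
    }

    /** Focus event: only the stored ID changes here. */
    void onFocusChanged(int id) {
        focusedId = id;
    }

    int getFocusedId() {
        return focusedId;
    }

    Object stateOf(int id, String state) {
        Map<String, Object> node = nodes.get(id);
        return node == null ? null : node.get(state);
    }
}
```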
When not to cache

By default, the cache is on. Cache misses don’t fall back on the slow
blocking path in Gecko/e10s; instead, the root node is returned. So there is
no risk of ever getting into a blocking call. When a developer wishes to
test their app with UIAutomator, they may turn off caching and access the
canonical Gecko accessibility tree directly. That will give them full
access to all of the tree’s nodes and their resource IDs (aka DOM node IDs).
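
The lookup policy described above can be sketched like this. The class,
the `ROOT_ID` value, the string nodes, and the `slowGeckoQuery` callback
(a stand-in for the blocking Gecko/e10s path) are all hypothetical; the
sketch only shows which branch is taken, not how GeckoView implements it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical node lookup with a non-blocking miss policy. */
final class NodeProviderSketch {
    static final int ROOT_ID = 0; // assumed ID for the root node

    private final Map<Integer, String> cache = new HashMap<>();
    private final IntFunction<String> slowGeckoQuery; // stand-in for the blocking path
    private boolean cachingEnabled = true;            // on by default

    NodeProviderSketch(String rootNode, IntFunction<String> slowGeckoQuery) {
        this.slowGeckoQuery = slowGeckoQuery;
        cache.put(ROOT_ID, rootNode);
    }

    void setCachingEnabled(boolean enabled) {
        cachingEnabled = enabled;
    }

    void put(int id, String node) {
        cache.put(id, node);
    }

    String nodeFor(int id) {
        if (!cachingEnabled) {
            // Test/automation mode: query the canonical tree, blocking and all.
            return slowGeckoQuery.apply(id);
        }
        String cached = cache.get(id);
        // A miss never blocks: hand back the root node instead.
        return cached != null ? cached : cache.get(ROOT_ID);
    }
}
```

With caching on, `slowGeckoQuery` is never invoked, which is what keeps
the UI thread out of the blocking path.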
Comparison with Chrome’s implementation

The advantage that this scheme has over Chrome’s is that it is leaner. It
loads faster and it is theoretically lighter on memory (it’s not really
optimized for storage yet since it uses GeckoBundle string key and value
pairs). I’m proud to say that from a TalkBack performance perspective we
are doing just as well as, and often better than, Chrome. There are some
issues with responsiveness that are out of our control; for example, gesture
detection and speech synthesis have constant latencies.
Schedule

The naive full tree implementation, with all the horrible blocking, landed
in 64.

In the meantime caching was implemented and landed in Nightly. I wish the
timing was better on that. We hope to have it uplifted to Beta this week.
Further Work

Now that the main foundation is set, there are other items and features we
can work on to robustify and further this accessibility support. Here is a
partial unordered list:

   - A live regions cache for better screen reader experiences
   - Tuning our support for switch access
   - Tuning our support for braille
   - Adding table support
   - Adding heading level support
   - Further performance tweaks
   - Look into Amazon’s VoiceView and Samsung’s Voice Assistant
   - Look into tap to speak support
   - Look into d-pad support (for FireTV remotes)


Thanks for reading this through!!
