This is a high-level overview of the Perf team's efforts so far to certify e10s performance & stability -- these metrics will have to be as good as those of single-process Firefox before e10s can be released as the default configuration on the Release channel. This post is addressed to the general Firefox developer audience and skimps on details, so please feel free to ask for clarification on the mozilla.dev.platform thread or talk to us on #perf.
1. PERFORMANCE

1.1) Overview

We have been using Talos & Telemetry data to understand Firefox performance in its e10s & non-e10s configurations. Talos numbers have mostly improved with e10s, but some Telemetry responsiveness measures *seem* to suggest a non-negligible e10s performance regression.

Our approach is to run A/B experiments on the Aurora & Beta channels using the TelemetryExperiments infrastructure. For each profile in the experiment, the experiment code randomly configures Firefox to run in e10s or non-e10s mode. We have run these experiments on Aurora 43 and Beta 44.

Currently, we are mostly focused on studying general measures of responsiveness in these experiments:

* Frequency of main-thread events lasting longer than 127ms: this is tracked by the Background Hang Reporter code (aka BHR) and reported via Telemetry
* UI event processing lag: the EVENTLOOP_UI_ACTIVITY_EXP_MS histogram probe
* Frame painting delay: the FX_REFRESH_DRIVER_CHROME_FRAME_DELAY_MS, FX_REFRESH_DRIVER_CONTENT_FRAME_DELAY_MS, and REFRESH_DRIVER_TICK probes, among other histogram probes

1.2) Background Hang Reporter data

Of these general measures of responsiveness, the BHR measurement is the most useful, as it also captures pseudo-stacks from janky events, allowing us to attribute jank to various sources (extensions, plugins, various Firefox features, web page scripts, etc.).

Unfortunately, the BHR Telemetry data from both the Aurora 43 & Beta 44 experiments suggests that e10s is jankier than non-e10s. This holds true for profiles with & without extensions. However, we have identified bugs causing inaccuracies in BHR reporting, and we are working to improve BHR as well as other Telemetry performance measurements. We have even built an extension to visualize BHR's jank detection: https://github.com/chutten/statuser

In general, as we evaluate e10s performance using A/B experiments, we also validate and improve the performance probes in parallel.
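As a rough illustration of the thresholding idea behind BHR's hang counting (not the actual BHR implementation, which instruments the event loop in C++), here is a toy Python sketch. The 127 ms cutoff comes from the description above; the sample event durations, session lengths, and rate definition are made up for illustration:

```python
# Toy sketch of BHR-style hang counting. Only the 127 ms threshold is
# taken from the post; all data below is invented sample data.

HANG_THRESHOLD_MS = 127

def count_hangs(event_durations_ms):
    """Count main-thread events that exceed the hang threshold."""
    return sum(1 for d in event_durations_ms if d > HANG_THRESHOLD_MS)

def hang_rate(event_durations_ms, session_minutes):
    """Hangs per minute of session time -- a simple comparable rate."""
    return count_hangs(event_durations_ms) / session_minutes

# Hypothetical per-cohort event durations (ms) from an A/B experiment:
e10s_events = [12, 340, 95, 210, 64, 130, 45]
non_e10s_events = [20, 88, 150, 40, 33, 129, 70]

print(hang_rate(e10s_events, session_minutes=5))      # e10s cohort
print(hang_rate(non_e10s_events, session_minutes=5))  # non-e10s cohort
```

The real BHR additionally records a pseudo-stack for each hang, which is what lets the analyses attribute jank to extensions, plugins, or specific Firefox features rather than just counting events.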
1.3) Other measures of performance

We are also analyzing data from many other performance probes: startup & shutdown time, page-load time, scrolling smoothness, tab animation smoothness, memory usage, shutdown hangs, and many others. We will also use the BHR Telemetry data to generate a whitelist/blacklist/graylist of addons based on how often they jank e10s Firefox.

1.4) Findings

You can see all our analyses of experiment data here:

* Beta 44 experiment analyses: https://github.com/vitillo/e10s_analyses/tree/master/beta
* Aurora 43 experiment analyses: https://github.com/vitillo/e10s_analyses/tree/master/aurora

If you're interested in these analyses, I recommend starting with this analysis of e10s vs non-e10s performance on Beta 44 experiment profiles *without* any extensions installed: https://github.com/vitillo/e10s_analyses/blob/master/beta/addons/e10s_without_addons_experiment.ipynb

Start at the section "1. Generic stuff"; the code at the beginning of the analysis is just boilerplate.

2. STABILITY

Socorro does not allow us to easily compare e10s vs non-e10s crash rates from experiments. Luckily, Telemetry now reports Firefox crash events as well, and Telemetry data can be analyzed fairly easily using Spark. Our analyses of the Telemetry crash reports from the Beta 44 A/B experiment showed that e10s is significantly crashier than non-e10s. This was true for profiles both with & without extensions: https://github.com/poiru/e10s_analyses/blob/beta/beta/e10s_crash_rate_without_extensions.ipynb

There were known issues with a11y blacklisting in the e10s code, so we expect the stability measures to improve during the next A/B experiment.
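The per-branch crash-rate comparison can be sketched roughly as follows. This is not the actual notebook code; the record fields (`branch`, `crashes`, `usage_hours`) are hypothetical stand-ins for the real Telemetry ping schema, and the numbers are invented:

```python
# Illustrative sketch: crashes per 1000 usage hours, grouped by
# experiment branch. Field names and data are hypothetical, not the
# real Telemetry ping schema.
from collections import defaultdict

def crash_rates(pings):
    """Return crashes per 1000 usage hours, keyed by experiment branch."""
    crashes = defaultdict(int)
    hours = defaultdict(float)
    for p in pings:
        crashes[p["branch"]] += p["crashes"]
        hours[p["branch"]] += p["usage_hours"]
    return {b: 1000.0 * crashes[b] / hours[b] for b in crashes}

pings = [
    {"branch": "e10s", "crashes": 2, "usage_hours": 40.0},
    {"branch": "e10s", "crashes": 1, "usage_hours": 60.0},
    {"branch": "non-e10s", "crashes": 1, "usage_hours": 100.0},
]
print(crash_rates(pings))  # {'e10s': 30.0, 'non-e10s': 10.0}
```

In the real analyses the aggregation runs over the full experiment population with Spark, but the comparison reduces to the same normalized rate per branch.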
3. FUTURE WORK

* There will be another A/B experiment on Beta 45
* BHR and other responsiveness measures are still being improved
* We will invest more in other measures of performance after the BHR responsiveness deficit in e10s is better understood
* We will soon generate a preliminary extension whitelist/blacklist/graylist for e10s based on experiment data from Beta 45 or 46

You can follow our progress at this meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=e10s-measurement

You can see the full plan for evaluating e10s performance & stability here: https://docs.google.com/document/d/1TyE0BehzYhii3qfmcrfjXlRJL64CcJk0B4Voup4Q0Pg/edit#

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform