[blink-dev] Intent to Experiment: Load common payloads from privacy-preserving single-keyed cache

Daisuke Enomoto Tue, 26 Apr 2022 04:59:33 -0700

Contact emails

[email protected], [email protected], [email protected]

Explainer

https://docs.google.com/document/d/1pvaMg7J5beBXD7trzHJH_MDULc_wRHLx40MFYAmjknE/edit

Specification

N/A (because there are no web-exposed changes)

Summary

This limited experiment measures how much "pervasive payloads" contribute
to the performance impact of the split HTTP cache in each Chrome channel
over a three-week period. Pervasive payloads are those third-party payloads
included on at least 500 sites and fetched at least 10M times in a month,
based on Chrome's analysis (payload list included below). This experiment
further measures the impact on Core Web Vitals metrics of restoring
pervasive payloads (and only pervasive payloads) to a single-keyed cache
regime. The privacy benefits of the split HTTP cache are preserved.
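
As a rough illustration of the threshold described in the summary, the
pervasiveness test can be sketched as below. The `PayloadStats` type and its
field names are hypothetical and exist only for this example; only the two
numeric thresholds come from the text above.

```python
from dataclasses import dataclass

# Thresholds quoted in the summary above.
MIN_SITES = 500
MIN_MONTHLY_FETCHES = 10_000_000


@dataclass(frozen=True)
class PayloadStats:
    """Hypothetical per-URL aggregate; not a real Chrome data structure."""
    url: str
    site_count: int        # distinct sites that include the payload
    monthly_fetches: int   # fetches observed in one month


def is_pervasive(stats: PayloadStats) -> bool:
    """Return True only when both thresholds from the summary are met."""
    return (stats.site_count >= MIN_SITES
            and stats.monthly_fetches >= MIN_MONTHLY_FETCHES)
```

A payload must clear both thresholds; a resource that is fetched very often
but from only a handful of sites does not qualify.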

Blink component

Blink>Network
<https://bugs.chromium.org/p/chromium/issues/list?q=component:Blink%3ENetwork>

Motivation

Browsers split HTTP caches based on the top-frame visited origin
("double-keyed" or "triple-keyed" caching) to prevent sites from tracking
users via a timing attack on a cross-site client cache.

Chrome’s analysis estimates that split caching results in a 3% increase in
cache misses, i.e. fetches for which a payload exists in the cache of the
user's device, but is unavailable to the page because it was fetched by the
user while loading a page from a different origin. This results in
approximately 4% more total bytes being fetched over the network.

Our analysis further revealed that many of the redundant fetches caused by
split caching were for common payloads associated with displaying user
content (libraries, fonts, widgets, ads) or common payloads that assist in
operating online businesses (analytics). The delayed arrival of these
common payloads resulted in visible "jank" for users, impacting performance
metrics like LCP <https://web.dev/lcp>, FCP <https://web.dev/fcp> and CLS
<https://web.dev/cls/>. This jank has been associated with negative effects
on online businesses' engagement and conversion rates. Furthermore, delayed
loads of analytics and ads payloads can result in missed ads impressions
and dropped analytics hits.

Initial public proposal

This experiment sends a list to Chrome of 100 <URL, hash> pairs whose
payloads are considered pervasive (the "pervasive payloads list"). During
the three-week experiment period, if Chrome fetches a payload whose URL and
hash both match an entry on the pervasive payloads list, the payload is
inserted into a local single-keyed cache. This payload is then available for
use by
Chrome when loading pages on other sites that include the matching URL. All
other fetches for URLs not on the pervasive payloads list are cached
according to the existing split HTTP cache.
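
The routing described above could be sketched roughly as follows. This is a
simplified model, not Chrome's implementation: the `ExperimentCache` class,
its method names, and the use of plain dictionaries are assumptions made for
illustration, and the split cache is keyed on the top-frame site only.

```python
from typing import Dict, Optional, Set, Tuple


class ExperimentCache:
    """Toy model: one single-keyed cache plus one split (double-keyed) cache."""

    def __init__(self, pervasive: Set[Tuple[str, str]]) -> None:
        # The pervasive payloads list as a set of (url, hash) pairs.
        self.pervasive = pervasive
        # Single-keyed cache: keyed on the resource URL alone.
        self.single_keyed: Dict[str, bytes] = {}
        # Split cache: keyed on (top-frame site, resource URL).
        self.split: Dict[Tuple[str, str], bytes] = {}

    def store(self, top_frame_site: str, url: str,
              payload_hash: str, body: bytes) -> None:
        # Both the URL and the hash must match an entry on the list.
        if (url, payload_hash) in self.pervasive:
            self.single_keyed[url] = body
        else:
            self.split[(top_frame_site, url)] = body

    def lookup(self, top_frame_site: str, url: str) -> Optional[bytes]:
        # Pervasive payloads are visible to every site; all other
        # entries are visible only to the site that fetched them.
        if url in self.single_keyed:
            return self.single_keyed[url]
        return self.split.get((top_frame_site, url))
```

In this model, a listed payload fetched while visiting one site is a cache
hit on any other site that references the same URL, while an unlisted
payload (or a listed URL whose hash does not match) stays partitioned.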

The hash covers the payload body and most response headers, except for
those which change on every response.
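
One way such a hash could be computed is sketched below. The choice of
SHA-256, the canonical header encoding, and the contents of
`VOLATILE_HEADERS` are all assumptions made for this example; the text above
does not specify the algorithm or which headers are excluded.

```python
import hashlib
from typing import Mapping

# Hypothetical exclusion list: headers assumed to change on every response.
VOLATILE_HEADERS = frozenset({"date", "age", "expires"})


def payload_hash(body: bytes, headers: Mapping[str, str]) -> str:
    """Hash the body plus all headers that are not in VOLATILE_HEADERS."""
    digest = hashlib.sha256()
    # Sort by lower-cased name so header order and case do not matter.
    for name in sorted(headers, key=str.lower):
        lowered = name.lower()
        if lowered in VOLATILE_HEADERS:
            continue
        line = f"{lowered}: {headers[name].strip()}\n"
        digest.update(line.encode("utf-8"))
    # A blank line separates the headers from the body.
    digest.update(b"\n")
    digest.update(body)
    return digest.hexdigest()
```

With this scheme, two responses that differ only in an excluded header such
as `Date` produce the same hash, while any change to the body or to a
retained header produces a different one.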

To ensure we do not degrade the privacy profile of any users during this
experiment, only users with third-party cookies currently enabled will be
eligible for the experiment. We will compare the experience of users in
experiment and control arms according to total bytes loaded and page
performance metrics like the Core Web Vitals <https://web.dev/vitals>.

The pervasive payloads list was produced by crawling the web and
aggregating the most commonly referenced third-party resource URLs included
in HTML content. We then used pseudonymous URL-keyed metrics from Chrome to
estimate the traffic to sites and the number of impressions of third-party
resources. Individually identifiable browsing or search histories were not
used in the creation of the pervasive payload list (for more information
about Chrome's data collection policies and privacy policies, see
google.com/chrome/privacy). The resulting list was further filtered for any
URLs that might contain PII (e.g. URLs with extensive or opaque query
parameters). The list was also manually reviewed to ensure it included only
payloads reasonably expected to be pervasive; the manual review did not
result in any payloads being removed.

The privacy properties of the split HTTP cache are considered essential to
users and this proposal intends to maintain those properties, specifically
by maintaining split HTTP caching for all payloads not on the pervasive
payloads list.

API semantics are unchanged. User-facing functionality is unchanged (though
we expect performance to be slightly improved).

The list of the top 100 Pervasive URLs for use in this experiment is
pending internal reviews and will be shared on this thread upon approval.

Future directions

This experiment is the first step in a path exploring improved handling of
pervasive payloads in the browser cache. We outline the intended future
functionality here to clarify the intentions behind the current experiment.
The overview below is not complete or final and subsequent parts of the
design and implementation will be presented and discussed in further
Intents to Experiment and Prototype.

At a high level, a future improvement to the handling of pervasive payloads
may involve:

- Assembling a list of pervasive payloads that meets the following
  criteria:
  - Maintains the privacy of user browsing histories in its creation
  - Fairly represents pervasive payloads as they have been chosen by
    developers on the web, not payloads selected or favored by any
    particular library or browser vendor.
  - This experiment will initially use a static list of predefined
    URLs assembled as described in the 'Initial public proposal'
    section above
  - A future implementation will likely dynamically update the
    payloads list on, for example, a weekly cadence.
- Implementing shared caching for pervasive payloads that meets the
  following criteria:
  - Materially improves load times and responsiveness for web users
    (under study in this experiment)
  - Does not create a new tracking vector based on cache timing attacks
  - Does not require users to fetch payloads before the browser knows
    they will need them (i.e. we don't plan to bundle payloads with
    browser installs or updates)
  - Does not increase local storage required by browser installs or caches

To privately and fairly assemble the list of pervasive payloads, we are
exploring the use of Private Heavy Hitters
<https://www.tensorflow.org/federated/tutorials/private_heavy_hitters>. To
implement a privacy-preserving shared cache after the deprecation of
third-party cookies, we are exploring adding a measure of randomness to the
observed presence or absence of a pervasive payload in the shared cache.

However, this work is only worthwhile if it results in materially improved
load times for real users. This Intent to Experiment covers only whether or
not we should attempt to measure the performance gains that might be
realized if pervasive payloads were placed in a shared cache, as one data
point among others that will contribute to discussions about future steps
for the proposal.

TAG review

None yet.

TAG review status

N/A

Risks

Interoperability and Compatibility

Chrome's compliance with the relevant standards is unchanged. Caching
behavior differs between browsers, so interoperability will not be affected.

The list of popular payloads is specifically chosen to minimize
compatibility risks.

Gecko: No signal

WebKit: No signal

Web developers: No signals

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that
it has potentially high risk for Android WebView-based applications? No

Debuggability

There is no developer-exposed API for this feature, so most DevTools
support is not relevant. It would be useful to indicate in the network tab
whether a resource was served from the single-keyed cache; however, this
will not be implemented in the initial experiment.

Security and privacy

Single-keyed caching introduces global state shared between different
browsing contexts. A shared cache can introduce information leaks based on
cache probing (https://xsleaks.dev/docs/attacks/cache-probing/), including
XS-Search (https://xsleaks.dev/docs/attacks/xs-search/) in applications
which conditionally load a single-keyed-cache eligible resource based on
authenticated user state. The state of the cache, queried across different
contexts, could also be used as a fingerprint, permitting user tracking;
however, in this case, we believe this does not provide tracking
capabilities beyond those of third-party cookies.

To protect users during this experiment, we limit the experiment population
to those users with third-party cookies enabled. Recognizing that third-party
cookies will eventually be switched off for most users
<https://privacysandbox.com/>, we are developing protections such as
slightly randomizing cache hit/miss checks, disallowing eviction, or
guaranteeing that attempts to read from the cache reliably populate that cache
entry. These protections will be proposed and incorporated before any
future optimizations are launched.

For the purposes of the current experiment, we will be using the same
implementation of single-keyed caching that Chrome used before the HTTP
cache was partitioned in M77
(https://chromestatus.com/feature/5730772021411840).

To summarize, the security and privacy restrictions on this experiment are
as follows:

1. We will exclude users that have third-party cookies disabled.
2. Only a small percentage of users will be included in the experiment,
   reducing the likelihood and impact of any attacks abusing the
   single-keyed cache.
3. We will strictly limit the duration of the experiment on each channel to
   3 weeks.
4. We will only serve pervasive resources from the single-keyed cache.
5. We can turn off the experiment immediately (independent of browser
   updates) if any other threats appear.
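
Taken together, the client-side restrictions above amount to a gate like the
one sketched here. The `ClientState` type, its field names, and the function
name are hypothetical; the sketch only shows that every condition must hold
at once, and does not describe how Chrome actually evaluates them.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The experiment runs for three weeks on each channel.
EXPERIMENT_DURATION = timedelta(weeks=3)


@dataclass(frozen=True)
class ClientState:
    """Hypothetical snapshot of the settings relevant to the experiment."""
    third_party_cookies_enabled: bool
    in_experiment_arm: bool     # client sampled into the small test group
    kill_switch_active: bool    # experiment disabled remotely


def may_use_single_keyed_cache(client: ClientState,
                               channel_start: date,
                               today: date) -> bool:
    """Return True only if every restriction listed above is satisfied."""
    within_window = channel_start <= today < channel_start + EXPERIMENT_DURATION
    return (client.third_party_cookies_enabled
            and client.in_experiment_arm
            and not client.kill_switch_active
            and within_window)
```

The restriction that only pervasive resources are served from the
single-keyed cache applies per resource rather than per client, so it is not
part of this gate.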

Is this feature fully tested by web-platform-tests
<https://chromium.googlesource.com/chromium/src/+/master/docs/testing/web_platform_tests.md>?

This behavior is specific to Chrome and not part of any standard, so it
will not be tested in web platform tests.

Flag name

CacheTransparency

Requires code in //chrome?

No, but the list of popular payloads and the mechanism for distributing it
to the browser will be Chrome-specific.

Tracking bug

https://bugs.chromium.org/p/chromium/issues/detail?id=1309002

Launch bug

https://bugs.chromium.org/p/chromium/issues/detail?id=1309353

Estimated milestones

M103 for off-by-default experiment

Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5768521127559168

--
You received this message because you are subscribed to the Google Groups
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAA5e6990s-e4aYUnYK5%2BqzQpAyFzJa42y%2B%3D_MAnL19z%3DqemnWg%40mail.gmail.com.
