Re: Usage of Differential Privacy & RAPPOR

Georg Fritzsche via governance Mon, 11 Sep 2017 06:09:08 -0700

Thanks for the feedback. We are evaluating it and will follow up in the
coming weeks.

I'll summarize and expand on some key points that came up.

Some background.

As originally outlined, there are some important questions that we
currently cannot answer. Our existing data sources either cannot give us
the data we need or are not sufficiently representative of our user base.

We are working within several constraints:

We take a strong stance on our privacy principles
<https://www.mozilla.org/en-US/privacy/principles/> and on how we impact
our users
<https://blog.mozilla.org/futurereleases/2017/09/06/data-just-living/>.

We also need reliable and representative data to make decisions, within the
limits of these principles.

Based on these principles, we will not collect some kinds of data by
default and instead require additional consent. Because only users who
opt in are then represented, selection bias means we generally cannot get
representative populations for that data.

Filling the gap.

This is why we are exploring techniques to close this gap. What if we could
collect data in a way that enforced a strict level of privacy inside
Firefox, before anything was sent to a server?

If we could achieve this, we would get data collection that is both
anonymous and representative.

This would be subject to our privacy policy
<https://www.mozilla.org/en-US/privacy/>. If users turn off data sharing
through the preferences, we would not submit this data; Firefox will
always respect user choice.

How this works.

This is where Differential Privacy
<https://en.wikipedia.org/wiki/Differential_privacy> techniques come in.
They rely on practices such as hashing and noise injection so that no
conclusions can be drawn about individual users. The most common example
comes from social science studies, where a coin flip determines whether
participants answer a question truthfully or give a random answer.
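
To make the coin-flip example concrete, here is a minimal Python sketch of classic randomized response, assuming the common two-coin variant: heads on the first coin means answer truthfully, tails means answer according to a second coin. Any single answer is deniable, yet the overall rate can still be estimated from many answers. The function names and the simulated 30% rate are hypothetical.

```python
import random


def randomized_response(true_answer: bool, rng: random.Random) -> bool:
    """Return a noisy answer using the two-coin randomized response scheme."""
    # First coin: with probability 1/2, answer truthfully.
    if rng.random() < 0.5:
        return true_answer
    # Otherwise answer "yes" or "no" based on a second fair coin,
    # regardless of the true answer.
    return rng.random() < 0.5


def estimate_true_rate(reported_yes_fraction: float) -> float:
    """Unbias the aggregate: P(report yes) = 0.5 * p + 0.25, so p = 2 * (P - 0.25)."""
    return 2.0 * (reported_yes_fraction - 0.25)


if __name__ == "__main__":
    rng = random.Random(42)
    true_rate = 0.3  # hypothetical share of people whose true answer is "yes"
    n = 100_000
    truths = [rng.random() < true_rate for _ in range(n)]
    reports = [randomized_response(t, rng) for t in truths]
    print(estimate_true_rate(sum(reports) / n))  # prints a value close to 0.3
```

In this variant a true "yes" is reported as "yes" with probability 3/4 and a true "no" with probability 1/4, so a single answer shifts an observer's odds by at most a factor of 3 (ε = ln 3 in differential privacy terms).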

RAPPOR <https://arxiv.org/pdf/1407.6981.pdf> is one specific technique that
can be applied to strings and provides formal privacy guarantees whose
strength depends on the choice of parameters.
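
For reference, below is the standard local form of the guarantee and, as we read the RAPPOR paper linked above, its longitudinal bound for the permanent randomized response step. Here A is the client-side randomizer, x and x' are any two possible true values, y is any report, h is the number of Bloom filter hash functions and f is the noise probability. The values h = 2 and f = 0.5 are only an illustrative choice, not our planned parameters.

```latex
\Pr[A(x) = y] \le e^{\varepsilon} \, \Pr[A(x') = y] \quad \text{for all } x, x', y

\varepsilon_\infty = 2h \ln\!\left( \frac{1 - \tfrac{1}{2} f}{\tfrac{1}{2} f} \right),
\qquad h = 2,\; f = 0.5 \;\Rightarrow\; \varepsilon_\infty = 4 \ln 3 \approx 4.39
```

Smaller ε means a stronger guarantee: raising f adds more noise and lowers ε, at the cost of needing more reports to recover aggregate answers.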

What we plan to do.

The current plan is to run a SHIELD study to confirm that we can get
answers for this kind of data from the Firefox population. Using RAPPOR, we
would collect aggregate data on the most common domains that users set as
their homepage (e.g. foo.com), or the value "about:home".
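
Reducing a homepage URL to its domain (the eTLD+1 granularity from the original proposal quoted below) is a simple string operation once a public suffix list is available. A minimal Python sketch, assuming a tiny hypothetical suffix list and an illustrative function name; it is not Firefox's implementation.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, tiny subset of the Public Suffix List, for illustration only.
# A real implementation needs the full list, including wildcard and
# exception rules.
PUBLIC_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "jp", "co.jp"}


def etld_plus_one(url: str) -> Optional[str]:
    """Reduce a full URL (with scheme) to its eTLD+1, or None if not possible."""
    host = urlsplit(url).hostname  # lowercased by urlsplit
    if host is None:
        return None  # e.g. "about:home" has no hostname
    labels = host.rstrip(".").split(".")
    # Scan from the full host downwards so the longest matching suffix wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # suffix not in this toy list


print(etld_plus_one("https://www.google.co.uk/search"))  # prints google.co.uk
```

Values with no host, such as about:home, come back as None so a caller can handle them separately, for example by reporting the literal string as described above.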

Through the use of RAPPOR, only obfuscated data will leave Firefox. By
sending out noisy data, we protect the privacy of individual users.
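
A minimal Python sketch of the client-side steps described in the RAPPOR paper: hash the value into a Bloom filter, apply a permanent randomized response, then an instantaneous randomized response before anything is sent. The parameter values, SHA-256 as the hash, the cohort handling and all function names are assumptions for illustration; this is neither our implementation nor the API of Google's RAPPOR library.

```python
import hashlib
import random

# Hypothetical parameters, chosen only for illustration.
NUM_BLOOM_BITS = 128  # k: size of the Bloom filter
NUM_HASHES = 2        # h: hash functions per value
PROB_F = 0.5          # f: permanent randomized response noise level
PROB_P = 0.5          # p: P(report 1 | memoized bit is 0)
PROB_Q = 0.75         # q: P(report 1 | memoized bit is 1)


def bloom_positions(value: str, cohort: int) -> list[int]:
    """Hash a string into h Bloom filter positions for the given cohort."""
    positions = []
    for i in range(NUM_HASHES):
        data = f"{cohort}:{i}:{value}".encode("utf-8")
        digest = hashlib.sha256(data).digest()
        positions.append(int.from_bytes(digest[:4], "big") % NUM_BLOOM_BITS)
    return positions


def permanent_rr(bloom: list[int], rng: random.Random) -> list[int]:
    """Each bit becomes 1 w.p. f/2, 0 w.p. f/2, and stays unchanged w.p. 1 - f."""
    noisy = []
    for bit in bloom:
        u = rng.random()
        if u < PROB_F / 2:
            noisy.append(1)
        elif u < PROB_F:
            noisy.append(0)
        else:
            noisy.append(bit)
    return noisy


def instantaneous_rr(memoized: list[int], rng: random.Random) -> list[int]:
    """Report 1 with probability q if the memoized bit is 1, else with probability p."""
    return [int(rng.random() < (PROB_Q if bit else PROB_P)) for bit in memoized]


def encode_report(value: str, cohort: int, rng: random.Random) -> list[int]:
    """Build the noisy bit vector that would leave the client."""
    bloom = [0] * NUM_BLOOM_BITS
    for pos in bloom_positions(value, cohort):
        bloom[pos] = 1
    # A real client would compute the permanent response once per value and
    # store it, so repeated reports cannot be averaged to remove the noise.
    # This sketch recomputes it on every call for brevity.
    memoized = permanent_rr(bloom, rng)
    return instantaneous_rr(memoized, rng)


report = encode_report("foo.com", cohort=3, rng=random.Random())
print(len(report))  # prints 128
```

Only the final bit vector would be submitted; the raw domain and the unperturbed Bloom filter never leave the client in this scheme.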

To get answers out of the noisy data we receive, we can then test the
aggregated data from all users against a list of candidate domains from
other sources. For example, we can use the Alexa Top 500 sites to estimate
whether any of them are present in the aggregated data.
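
A simplified Python sketch of the server side, assuming clients used the same hypothetical hashing and parameters as a client-side encoder would. It first unbiases the summed per-bit counts of one cohort (this inversion follows from the noise probabilities), then scores each candidate domain with a deliberately simple heuristic. Note that this is not the regression-based decoding described in the RAPPOR paper; all names and values here are illustrative.

```python
import hashlib

# Hypothetical parameters; they must match whatever the clients used.
NUM_BLOOM_BITS = 128  # k
NUM_HASHES = 2        # h
PROB_F, PROB_P, PROB_Q = 0.5, 0.5, 0.75  # f, p, q


def bloom_positions(value: str, cohort: int) -> list[int]:
    """Same hashing as the client: h Bloom filter positions per value."""
    positions = []
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{cohort}:{i}:{value}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:4], "big") % NUM_BLOOM_BITS)
    return positions


def estimate_true_bit_counts(bit_counts: list[float], num_reports: int) -> list[float]:
    """Unbias per-bit counts summed over num_reports noisy reports of one cohort.

    E[count] = N * (p + f/2 * (q - p)) + (1 - f) * (q - p) * t, solved for t.
    """
    noise_rate = PROB_P + 0.5 * PROB_F * (PROB_Q - PROB_P)
    scale = (1 - PROB_F) * (PROB_Q - PROB_P)
    return [(c - noise_rate * num_reports) / scale for c in bit_counts]


def score_candidates(
    bit_counts: list[float], num_reports: int, cohort: int, candidates: list[str]
) -> dict[str, float]:
    """Simplified heuristic: a candidate's score is the smallest estimated
    count among its Bloom filter bits. Other values can share bits, so the
    score can overestimate; statistical testing is left out of this sketch."""
    estimates = estimate_true_bit_counts(bit_counts, num_reports)
    return {
        name: min(estimates[pos] for pos in bloom_positions(name, cohort))
        for name in candidates
    }
```

One simple way to use this is to score the candidate list (for example the Alexa Top 500 sites) in every cohort, sum the scores across cohorts, and treat candidates whose totals are not clearly above the noise level as absent.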

What's next.

We are currently evaluating the feedback and will decide on next steps.

We will continue to work in the open and maintain a dialog around our data
collection practices. Stay tuned for further communications regarding our
research into differential privacy, best practices, scientific
collaborations and more technical details.

Georg

On Mon, Aug 21, 2017 at 5:56 PM, Georg Fritzsche <gfritzs...@mozilla.com>
wrote:

> Hi,
>
> For Firefox, we want to better understand how people use our product so we
> can improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy-preserving
> way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like the top sites users visit and how features perform
> on specific sites.
>
> Currently, we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data without explicit consent (opt-out).
>
> Asks for sensitive data most commonly center on knowing something about
> which sites a user visits:
>
>    - "Which top sites are users visiting?"
>    - "Which sites using Flash does a user encounter?"
>    - "Which sites does a user see heavy Jank on?"
>
> In summary, most asks are for counts of an event X per domain (more
> specifically, per eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us
> to collect sensitive data without being able to draw conclusions about
> individual users, thus preserving their privacy.
>
> An attacker who has access to the data a single user submits cannot
> reliably tell whether a specific site was visited by that user or not.
>
> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kinds of use cases,
> and initial simulation results are promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value of users’ home
> page (eTLD+1) for a randomly selected group of our release population. We
> are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
>
> Georg
>
> References:
>
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
>
> 2: https://en.wikipedia.org/wiki/Differential_privacy
>
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
>
> 4: https://github.com/google/rappor
>
> 5: https://arxiv.org/abs/1407.6981
>
> 6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies
>
_______________________________________________
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance
