On Sun, Aug 27, 2017 at 2:47 PM, David Bruant <bruan...@gmail.com> wrote:
>> Asks for sensitive data center most commonly around knowing something in
>> relation to which sites a user visits:
>>
>> - "Which top sites are users visiting?"
>> - "Which sites using Flash does a user encounter?"
>> - "Which sites does a user see heavy jank on?"
>>
>> In summary, most asks are for occurrences of an event X per domain (more
>> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>>
>> The solution.
>>
>> One solution is the use of differential privacy [2] [3], which allows us
>> to collect sensitive data without being able to make conclusions about
>> individual users, thus preserving their privacy.
>>
>> An attacker that has access to the data a single user submits is not able
>> to tell whether a specific site was visited by that user or not.
>
> Just to be 100% sure I understand: what will happen is that Firefox will
> lie (or answer randomly) to the question with probability p. This way,
> even if an attacker reaches the Moz servers, they can trust the answer
> only with probability 1-p.
> There is a trade-off between utility (low p) and stronger privacy (high p).
> Could this trade-off be documented and a hard lower limit be decided?
> Should each study decide on a different p based on data sensitivity?

Yes, once the value is encoded we will lie or answer randomly about the
status of each bit with a certain probability. This probability depends on
the prior state of each bit in the Bloom filter that holds potential
responses (whether it was a 1 or a 0) and on the parameters of the RAPPOR
algorithm. As an end result, an attacker can effectively trust each
reported bit only with probability 1-p.

As your intuition correctly suggests, there is a balance between utility
and privacy. Our goal is to choose parameters such that the privacy of
users is assured, while also getting statistical insights from the
aggregate data. The privacy guarantee is expressed in terms of the ε
parameter.
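To make the "lying" step concrete, here is a minimal sketch of a
RAPPOR-style randomized response applied to a single Bloom filter bit,
together with the ε level of that step as given in the published RAPPOR
description (for h hash functions). The function names and the noise
parameter f are my own illustration, not the actual Firefox implementation:

```python
import math
import random

def permanent_randomized_response(bit, f):
    """RAPPOR's 'lying' step for one Bloom filter bit: with probability
    f/2 report a 1, with probability f/2 report a 0, and otherwise
    report the true bit."""
    r = random.random()
    if r < f / 2:
        return 1
    elif r < f:
        return 0
    return bit

def epsilon_permanent(f, h=1):
    """Differential-privacy level of the permanent randomized response,
    per the RAPPOR paper: eps = 2h * ln((1 - f/2) / (f/2))."""
    return 2 * h * math.log((1 - f / 2) / (f / 2))

# More lying (larger f) yields a smaller epsilon, i.e. stronger privacy.
print(epsilon_permanent(0.5))   # f = 1/2
print(epsilon_permanent(0.25))  # f = 1/4, weaker privacy than f = 1/2
```

Aggregating many such noisy bits across a large population still lets the
server estimate how often each domain occurs, while no single report
reveals whether that user visited a given site.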
For RAPPOR this takes into account the addition of noise on the client
side via the "lying" mechanism described above. Depending on the data
sensitivity, the population size and the collection frequency (one-time or
repeated), the ε level should be fixed and the appropriate set of
parameters needs to be tuned. Under some circumstances this may mean that
useful data cannot be collected, in which case user privacy is still
preserved.

This parameter choice should be transparently documented, and we need to
establish hard limits as well as best practices around choosing the
parameters. We are working on a blog post that will share the best
practices and approaches that we found.

Georg

_______________________________________________
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance