Re: Usage of Differential Privacy & RAPPOR

Kurt Roeckx via governance Tue, 29 Aug 2017 08:14:35 -0700

On 2017-08-29 15:50, Georg Fritzsche wrote:

On Thu, Aug 24, 2017 at 10:23 AM, Kurt Roeckx via governance <
governance@lists.mozilla.org> wrote:

On 2017-08-23 16:33, Alex Gaynor wrote:

I had the same question, but it looks like RAPPOR has gotten significantly
more advanced since I originally learned about the "just boolean questions"
version. https://arxiv.org/pdf/1503.01214.pdf explains how to build privacy
preserving measurements without knowing the values of the population.


So if I understand things correctly from the paper, you create a Bloom
filter for the URL/hostname you want to send, then randomly change it and
store that. And each time they ask about the URL/hostname, you take the
stored version, randomly change it again, and that's what you send.
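
The client-side steps described above (Bloom filter, a stored "permanent"
randomization, then a fresh randomization per report) can be sketched
roughly as follows. This is only an illustration of my reading of the
paper, not anything Mozilla has said it will ship: the function names, the
SHA-256 based hashing, the 16-bit filter and the values f=0.5, p=0.5,
q=0.75 are all assumptions made for the example, and cohorts are left out.

```python
import hashlib
import random


def bloom_bits(value, num_bits=16, num_hashes=2):
    """Set up to num_hashes positions in a num_bits-wide Bloom filter."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        # Hypothetical hash construction: salt the value with the hash index.
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


def permanent_rr(bits, f, rng):
    """PRR: each bit becomes 1 w.p. f/2, 0 w.p. f/2, else keeps its value."""
    out = []
    for b in bits:
        r = rng.random()
        if r < f / 2:
            out.append(1)
        elif r < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_rr(prr_bits, p, q, rng):
    """IRR: report 1 with probability q if the stored bit is 1, else p."""
    return [1 if rng.random() < (q if b else p) else 0 for b in prr_bits]


rng = random.Random(0)  # seeded only to make the example repeatable
true_bits = bloom_bits("example.org")
stored = permanent_rr(true_bits, f=0.5, rng=rng)   # computed once, then kept
report = instantaneous_rr(stored, p=0.5, q=0.75, rng=rng)  # redone per report
```

The point of the two stages is that `stored` is computed once per value
and reused, while `report` is regenerated every time, so repeated reports
cannot simply be averaged to recover the original filter.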

What I understand from that is that you don't get to learn the
URL/hostname at all, but can query if a URL/hostname has been submitted.
You don't get to learn what the population is, but the whole population can
be sent.

Is that accurate?


Hi,

through RAPPOR, we can send randomized values for all encountered domain
values.

Then, in analysis, we can test the noisy aggregate data against known
domain values and get an estimate of how frequently they occurred.

This gives immediate insights and we can increase the detail by adding more
sources for known domain values.
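
To make the analysis step above a bit more concrete, here is a rough
sketch of testing aggregate reports against a known candidate. It first
removes the expected noise from the per-bit counts, using the de-biasing
formula from the original RAPPOR paper, and then takes the smallest
corrected count over the candidate's Bloom filter bits. That last step is a
deliberate simplification: the paper fits a regression over all candidates
instead. The function names, the hashing and the absence of cohorts are
assumptions made for the example.

```python
import hashlib


def bloom_positions(value, num_bits, num_hashes=2):
    """Bit positions a value maps to (hypothetical SHA-256 based hashing)."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}:{value}".encode()).digest()[:4], "big")
        % num_bits
        for i in range(num_hashes)
    }


def debias_counts(bit_counts, num_reports, f, p, q):
    """Estimate, per bit, how many reports truly had that bit set."""
    # Expected number of 1s in a position that no client truly set.
    baseline = (p + 0.5 * f * q - 0.5 * f * p) * num_reports
    # Expected extra 1s contributed by each client that truly set the bit.
    scale = (1 - f) * (q - p)
    return [(c - baseline) / scale for c in bit_counts]


def estimate_candidate(value, bit_counts, num_reports, f, p, q, num_hashes=2):
    """Crude frequency estimate for one known candidate value."""
    estimates = debias_counts(bit_counts, num_reports, f, p, q)
    positions = bloom_positions(value, len(bit_counts), num_hashes)
    # Simplification: the least-populated bit bounds the candidate's count.
    return max(0.0, min(estimates[i] for i in positions))
```

Here `bit_counts[i]` is the number of received reports with bit i set.
Adding more known candidate values just means calling
`estimate_candidate` for more strings; values that are not in the
candidate list stay invisible to this kind of analysis.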

The paper has several algorithms in it. The first is described in
"II. BACKGROUND", which does not allow you to learn the dictionary, but you
can check whether certain URLs are in it or not.

Then in "III. ESTIMATING JOINT DISTRIBUTIONS" they describe how you can
correlate different answers with each other.

Then in "IV. RAPPOR WITHOUT A KNOWN DICTIONARY" they describe that you
can send some additional data, and then use the algorithm from III to
learn something about the dictionary.


Do you intend to use the algorithm from II or from IV?

From what I understand, for the algorithm of II there are various
parameters that affect the noise, and how likely it is someone can learn
something about the data you're sending. I think they at least include:

- The size of the bloom filter
- The number of hashes you use
- The probability of the randomization for the PRR (f in the paper)
- The probabilities of the randomization for the IRR (q and p in the paper)
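
As one example of the effect these parameters have: as far as I can tell
from the original RAPPOR paper, the long-term privacy bound (the epsilon
that holds no matter how many reports are collected) depends only on f and
the number of hashes h, as 2h * ln((1 - f/2) / (f/2)). The sketch below
just evaluates that formula; the function name and the sample values are
mine, not anything proposed in this thread.

```python
import math


def epsilon_infinity(f, num_hashes):
    """Long-term differential privacy bound provided by the PRR step."""
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    return 2 * num_hashes * math.log((1 - f / 2) / (f / 2))


# Illustrative values only: f = 0.5 with 2 hashes gives 4 * ln(3), about 4.39.
print(round(epsilon_infinity(0.5, 2), 2))
```

A larger f or fewer hashes lowers epsilon (more privacy, more noise). The
filter size and the IRR probabilities p and q do not appear in this bound;
they affect accuracy and what a single report reveals.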

Do you have any idea which values you plan to use, and what the effect of
those choices is?


Kurt
_______________________________________________
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance
