On Tue, Aug 29, 2017 at 5:13 PM, Kurt Roeckx via governance <
governance@lists.mozilla.org> wrote:

> On 2017-08-29 15:50, Georg Fritzsche wrote:
>
>> On Thu, Aug 24, 2017 at 10:23 AM, Kurt Roeckx via governance <
>> governance@lists.mozilla.org> wrote:
>>
>> On 2017-08-23 16:33, Alex Gaynor wrote:
>>>
>>>> I had the same question, but it looks like RAPPOR has gotten
>>>> significantly more advanced since I originally learned about the
>>>> "just boolean questions" version. https://arxiv.org/pdf/1503.01214.pdf
>>>> explains how to build privacy preserving measurements without knowing
>>>> the values of the population.
>>>>
>>>>
>>> So if I understand things correctly from the paper, you create a bloom
>>> filter for the URL/hostname you want to send, then randomly change it,
>>> store that. And each time they ask about the URL/hostname you take the
>>> stored version, randomly change it and that's what you send.
>>>
>>> What I understand from that is that you don't get to learn the
>>> URL/hostname at all, but can query if a URL/hostname has been submitted.
>>> You don't get to learn what the population is, but the whole population
>>> can be sent.
>>>
>>> Is that accurate?
>>>
>>>
>> Hi,
>>
>> through RAPPOR, we can send randomized values for all encountered domain
>> values.
>>
>> Then, in analysis, we can test the noisy aggregate data against known
>> domain values and get an estimate of how frequently they occurred.
>>
>> This gives immediate insights and we can increase the detail by adding
>> more sources for known domain values.
>>
>
> The paper has several algorithms in it. The first is described in "II.
> BACKGROUND"; it does not let you learn the dictionary, but you can check
> whether certain URLs are in it.
>
> Then in "III. ESTIMATING JOINT DISTRIBUTIONS" they describe how you can
> correlate different answers with each other.
>
> Then in "IV. RAPPOR WITHOUT A KNOWN DICTIONARY" they describe how you can
> send some additional data and then use the algorithm from III to learn
> something about the dictionary.
>
> Do you intend to use the algorithm from II or from IV?
>
> From what I understand, for the algorithm of II there are various
> parameters that affect the noise, and how likely it is someone can learn
> something about the data you're sending. I think they at least include:
> - The size of the bloom filter
> - The number of hashes you use
> - probability of the randomization for the PRR (f in the paper)
> - probability of the randomization for the IRR (q and p from the paper)
>
> Do you have any idea which you plan to use, and what the effect of that is?
>

The referenced paper is a newer one ("Building RAPPOR with the unknown
[...]").

Our current work is based on the first paper
<https://arxiv.org/pdf/1407.6981.pdf>. The anonymization part is described
in sections 3.1 and 3.2. The aggregation/decoding is described in section 4.
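
For readers following along, here is a rough sketch of the scheme as I read
it from the first paper: a Bloom filter per value, a permanent randomized
response (PRR, parameter f) that is stored, an instantaneous randomized
response (IRR, parameters p and q) applied on every report, and a per-bit
estimate of the true counts on the server side. The function names, the
SHA-256 based hashing and any parameter values are assumptions made for
illustration only; this is not our implementation and not the parameters we
would use.

```python
import hashlib
import random


def bloom_bits(value: str, num_bits: int, num_hashes: int) -> list[int]:
    """Set up to num_hashes positions in a num_bits-wide Bloom filter."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        # Assumption: salted SHA-256 stands in for the hash functions.
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:8], "big") % num_bits] = 1
    return bits


def permanent_rr(bits: list[int], f: float, rng: random.Random) -> list[int]:
    """PRR: force 1 with prob. f/2, force 0 with prob. f/2, else keep the bit.

    Computed once per value and stored by the client.
    """
    out = []
    for b in bits:
        r = rng.random()
        if r < f / 2:
            out.append(1)
        elif r < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_rr(prr_bits: list[int], p: float, q: float,
                     rng: random.Random) -> list[int]:
    """IRR: report 1 with prob. q if the stored bit is 1, prob. p if it is 0."""
    return [1 if rng.random() < (q if b else p) else 0 for b in prr_bits]


def estimate_true_counts(reports: list[list[int]], f: float, p: float,
                         q: float) -> list[float]:
    """Per-bit estimate of how many clients had the bit set before noise."""
    n = len(reports)
    estimates = []
    for i in range(len(reports[0])):
        c = sum(r[i] for r in reports)  # observed number of 1s for bit i
        estimates.append((c - (p + 0.5 * f * (q - p)) * n)
                         / ((1 - f) * (q - p)))
    return estimates
```

The per-bit estimates are only the first step of the decoding. Testing
against known domain values then means comparing these estimates with the
Bloom filter bits of each candidate domain. With f = 0, p = 0 and q = 1 the
sketch adds no noise at all, which is handy for checking the plumbing.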

We will publish a summary of the technical details if we decide to move
forward with this.

We are also working on a blog post that will share the best practices and
approaches that we found from working on this.

Georg
_______________________________________________
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance
