Thanks for the feedback. We are evaluating it and will follow up in the coming weeks.
I'll summarize and expand on some key points that came up.

Some background.

As originally outlined, there are some important questions that we currently cannot answer. Our existing data sources either cannot give us the data we need or are not sufficiently representative of our user base.

We are working within several constraints:

- We have a strong stance on our privacy principles <https://www.mozilla.org/en-US/privacy/principles/> and on how we impact our users <https://blog.mozilla.org/futurereleases/2017/09/06/data-just-living/>.
- We also need reliable and representative data to make decisions, within the limits of these principles.
- Some kinds of data we will not collect by default based on our principles; they require additional consent instead. This means that we generally cannot get representative populations, due to selection bias.

Filling the gap.

This is why we are exploring techniques to close this gap. What if we could collect data in a way that ensured a strict level of privacy inside Firefox, before anything was sent to a server? If we could achieve this, we would get data collection that is both anonymous and representative. This would be subject to our privacy policy <https://www.mozilla.org/en-US/privacy/>. If users turn off data sharing through the preferences, we would not submit this data. Firefox will always respect user choice.

How this works.

This is where Differential Privacy <https://en.wikipedia.org/wiki/Differential_privacy> techniques come in, which rely on hashing and noise injection so that no conclusions can be drawn about individual users. The most common example comes from social science studies, where a coin flip determines whether each participant answers truthfully or not. RAPPOR <https://arxiv.org/pdf/1407.6981.pdf> is one specific technique that can be applied to strings and that gives formal privacy guarantees, depending on the choice of parameters.

What we plan to do.

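To make the noise-injection idea behind this plan concrete, here is a minimal sketch of the coin-flip scheme from the social science example. It is classic randomized response, not RAPPOR itself and not the code used in the study; the function names, the 50/50 coin probabilities and the simulated 30% rate are assumptions chosen for illustration.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report one yes/no answer with plausible deniability.

    First coin flip: heads -> answer truthfully.
    Tails -> a second coin flip decides the reported answer at random.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool]) -> float:
    """Invert the noise on the aggregate.

    P(reported yes) = 0.5 * true_rate + 0.25, so
    true_rate = 2 * (observed_rate - 0.25).
    """
    observed = sum(reports) / len(reports)
    return 2.0 * (observed - 0.25)


if __name__ == "__main__":
    rng = random.Random(42)
    true_rate = 0.30  # hypothetical share of "yes" answers in the population
    answers = [rng.random() < true_rate for _ in range(100_000)]
    reports = [randomized_response(a, rng) for a in answers]
    print(f"estimated rate: {estimate_true_rate(reports):.3f}")
```

No single report reveals a participant's true answer, yet the aggregate estimate approaches the true rate as the number of reports grows. RAPPOR applies the same principle to strings by first hashing them into a Bloom filter and then randomizing the bits.
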
The current plan is to run a SHIELD study to confirm that we can get answers for this kind of data from the Firefox population. Using RAPPOR, we collect aggregate data on the most common domain values that users set their homepage to (e.g. foo.com) or the value of "about:home". Through the use of RAPPOR, only obfuscated data will leave Firefox. By sending out noisy data, we protect the privacy of individual users. To then get answers out of the noisy data we receive, we can test the aggregated data of all users against a list of domains from other sources. For example, we can use the Alexa Top 500 sites to estimate whether any of them are present in the aggregated data.

What's next.

Currently we are evaluating the feedback and will decide on next steps. We will continue to work in the open and maintain a dialog around our data collection practices. Stay tuned for further communications regarding our research into differential privacy, best practices, scientific collaborations and more technical details.

Georg

On Mon, Aug 21, 2017 at 5:56 PM, Georg Fritzsche <gfritzs...@mozilla.com> wrote:
> Hi,
>
> for Firefox we want to better understand how people use our product to improve their experience. To do that, we are planning to run a new SHIELD study that tests how we can collect additional data in a privacy preserving way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect more sensitive data, like top sites users visit and how features perform on specific sites.
>
> Currently we can collect this data when the user opts in, but we don't have a way to collect unbiased data, without explicit consent (opt-out).
>
> Asks for sensitive data center most commonly around knowing something in relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
>
> In summary, most asks are for occurrences of an event X per domain (more specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us to collect sensitive data without being able to draw conclusions about individual users, thus preserving their privacy.
>
> An attacker that has access to the data a single user submits is not able to tell whether a specific site was visited by that user or not.
>
> The Google Open Source project called RAPPOR [4] [5] is the most widely known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kinds of use cases, with initial simulation results being promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR. This study will collect the value for users’ home page (eTLD+1) for a randomly selected group of our release population. We are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and is a new approach for Mozilla. As such, we are still experimenting with the project and wanted to reach out for feedback.
>
> Georg
>
> References:
>
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
> 2: https://en.wikipedia.org/wiki/Differential_privacy
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> 6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies
_______________________________________________
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance