I think the premise that you need to collect data on the top sites that a user visits may be flawed. Won't you be contributing to the dominance of (already-dominant) top sites by optimizing for them specifically?
It also seems that you could get a reasonably accurate idea of which sites are most popular among Firefox users by looking at the most popular sites overall and optimizing for those. Do you expect Firefox users to be so different that their top sites don't look more or less the same as the top sites overall?
Further, as has been shown again and again, data thought to be untraceable to any particular user has been deanonymized through correlation with other data sets. A list of a user's top visited sites is also a pretty juicy target for state actors, blackmailers, etc.
Finally, the mere act of doing telemetry that is random from the user's perspective is problematic. First, users on limited connections don't need to be using more data than they already are. Second, the mere act of making a network request, even one that sends only a ping, reveals the user's IP address and can expose an unprepared user who needs privacy. I understand that Firefox already does some of this, but that's not really a reason to do more.
From a business perspective, a major differentiating factor (arguably the only differentiating factor) of Firefox is that Mozilla isn't Google. The closer you get to that line, the more damage you'll do to the trust users have in Mozilla. I recommend that you take the high road on this one. I'm not sure what the motivator is here (does having more data give you leverage with partners?), but the stated justification (improving performance on particular websites) seems too weak to outweigh the valid privacy concerns.
Mozilla: we want to trust you. We do trust you. We know it's tough out there. You're playing with the big kids, and they have intel that, admittedly, probably helps them improve their products. But the way you can improve your product is by NOT collecting that intel. Do the Mozilla thing, not the Google thing.
On Monday, August 21, 2017 at 11:56:44 AM UTC-4, Georg Fritzsche wrote:
> Hi,
>
> For Firefox we want to better understand how people use our product to improve their experience. To do that, we are planning to run a new SHIELD study that tests how we can collect additional data in a privacy-preserving way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect more sensitive data, like the top sites users visit and how features perform on specific sites.
>
> Currently we can collect this data when the user opts in, but we don't have a way to collect unbiased data without explicit consent (opt-out).
>
> Asks for sensitive data most commonly center on knowing something in relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy jank on?"
>
> In summary, most asks are for occurrences of an event X per domain (more specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us to collect sensitive data without being able to draw conclusions about individual users, thus preserving their privacy.
>
> An attacker that has access to the data a single user submits is not able to tell whether a specific site was visited by that user or not.
>
> The Google open source project called RAPPOR [4] [5] is the most widely known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kinds of use cases, and initial simulation results are promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR. This study will collect the value of users' home page (eTLD+1) for a randomly selected group of our release population. We are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and is a new approach for Mozilla. As such, we are still experimenting with the project and wanted to reach out for feedback.
>
> Georg
>
> References:
>
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
> 2: https://en.wikipedia.org/wiki/Differential_privacy
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> 6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies
_______________________________________________
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance