Hi all,

We wanted to let you know about new data collection that we will be doing
for Firefox Hello starting with the Firefox 46 launch on April 19th, and the
steps we took to prevent it from collecting personally identifiable
information. We want to collect more data about the websites that people
share with Hello, to help optimize the product UX, understand what people
use our new tab sharing feature for, and prioritize features accordingly.
The product features and UX can be very different if we decide to optimize
for “Shopping together” use cases as opposed to “Playing online games
together”, to give just two examples.


We did a lot of due diligence on this and explored several options for
getting the data. The approach described below is the one we settled on: we
collect, on our own servers, the domain names of tabs shared on Firefox
Hello. It prevents personal identification and gets us the data we need to
build the best tool we can while being sensitive to our users.


How we collect the data


We plan to put in place a data collection solution that prevents personal
identification. The technical approach, which relies on client-side
whitelisting, is outlined here:



   - Data will go to our servers and will be stored with our other server
     metrics. We are aggregating domain names, and are not storing session
     histories. These are submitted at the end of the session, so exact
     timestamps of any visit are not included.

   - Users who have disabled Health Reports will not submit this data.

   - We would use a whitelist client-side to only collect domains that are
     part of the top 2000 domains (Alexa list of top domains). This prevents
     personal identification based on obscure domain usage. We would subtract
     the sites from the Adult
     <http://www.alexa.com/topsites/category/Top/Adult> category and add all
     the subdomains of:

        - google.com
          <http://www.labnol.org/internet/popular-google-subdomains/5888/>
          (e.g., drive.google.com)
        - yahoo.com (e.g., games.yahoo.com)
        - developer.mozilla.org, bugzilla.mozilla.org, wiki.mozilla.org (this
          helps us understand how much of our user base is Mozillians)
        - tunes.apple.com

     You can see the exact list here: DomainWhitelist.jsm
     <https://github.com/mozilla/loop/blob/master/add-on/chrome/modules/DomainWhitelist.jsm>

   - The data will only be kept for 6 months and we plan to revisit this
     collection in 6 months. At the end of this period we’ll evaluate whether
     we should carry on collecting the data (if it is still useful and will
     help further shape the product) or just stop.


This e-mail is intended to make everyone aware of the data we’re collecting
in Hello, in an effort to be as transparent as possible. We want to make
sure people get the full picture of what we are trying to achieve and what
we’re putting in place to protect our users.


Let me know if you have any questions.



Implementation bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1211542

Technical documentation:
https://github.com/mozilla/loop/blob/master/docs/DataCollection.md


-Romain
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
