On Fri, Jul 7 2023 at 09:21:15 PM -0400, Demi Marie Obenour <demioben...@gmail.com> wrote:
> For metrics not to be personally identifiable, the set of metrics
> collected must have such low entropy that, on average, _many_ users
> send _exactly the same metrics_. It is very hard for me to see any
> useful set of metrics having entropy that low.
>
> If Fedora has 2 million users (possibly an overestimate), then the
> metrics would need to take far fewer than 2^21 distinct values (that
> is, entropy much less than 21 bits), which means the entire metrics
> set would need to fit in a 20-bit integer. In practice, I suspect
> one would need to fit the entire set in a 16-bit integer or less,
> and possibly _significantly_ less.
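
The bound in the quoted argument is log2 of the user count: distinguishing one of N users takes log2(N) bits. The Python sketch below checks that arithmetic. The 2,000,000 figure is the quoted estimate, not a measured count, and the helper names are illustrative. It also computes Shannon entropy for an observed distribution, which is the quantity that matters when values are not spread uniformly.

```python
import math

# Assumption: the user count is the quoted estimate, not a measured figure.
USERS = 2_000_000


def bits_to_single_out(population: int) -> float:
    """Bits needed to distinguish one member of `population` (log2 of its size)."""
    return math.log2(population)


def shannon_entropy(counts: list[int]) -> float:
    """Shannon entropy in bits of an empirical distribution given per-value counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# About 20.93 bits identifies one user out of two million.
print(f"{bits_to_single_out(USERS):.2f} bits to single out one of {USERS:,} users")

# If reports were spread uniformly over 2**k distinct values, each value
# would be shared by USERS / 2**k users on average.
for k in (16, 20, 21):
    print(f"{k}-bit record: ~{USERS / 2**k:.1f} users per distinct value")

# A yes/no metric split evenly between two values carries exactly 1 bit.
print(shannon_entropy([USERS // 2, USERS // 2]))
```

With these numbers, a uniform 16-bit record leaves roughly 30 users per distinct value, while a 21-bit record leaves about one. For a fixed number of distinct values the uniform distribution has the highest entropy, so the uniform figure is an upper bound.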

We're not going to build creepy user profiles. Individual metrics will be stored separately, not correlated with one another.

Let's say we have two metrics:

Key                                | Value
-----------------------------------+------
User launched GNOME Builder today? | y/n
User has NVIDIA proprietary driver | y/n

We would know how many users launched Builder and how many users have NVIDIA graphics, but we wouldn't know how many NVIDIA users launched Builder because there's just no need to tie those two data points together.
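
Here is a minimal sketch of what storing metrics individually could look like on the server side. It assumes each client report is folded into independent per-metric counters and then discarded. The metric key names and the `record` helper are hypothetical, not part of any Fedora service.

```python
from collections import Counter

# Hypothetical metric keys mirroring the two-row table above.
BUILDER = "launched_gnome_builder_today"
NVIDIA = "has_nvidia_proprietary_driver"

# One Counter per metric. Each (key, value) pair only bumps that metric's
# tally, so nothing links two values that came from the same machine.
tallies: dict[str, Counter] = {BUILDER: Counter(), NVIDIA: Counter()}


def record(report: dict[str, bool]) -> None:
    """Fold one client report into the per-metric tallies; the report is not kept."""
    for key, value in report.items():
        tallies[key][value] += 1


reports = [
    {BUILDER: True, NVIDIA: True},
    {BUILDER: False, NVIDIA: True},
    {BUILDER: True, NVIDIA: False},
]
for r in reports:
    record(r)

print("Builder users:", tallies[BUILDER][True])
print("NVIDIA users:", tallies[NVIDIA][True])
# No stored structure can answer "how many NVIDIA users launched Builder":
# that joint count was never recorded.
```

The separation is enforced only by what the server keeps. The sketch assumes that raw reports, and any network metadata such as IP addresses, are not retained alongside the counters.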

Michael

_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
