Mostly I'm just agreeing with Ryan below...

On 5/11/15 9:32 PM, Ryan Kelly wrote:
On 12/05/2015 13:14, Andrew Chilton wrote:
On 5 May 2015 at 17:33, Ryan Kelly <[email protected]> wrote:
One of the tricky-but-important questions we need to answer is:

* how many users accessed more than one FxA service this month?

Like *any* two, or a specific two e.g. Sync and Hello? (Or three I suppose.)

I was thinking "accessed any two services".  But correlating between
specific services also sounds useful.

FWIW, the primary use case that the executives care about is "any two services". One can imagine that someone may eventually ask about a specific two.
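
To make the two variants of the question concrete, here is a rough sketch of how both counts could be computed from a month of activity events. It assumes each event has already been reduced to a (user key, service name) pair; the function names, the event shape, and the sample service names are hypothetical and not taken from any existing pipeline.

```python
from collections import defaultdict


def services_per_user(events):
    """Group events into {user_key: set of services seen for that key}."""
    seen = defaultdict(set)
    for user_key, service in events:
        seen[user_key].add(service)
    return seen


def count_multi_service_users(events, min_services=2):
    """'Any two services': count keys seen on at least min_services services."""
    seen = services_per_user(events)
    return sum(1 for services in seen.values() if len(services) >= min_services)


def count_users_of_all(events, required):
    """'A specific two': count keys seen on every service in `required`."""
    required = set(required)
    seen = services_per_user(events)
    return sum(1 for services in seen.values() if required <= services)


# Hypothetical sample data: opaque user keys, illustrative service names.
sample_events = [
    ("a1", "sync"), ("a1", "hello"),
    ("b2", "sync"), ("b2", "sync"),
    ("c3", "hello"), ("c3", "sync"), ("c3", "marketplace"),
]

any_two = count_multi_service_users(sample_events)
sync_and_hello = count_users_of_all(sample_events, {"sync", "hello"})
```

Both functions only ever return an aggregate count, which fits the "How Many" framing; neither needs to know who is behind a given key.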

Can we do it in a more privacy-conscious manner?

Perhaps we could
post-process these in the data pipeline into something else, or we can
log something locally which we could use to correlate that same user to
another service (but not back to the user him/herself). The idea of a
Metrics ID has been raised which is a one-way mapping from uid to
Metrics ID (am leaving out any implementation details for now).
[...snip...]

From the questions we are looking at above (i.e. "How many ...?"), is
it safe to assume we won't be asked for answers to questions such as
"Who has ...?", i.e. are we always going to respond with an
aggregated number such as 6,000,001 rather than a list of users? If we
do the Metrics ID then we can't answer the "Who has ...?" questions
anyway, so are we sure we won't need to provide these kinds of
answers? And if we are asked to provide such answers, should we even
allow that (based on protecting the users' privacy)?

I don't think we need to answer "who did X?" questions on a post-hoc
basis, and in fact we should actively try to be unable to answer such
questions.

We may want to know "user XYZ just did X" on a real-time basis under
very controlled circumstances, e.g. to trigger product
marketing/engagement emails for users who have opted into them.  But I
hope we can deal with such events as they occur and then forget about
them, rather than building up a big database of individual user activity
over time.


Agree with Ryan. The account activity events are meant to answer "How Many" questions.

Of course, all services would need to know how to make that MetricsID if it
was logged at the edge, but if the uid was post-processed in the data
pipeline this could be done centrally.

Yep.  If every service is able to do the uid -> metrics-id mapping at
will, then does it really gain us anything?

Not really. I'm definitely a +1 on doing the metrics-id in
post-processing so that each edge can just log uid as-is. I believe Heka
currently scrubs UIDs and emails from the fxa-auth-server logs, so
converting to a metrics-id and scrubbing the original uid seems
possible.
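
As an illustration only, a central post-processing step along these lines could look like the sketch below. It assumes the metrics-id is a keyed hash (HMAC-SHA256) of the uid, with a secret key held only by the pipeline; that choice, the function names, and the event field names are all assumptions made for the example. The thread deliberately leaves the real mapping unspecified, and this is not how Heka's scrubbing is implemented.

```python
import hashlib
import hmac


def metrics_id(uid: str, secret_key: bytes) -> str:
    """Derive a stable, one-way id from a uid (assumed scheme: HMAC-SHA256)."""
    return hmac.new(secret_key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_event(event: dict, secret_key: bytes) -> dict:
    """Return a copy of the event with uid/email removed and metrics_id added."""
    # Drop the identifying fields (hypothetical field names).
    scrubbed = {k: v for k, v in event.items() if k not in ("uid", "email")}
    if "uid" in event:
        scrubbed["metrics_id"] = metrics_id(event["uid"], secret_key)
    return scrubbed


# Hypothetical key and events; a real key would live only in the pipeline.
PIPELINE_KEY = b"example-key-known-only-to-the-pipeline"
raw_sync = {"uid": "abc123", "email": "user@example.com", "service": "sync"}
raw_hello = {"uid": "abc123", "service": "hello"}

clean_sync = scrub_event(raw_sync, PIPELINE_KEY)
clean_hello = scrub_event(raw_hello, PIPELINE_KEY)
```

Because the same uid always maps to the same metrics-id, events from different services can still be correlated after scrubbing, while anyone without the key cannot recompute the mapping from a known uid. Keeping the key in one place is the reason to do this centrally rather than at every edge.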

I'm starting to like this approach as well, as it seems to simplify
things while still taking the anonymization issue seriously.

It would also make the aforementioned engagement integration easier.

+1

As I understand it, the primary goal of the metrics-id is that the group of people who are working with the metrics data set don't have access to the uid or email, which limits risk.

Could we go ahead and use metrics-id on all log data where we currently scrub account uids? That would unlock some potentially valuable longitudinal analysis & debugging. Alternatively, a session-id would be almost as useful.

I'd love for people to weigh in with their gut reactions here, even if
you don't have any comments on the technical details.

We will of course have to be in compliance with Mozilla's terms, privacy
policy, etc when collecting all these metrics.  But IMHO saying "we're
compliant with the posted ToS!" is not much help if what we're doing
just feels wrong to people.

I think you're right about the 'if it just feels wrong' part. However,
how do we actually go about measuring it against the manifesto (et al)?
Is it just our gut feel which tells us if we're doing fine against it?

The combination of "in compliance with our posted legal policies" and
"feels about right" is IMHO a very good start.  We can run it by our
legal/policy team once we've got a concrete proposal.

Seems like "metrics-id" would fall in a similar category to Hello's roomid: https://bugzilla.mozilla.org/show_bug.cgi?id=1140184, and the way we treat log data in general. Even though we take some steps to sanitize it, we do not treat log data as perfectly safe; we still have controls around who can access it, and we monitor access.

Cheers,
Katie

_______________________________________________
Dev-fxacct mailing list
[email protected]
https://mail.mozilla.org/listinfo/dev-fxacct
