On 12/05/2015 13:14, Andrew Chilton wrote:
> On 5 May 2015 at 17:33, Ryan Kelly wrote:
>> One of the tricky-but-important questions we need to answer is:
>>
>> * how many users accessed more than one FxA service this month?
>
> Like *any* two, or a specific two e.g. Sync and Hello? (Or three I suppose.)

I was thinking "accessed any two services". But correlating between
specific services also sounds useful.

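
As a rough illustration of the two variants (any two services versus a
specific set), here is a minimal Python sketch. It assumes the pipeline
already yields (identifier, service) pairs for the month. That event
shape, the function names and the service labels are made up for the
example; none of this exists today.

```python
from collections import defaultdict


def services_per_user(events):
    """Group the distinct services seen for each identifier."""
    seen = defaultdict(set)
    for metrics_id, service in events:  # hypothetical (id, service) pairs
        seen[metrics_id].add(service)
    return seen


def count_any_n(events, n=2):
    """How many identifiers touched at least n distinct services?"""
    return sum(1 for s in services_per_user(events).values() if len(s) >= n)


def count_specific(events, required):
    """How many identifiers touched every service in `required`?"""
    required = set(required)
    return sum(1 for s in services_per_user(events).values() if required <= s)
```

Both functions only ever return an aggregate count, never the list of
identifiers behind it.
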
>> Can we do it in a more privacy-conscious manner?
>>
>>> Perhaps we could
>>> post-process these in the data pipeline into something else, or we can
>>> log something locally which we could use to correlate that same user to
>>> another service (but not back to the user him/herself). The idea of a
>>> Metrics ID has been raised which is a one-way mapping from uid to
>>> Metrics ID (am leaving out any implementation details for now).
>[...snip...]
>
> From the questions we are looking at above (i.e. "How many ...?"),
> is it safe to assume we won't be asked for answers to questions
> such as "Who has ...?", i.e. are we always going to respond with an
> aggregated number such as 6,000,001 rather than a list of users? If we
> do the Metrics ID then we can't answer the "Who has ...?" questions
> anyway, so are we sure we won't need to provide these kinds of
> answers? And if we are asked to provide such answers, should we even
> allow that (based on protecting the users' privacy)?

I don't think we need to answer "who did X?" questions on a post-hoc
basis, and in fact we should actively try to be unable to answer such
questions.

We may want to know "user XYZ just did X" on a real-time basis under
very controlled circumstances, e.g. to trigger product
marketing/engagement emails for users who have opted into them. But I
hope we can deal with such events as they occur and then forget about
them, rather than building up a big database of individual user activity
over time.

>>> Of course, all services would need to know how to make that MetricsID if it
>>> was logged at the edge, but if the uid was post-processed in the data
>>> pipeline this could be done centrally.
>>
>> Yep. If every service is able to do the uid -> metrics-id mapping at
>> will, then does it really gain us anything?
>
> Not really. I'm definitely a +1 on doing the metrics-id in post
> processing so that each edge can just log uid as-is. I believe Heka
> currently scrubs UIDs and emails from the fxa-auth-server logs so
> converting to a metrics id and scrubbing the original uid seems
> possible.

I'm starting to like this approach as well, as it seems to simplify
things while still taking the anonymization issue seriously.

It would also make the aforementioned engagement integration easier.

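
To make the post-processing idea concrete, here is a minimal Python
sketch of a one-way uid -> metrics-id step. The thread deliberately
leaves implementation details open, so everything here is an
assumption: a keyed HMAC-SHA256 with a secret held only by the
pipeline, the "uid" and "email" field names, and the function names.
It is not Heka's API, just the shape of the transformation.

```python
import hashlib
import hmac


def metrics_id(uid: str, secret_key: bytes) -> str:
    """Derive a stable identifier from uid (assumed scheme: HMAC-SHA256).

    Without secret_key the mapping cannot be recomputed from a known uid.
    """
    return hmac.new(secret_key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of a log record with uid/email removed.

    The field names "uid" and "email" are hypothetical. If a uid was
    present, it is replaced by its metrics id.
    """
    cleaned = {k: v for k, v in record.items() if k not in ("uid", "email")}
    if "uid" in record:
        cleaned["metrics_id"] = metrics_id(record["uid"], secret_key)
    return cleaned
```

Because the same uid always maps to the same metrics id, records from
different services can still be correlated after scrubbing. Keeping the
key in one central place is what distinguishes this from every edge
service being able to do the mapping at will.
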
>> I'd love for people to weigh in with their gut reactions here, even if
>> you don't have any comments on the technical details.
>>
>> We will of course have to be in compliance with Mozilla's terms, privacy
>> policy, etc when collecting all these metrics. But IMHO saying "we're
>> compliant with the posted ToS!" is not much help if what we're doing
>> just feels wrong to people.
>
> I think you're right about the 'if it just feels wrong' part. However,
> how do we actually go about measuring it against the manifesto (et al)?
> Is it just our gut feel which tells us if we're doing fine against it?

The combination of "in compliance with our posted legal policies" and
"feels about right" is IMHO a very good start. We can run it by our
legal/policy team once we've got a concrete proposal.

Cheers,

Ryan
_______________________________________________
Dev-fxacct mailing list
[email protected]
https://mail.mozilla.org/listinfo/dev-fxacct