The Evolution hook works on the backtraces (BTs), looking for known variables
that hold (potentially) private data; for any other variable, we scan for
instances of IP addresses, email addresses, and fully-qualified server
names. All matches are replaced by the string '##MASKED##'.

Of course, this will only be fully effective when bug 387933 is resolved
for the backoffice.

Meanwhile, the hook seems to be working correctly for the list of
Evolution bugs Brian provided me with (BTW, thank you!). The hook
currently:

1. Collects Evolution GConf data (the Plugins, Junk Setup, and Prompts subkeys of
/apps/evolution); these are added in a [Miscellaneous] string;
2. For each of (Stacktrace, ThreadStacktrace), scans the lines and replaces
any string value of the following Evolution variables with the string "##MASKED##":
    
    r'''(key|url_string|url|filename|filesave|uri|profname|user|source|username|password|server|domain|domain_name)  # variables in trace
        ([\s]*[=].+?["])        # intermediate text (class, address, etc)
        (.*?)                   # what we really want: the string data
        (["][, ]*)'''           # the delimiter
3. Then we search for and replace any remaining instances of email addresses,
fully-qualified server names, and IP addresses (in this order) in all other
variables.
4. (Currently) writes a *diff* of the changes made (creating two *new* entries
in reports[]). This was done because we were not sure how invasive the
changes would be, and considered it better to just write a diff, at least for now.
*Input needed*
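Step 2 above can be sketched in Python roughly as follows. This is a minimal illustration, not the hook's actual code: `mask_variables` is a hypothetical name, the pattern is assumed to be compiled with `re.VERBOSE` (which its inline comments require), and only group 3, the string data, is replaced.

```python
import re

MASK = '##MASKED##'

# Step 2 pattern; re.VERBOSE makes the engine ignore the layout whitespace
# and the '#' comments (spaces inside [...] are still literal).
VARIABLE_RE = re.compile(r'''
    (key|url_string|url|filename|filesave|uri|profname|user|source|
     username|password|server|domain|domain_name)   # variables in trace
    ([\s]*[=].+?["])        # intermediate text (class, address, etc)
    (.*?)                   # what we really want: the string data
    (["][, ]*)              # the delimiter
''', re.VERBOSE)


def mask_variables(text):
    """Hypothetical helper: mask the quoted value of each known variable."""
    return VARIABLE_RE.sub(
        lambda m: m.group(1) + m.group(2) + MASK + m.group(4), text)


line = '#3  0x08051234 in connect (username = 0x8a3c2f0 "jdoe", port = 143)'
print(mask_variables(line))
# #3  0x08051234 in connect (username = 0x8a3c2f0 "##MASKED##", port = 143)
```

Note that `user` is tried before `username`, but the regex engine backtracks to the longer alternative when `[\s]*[=]` fails, so both names are handled.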

For both FQSN and email addresses we use the following RE for domain names:
    
    '(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[a-z]{2})'
This RE will match any of the listed top-level domains, or any two letters
(which covers the country-code domains).
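A quick way to see what the domain-name alternation accepts is to test it against single labels. This sketch (the helper name `is_accepted_tld` is hypothetical) assumes the pattern is used without `re.IGNORECASE`; if the hook passes that flag, upper-case names would match as well.

```python
import re

# Domain-name (TLD) alternation shared by the email and FQSN patterns.
DOMAIN_NAMES = ('(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|'
                'mobi|museum|name|net|org|pro|tel|travel|[a-z]{2})')


def is_accepted_tld(label):
    """Hypothetical helper: does DOMAIN_NAMES match the whole label?"""
    return re.fullmatch(DOMAIN_NAMES, label) is not None


for label in ('com', 'museum', 'uk', 'xq', 'example', 'COM'):
    print(label, is_accepted_tld(label))
```

The `[a-z]{2}` branch accepts any pair of letters, whether or not it is a real country code, which errs on the side of masking too much rather than too little.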

For IP addresses we use the following RE:
    '([^\d])(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.](\d{1,3}))([^\d])'
This RE will match *any* four dot-separated groups of one to three digits,
enclosed in non-digits (for example, "[1.2.3.4]"). It will also match invalid
IP addresses, since no limits are set on the range; for example, it will
match "a912.513.401.12/".

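The behaviour described above can be checked directly. In this sketch `mask_ips` is a hypothetical helper that keeps the two delimiter characters (groups 1 and 4) and masks the address; `is_valid_ipv4` shows one possible stricter check using the standard `ipaddress` module, which the hook does not do.

```python
import ipaddress
import re

MASK = '##MASKED##'
# Groups: 1 = leading non-digit, 2 = dotted quad, 3 = last part, 4 = trailing non-digit.
IP_RE = re.compile(r'([^\d])(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.](\d{1,3}))([^\d])')


def mask_ips(text):
    """Hypothetical helper: mask every dotted quad matched by IP_RE."""
    return IP_RE.sub(lambda m: m.group(1) + MASK + m.group(4), text)


def is_valid_ipv4(candidate):
    """Optional stricter check (not in the hook): each part must be 0-255."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


print(mask_ips('connect to [192.168.0.10] failed'))  # connect to [##MASKED##] failed
print(mask_ips('a912.513.401.12/'))                   # a##MASKED##/
print(is_valid_ipv4('912.513.401.12'))                # False
```

Because a non-digit is required on each side, an address at the very start or end of the text is not matched, and since the delimiters are consumed, in `'x1.2.3.4,5.6.7.8y'` only the first address is masked. Lookarounds such as `(?<!\d)` and `(?!\d)` would avoid both issues, if that matters in practice.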
For email addresses we use the following RE:
    '[\w\.\-]+@[\w\.\-]+[.]' + DOMAIN_NAMES
This RE will match words (plus '.' and '-'), followed by an at symbol ('@')
and a domain ending in one of the DOMAIN_NAMES. This is clearly not fully
correct (it would allow, for example, an email address starting with '.'),
but it is good enough for our purposes.
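A short demonstration of the email pattern, including the leading-'.' case mentioned above. The local-part class is written here as `[\w\.\-]+`, following the prose description of the pattern; treat that, and the helper name `mask_emails`, as assumptions.

```python
import re

MASK = '##MASKED##'
DOMAIN_NAMES = ('(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|'
                'mobi|museum|name|net|org|pro|tel|travel|[a-z]{2})')
# Assumed form: word chars, '.' or '-', then '@', a dotted name and a TLD.
EMAIL_RE = re.compile(r'[\w\.\-]+@[\w\.\-]+[.]' + DOMAIN_NAMES)


def mask_emails(text):
    """Hypothetical helper: replace each matched address with MASK."""
    return EMAIL_RE.sub(MASK, text)


print(mask_emails('From: john.doe@mail.example.com>'))  # From: ##MASKED##>
print(mask_emails('x .bob@example.org'))                # x ##MASKED##  (leading '.' accepted)
```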

For FQSN we use the following RE:
    '([^\w])([\w.-]*[.]' + DOMAIN_NAMES + ')([^\w\-]|[\n])'
This RE is very similar to the email RE; the differences are that (1) it is
prefixed and suffixed with non-word characters, and (2) it has a dot instead
of an at symbol.
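Putting step 3 together, the three patterns can be applied in the stated order as sketched below. `sanitise_text` and `keep_delimiters` are hypothetical names, the email local part is assumed to be `[\w\.\-]+`, and the hook itself may apply the patterns per variable rather than to whole strings.

```python
import re

MASK = '##MASKED##'
DOMAIN_NAMES = ('(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|'
                'mobi|museum|name|net|org|pro|tel|travel|[a-z]{2})')

# Email local part assumed to be [\w\.\-]+ (hypothetical reconstruction).
EMAIL_RE = re.compile(r'[\w\.\-]+@[\w\.\-]+[.]' + DOMAIN_NAMES)
# Groups: 1 = leading non-word char, 2 = server name, 3 = TLD, 4 = trailing char.
FQSN_RE = re.compile(r'([^\w])([\w.-]*[.]' + DOMAIN_NAMES + r')([^\w\-]|[\n])')
# Groups: 1 = leading non-digit, 2 = address, 3 = last part, 4 = trailing non-digit.
IP_RE = re.compile(r'([^\d])(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.](\d{1,3}))([^\d])')


def keep_delimiters(match):
    """Mask the name/address but keep the delimiters in groups 1 and 4."""
    return match.group(1) + MASK + match.group(4)


def sanitise_text(text):
    """Hypothetical step 3: mask emails, then server names, then IPs."""
    text = EMAIL_RE.sub(MASK, text)
    text = FQSN_RE.sub(keep_delimiters, text)
    return IP_RE.sub(keep_delimiters, text)


print(sanitise_text('user jdoe@example.org on imap.example.com (10.0.0.7)'))
# user ##MASKED## on ##MASKED## (##MASKED##)
```

Running the FQSN pattern before the email one would turn `jdoe@example.org` into `jdoe@##MASKED##` and leave the user name visible, which is presumably why emails are masked first. The `[a-z]{2}` branch also catches two-letter file extensions: `'in /usr/lib/libfoo.so.0'` becomes `'in /usr/lib/##MASKED##.0'`, which may be worth keeping in mind when reviewing the diffs.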

5. Finally, we currently calculate diffs of the changes to Stacktrace
and ThreadStacktrace, and add them to the report as [Stacktrace.diff] and
[ThreadStacktrace.diff].

6. Exits.
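Step 5 could look roughly like this with the standard `difflib` module. The report is modelled as a plain dict, and `add_trace_diffs` and its `sanitise` argument are hypothetical names; the real hook may produce its diffs differently.

```python
import difflib


def add_trace_diffs(report, sanitise):
    """Add '<key>.diff' entries for the traces that sanitise() changes.

    The original Stacktrace/ThreadStacktrace values are left untouched.
    """
    for key in ('Stacktrace', 'ThreadStacktrace'):
        original = report.get(key)
        if not original:
            continue
        masked = sanitise(original)
        if masked == original:
            continue  # nothing was masked, so no diff entry is added
        diff = difflib.unified_diff(
            original.splitlines(), masked.splitlines(),
            fromfile=key, tofile=key + '.sanitised', lineterm='')
        report[key + '.diff'] = '\n'.join(diff)


report = {'Stacktrace': '#0 login (password = 0x1 "secret")\n#1 main ()\n'}
add_trace_diffs(report, lambda text: text.replace('"secret"', '"##MASKED##"'))
print(report['Stacktrace.diff'])
```

Splitting without line endings and passing `lineterm=''` keeps the output well formed even when a trace does not end in a newline.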

Additional comments:

(a) although the idea is to provide a sanitised stacktrace in order to
allow the bug to be classified Public, I was reluctant to delete the
original stacktraces: not only may I be missing something, but there
*might* also be a case where a masked value would be needed for a
full understanding of the issue. This is why we decided to *add* a diff
for the changes: a sanitised stacktrace can then easily be obtained
by patching the corresponding stacktrace with its diff. Another option
would be to provide the sanitised stacktrace (removing the original) and
the, er, reverse diff, in order to get the original one back.

(b) the latter option in (a) would be, in my view, the ideal scenario, but
it would depend on bug 151658, to make attachments and comments private.
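The "sanitised trace plus reverse diff" alternative from (a) can be sketched with the standard library as well. This swaps in `difflib.ndiff` for a unified diff, because an ndiff delta can be turned back into either trace with `difflib.restore`; the helper names are hypothetical. Note that such a delta (like any diff against the original) contains the unmasked lines, so it would itself need to be private, which is the dependency on bug 151658 noted in (b).

```python
import difflib


def make_delta(original, sanitised):
    """Return an ndiff delta from which either trace can be rebuilt."""
    return list(difflib.ndiff(original.splitlines(True),
                              sanitised.splitlines(True)))


def rebuild(delta, which):
    """which=1 rebuilds the original trace, which=2 the sanitised one."""
    return ''.join(difflib.restore(delta, which))


original = '#0 connect (server = 0x1 "mail.example.com")\n#1 main ()\n'
sanitised = original.replace('mail.example.com', '##MASKED##')
delta = make_delta(original, sanitised)
print(rebuild(delta, 1) == original, rebuild(delta, 2) == sanitised)  # True True
```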

I have run this hook against 753 bugs from Brian's list, and it *seems*
to be working correctly. The runs were executed by calling the hook with
the --report parameter; as currently coded, only the .diffs are printed
out.

TO BE DECIDED:

1. should we delete the original traces and keep only the sanitised
traces (and, perhaps, a reverse diff)?
2. should we save the original traces and the diffs?

Note that neither of these two options would allow the bug to be marked
public.

3. should we save *only* the sanitised traces, and mark the bug public?

I will provide test data, based on the runs I have.

-- 
apport hook for Evolution
https://bugs.launchpad.net/bugs/391623