> On 24 Nov 2018, at 03:35, Benjamin Kaduk <ka...@mit.edu> wrote:
>
> On Wed, Nov 21, 2018 at 01:53:09PM +0000, Sara Dickinson wrote:
>>
>>
>>> Begin forwarded message:
>>>
>>> From: Benjamin Kaduk <ka...@mit.edu>
>>> Subject: Benjamin Kaduk's Discuss on
>>> draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
>>> Date: 19 November 2018 at 00:28:19 GMT
>>> To: "The IESG" <i...@ietf.org>
>>> Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org, Tim Wicinski
>>> <tjw.i...@gmail.com>, dnsop-cha...@ietf.org, tjw.i...@gmail.com,
>>> dnsop@ietf.org
>>> Resent-From: <alias-boun...@ietf.org>
>>> Resent-To: j...@sinodun.com, j...@sinodun.com, s...@sinodun.com,
>>> terry.mander...@icann.org, john.b...@icann.org
>>
>> Many thanks for the detailed review.
>>
>>>
>>> ----------------------------------------------------------------------
>>> DISCUSS:
>>> ----------------------------------------------------------------------
>>>
>>> It is pretty shocking to not see any discussion of the privacy
>>> considerations of storing data including client addresses (and ports)
>>> alongside DNS transactions, given how central DNS resolution is to user
>>> behavior on the web. (Note that there are mentions of potentially
>>> anonymized data in Sections 6.2 and 6.2.3 which would presumably
>>> forward-reference the privacy considerations.) Data normalization would
>>> probably also be mentioned in this section, since (e.g.) the case used for
>>> a query/response could be used in fingerprinting an implementation.
>>
>> There has been extensive discussion of data storage risks and practices in
>> two DPRIVE documents so I’d suggest the following changes in the first
>> instance to address this:
>
> This is exactly the sort of thing I was hoping to see, thank you! I have
> just a couple tweaks to suggest, inline.
>
>> New Privacy Considerations section:
>> “Storage of DNS traffic by operators in PCAP and other formats is a long
>> standing and widespread practice. Section 2.5 of
>> draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet
>> users of the storage of DNS traffic data in servers (recursive resolvers,
>> authoritative and rogue servers).
>>
>> Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those
>> risks for data stored on recursive resolvers (but which could by extension
>> apply to authoritative servers). These include data handling practices and
>> methods for data minimisation, IP address pseudonymization and
>> anonymization. Appendix B of that document presents an analysis of 7
>> published anonymization processes. In addition RSSAC have recently published
>> RSSAC040: "Recommendations on Anonymization Processes for Source IP
>> Addresses Submitted for Future Analysis" [1].
>>
>> The above analyses consider full data capture (e.g. using PCAP) as a
>> baseline for privacy considerations and therefore this format
>> specification introduces no new user privacy issues beyond those of full
>> data capture. It does provide mechanisms to selectively record only
>
> I would say "beyond those of full data capture (which are quite severe)".
> That is, while the current state of affairs is a valid baseline for
> comparison, that does not absolve us of responsibility for analyzing the
> current state of affairs. (To be clear,
> draft-bortzmeyer-dprive-rfc7626-bis is a fine place for the bulk of that
> analysis to live, but in this document we should not pretend that the
> current state of affairs is a good situation to be in.)
>
>> certain fields at the time of data capture to improve user privacy and to
>> explicitly indicate that data is sampled and/or anonymised. It also
>> provides flags to indicate if data normalisation has been performed; data
>> normalisation increases user privacy by reducing the potential for
>> fingerprinting individuals however a trade-off is potentially reducing
>
> I think "however" would be offset by commas on both sides.

Both these WFM - thanks.

And thanks for the responses below - will update the draft accordingly.

Sara.

>
>> the capacity to identify attack traffic via query name signatures.
>> Operators should carefully consider their operational requirements and
>> privacy policies and SHOULD capture at source the minimum user data
>> required to meet their needs.”
>>
>> [1] https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf
>>
>>
>> As noted, there are a few other places we can also highlight the privacy
>> aspects:
>>
>> Introduction:
>> OLD: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in
>> practice for packet captures, but these file formats can contain a great
>> deal of additional information that is not directly pertinent to DNS
>> traffic analysis and thus unnecessarily increases the capture file size.”
>>
>> NEW: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in
>> practice for packet captures, but these file formats can contain a great
>> deal of additional information that is not directly pertinent to DNS
>> traffic analysis and thus unnecessarily increases the capture file size.
>> Additionally these tools and formats typically have no filter mechanism to
>> selectively record only certain fields at capture time, requiring
>> post-processing for anonymisation or pseudonymisation of data to protect
>> user privacy.”
>>
>> Section 4, bullet point 2:
>>
>> OLD: “Different users will have different requirements
>> for data to be available for analysis. Users with minimal
>> requirements should not have to pay the cost of recording full
>> data, though this will limit the ability to perform certain
>> kinds of data analysis and also to reconstruct packet
>> captures.
>> For example, omitting the resource records from a
>> Response will reduce the C-DNS file size; in principle
>> responses can be synthesized if there is enough context.”
>>
>> NEW: “Different operators will have different requirements
>> for data to be available for analysis. Operators with minimal
>> requirements should not have to pay the cost of recording full
>> data, though this will limit the ability to perform certain
>> kinds of data analysis and also to reconstruct packet
>> captures. For example, omitting the resource records from a
>> Response will reduce the C-DNS file size; in principle
>> responses can be synthesized if there is enough context.
>> Operators may have different policies for collecting user data
>> and can choose to omit or anonymise certain fields at
>> capture time, e.g. client address.”
>>
>> And yes, in both sections 6.2 and 6.2.3 add forward references to the
>> Privacy Considerations section.
>>
>>
>>>
>>> I'm also concerned about the policy/procedure for allocating/extending the
>>> various bitfields and similar potential extension points in the data
>>> structures. Section 8 covers the major/minor versioning semantics with
>>> respect to new map keys and new maps, but not addition of new bits within
>>> existing (uint) bitmaps. Given the usage of the CDDL .bits constraint,
>>> it's not really clear that an IANA registry is the right tool to use, but I
>>> think some indication of the expected way to allocate new bits is in order,
>>> whether it's "a future standards-track document that updates this document"
>>> or otherwise. (I've noted many, but not all, instances of such bitmaps in
>>> my COMMENT section.)
>>
>> We are inclined to follow the lead of existing RFCs making use of CBOR,
>> namely
>> * RFC8152 'CBOR Object Signing and Encryption' (July 2017)
>> * RFC8392 'CBOR Web Token (CWT)' (May 2018) and
>> * RFC8428 'Sensor Measurement Lists (SenML)' (Aug 2018)
>> and request IANA create a C-DNS registry with
>> subregistries with keys for each of the different maps used in C-DNS.
>> New entries in these subregistries would follow Expert Review as defined
>> in RFC8126. This appears to be the emerging usual way of dealing with
>> CBOR map key values, particularly integer.
>
> That sounds like a fine path forward, thanks.
>
>>>
>>> There are also a couple of fields whose semantics don't seem to be
>>> sufficiently well specified for a proposed-standard document, such as
>>> vlan-ids, generator-id, name-rdata, and ae-code. (I understand that some
>>> of them are probably only going to have locally relevant semantics, but we
>>> should be explicit about when that's the case.)
>>
>> Acknowledged, we’ll add references or clarifications for these (will put
>> details in a follow up mail that will also address your comments below).
>
> Sounds good.
>
>>>
>>> If I'm reading things correctly that the IP address type is inferred from
>>> the bytestring length, then I think we need to enforce a restriction on the
>>> address prefix length(s) to allow for that inference to be unambiguous
>>> (noting that we only have the *byte* length of the address fields at our
>>> disposal for disambiguation, and not the more precise bit-length).
>>
>> Ah, the first bit of the qr-transport-flags contains an IPv4/IPv6 flag so the
>> address type can be explicitly determined from that if it is set but of
>> course there is a corner case where that field isn’t present we hadn’t
>> considered so we’ll have to address that. Making that field mandatory if
>> prefixes are used would be simplest.
>
> I guess I had forgotten about that bit in the qr-transport-flags on my
> first read.
> Making it mandatory if prefix lengths are present ought to
> work.
>
> -Benjamin
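
For readers following the last point, here is a minimal Python sketch of why the byte length alone cannot identify the address family once prefix lengths are in play. It assumes, purely for illustration, that a stored address keeps only the whole bytes covering its prefix; the helper names `stored_prefix` and `decode`, and the `is_ipv6` argument (a stand-in for the IPv4/IPv6 bit in qr-transport-flags), are hypothetical and not taken from the draft.

```python
import ipaddress


def stored_prefix(address: str, prefix_len: int) -> bytes:
    """Return only the whole bytes needed to hold the first prefix_len bits.

    Assumption for illustration: host bits are zeroed and trailing bytes dropped.
    """
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    n_bytes = (prefix_len + 7) // 8
    return network.network_address.packed[:n_bytes]


def decode(data: bytes, is_ipv6: bool):
    """Rebuild an address from stored bytes using an explicit family flag."""
    size = 16 if is_ipv6 else 4
    if len(data) > size:
        raise ValueError("too many bytes for the flagged address family")
    padded = data.ljust(size, b"\x00")  # restore the dropped trailing bytes as zeros
    return ipaddress.IPv6Address(padded) if is_ipv6 else ipaddress.IPv4Address(padded)


# A full IPv4 address and an IPv6 /32 prefix both occupy four bytes.
full_v4 = stored_prefix("32.1.13.184", 32)
v6_prefix = stored_prefix("2001:db8::1", 32)
assert full_v4 == v6_prefix  # identical bytes, so length cannot disambiguate

# Only the explicit flag tells the two apart.
print(decode(full_v4, is_ipv6=False))   # 32.1.13.184
print(decode(v6_prefix, is_ipv6=True))  # 2001:db8::
```

Under that assumption the two stored values are byte-for-byte identical, which is the corner case discussed above: without the flag a reader has nothing to go on, so requiring the flag whenever prefix lengths are present removes the ambiguity.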