Re: [DNSOP] Fwd: New Version Notification for draft-dickinson-dnsop-dns-capture-format-00.txt

Jim Hague Tue, 01 Nov 2016 12:32:33 -0700

On 01/11/2016 17:54, Philip Homburg wrote:

I find it hard to believe that after compression, the BSON encoded
version of the DNS data would be a lot smaller than just the
raw DNS data.


There is not a lot of redundancy in the DNS encoding.

Certainly there is not a lot of redundancy in the DNS encoding of a
single packet, and there is a fair amount of poorly compressible data
in transport headers in the PCAP.

What we're exploiting, though, is the redundancy in DNS encoding in a
stream of packets. We're building tables of data that is often repeated
in a stream - names, addresses, etc. - and storing references instead of
repeating data. We can do this cheaply during writing of the CBOR
output, because we know where the redundancy will be located. A general
purpose compression engine will end up doing much the same, but will
have to work harder to locate that specific redundancy. Also, by writing
all e.g. names in a table, we're both grouping data that is likely to
have significant internal redundancy that we're not exploiting and
making the size of the input data to the compression much smaller, both
of which again make the general engine's job much easier.
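
As a rough illustration of the table idea (a sketch only, not the
draft's actual format: the record layout, the function names and the
use of JSON as a stand-in serialisation are all assumptions made for
the example), repeated names and addresses are written once and each
record stores small integer references:

```python
import json


def build_tables(records):
    """Replace repeated names and addresses with indexes into shared tables.

    `records` is a list of (query_name, client_address) pairs; this layout
    is hypothetical and far simpler than the draft's format.
    """
    names, addresses = [], []
    name_index, addr_index = {}, {}
    rows = []
    for qname, client in records:
        if qname not in name_index:
            name_index[qname] = len(names)
            names.append(qname)
        if client not in addr_index:
            addr_index[client] = len(addresses)
            addresses.append(client)
        rows.append((name_index[qname], addr_index[client]))
    return {"names": names, "addresses": addresses, "rows": rows}


def restore(tables):
    """Rebuild the original (query_name, client_address) pairs."""
    return [(tables["names"][n], tables["addresses"][a])
            for n, a in tables["rows"]]


# Synthetic traffic: 1000 queries over 20 names and 10 client addresses.
records = [(f"host{i % 20}.example.com", f"192.0.2.{i % 10}")
           for i in range(1000)]
tables = build_tables(records)

flat_size = len(json.dumps(records).encode())
tabled_size = len(json.dumps(tables).encode())
print(flat_size, tabled_size)
```

The second number is what would be handed to the general purpose
compressor; with this synthetic input it is a fraction of the first.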

So I don't think it follows from badly compressing pcaps that storing
raw DNS would compress badly as well. Unless I missed some trick that
explains why the CBOR version compresses a lot better.

We did experiment with simple CBOR and Avro encodings of individual DNS
packets with minimal transport information, which I think would be
comparable. Our data showed that the size of input to the compressor was
~10x the size of our format, and the final size after compression was
still significantly greater (~25-30%) than our format after compression.
We did not take compression resource measurements in that case, but
given our experience I would be surprised if the compression resources
required were not also significantly greater.

The downside of CBOR, certainly as used here, is that it uses integers to
identify fields in what JSON calls objects.

So anybody who writes a local extension is likely to just continue numbering
fields, which leads to mutually incompatible extensions.

In contrast, formats like XML, JSON, but also BSON where fields have names
make it less likely that people will pick the same identifier for
completely different purposes.

CBOR does not have to use integers as key values. It can use strings in
exactly the same way as BSON. The reason for using integers is simply
one of space and hence file size and minimising load on the final
compressor. Key values occur in the data stream in both CBOR and BSON
for every item with that key, so using strings as key values is not
consistent with a goal of minimum file size.
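
To make the size difference concrete, here is a minimal sketch of a
CBOR encoder covering only unsigned integers, text strings and maps
(RFC 7049 major types 0, 3 and 5). The field names are made up for the
example and are not taken from the draft:

```python
import struct


def _head(major, n):
    """Encode a CBOR initial byte plus argument for major type `major`."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 0x100:
        return bytes([major << 5 | 24, n])
    if n < 0x10000:
        return bytes([major << 5 | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([major << 5 | 26]) + struct.pack(">I", n)
    return bytes([major << 5 | 27]) + struct.pack(">Q", n)


def encode(obj):
    """Encode a non-negative int, a str, or a dict of those as CBOR."""
    if isinstance(obj, int) and not isinstance(obj, bool) and obj >= 0:
        return _head(0, obj)                      # major type 0: unsigned int
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw           # major type 3: text string
    if isinstance(obj, dict):
        body = b"".join(encode(k) + encode(v) for k, v in obj.items())
        return _head(5, len(obj)) + body          # major type 5: map
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# The same item keyed by (hypothetical) names and by small integers.
by_name = {"query-name-index": 7, "client-address-index": 3, "query-type": 1}
by_int = {0: 7, 1: 3, 2: 1}

print(len(encode(by_name)), len(encode(by_int)))
```

The key bytes are repeated in every item, so the difference is paid
once per record rather than once per file.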

We expect it would be possible, given the CDDL specification of the
format, to use that specification to turn key values back into text for,
say, a conversion to JSON, but no such tool currently exists, as far as
we are aware.
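
The conversion step itself would be small. A sketch, assuming the
integer-to-name table has already been obtained somehow (here it is
written by hand with made-up names, not derived from the CDDL, and a
single table is applied at every nesting level, which a real format
may not allow):

```python
import json

# Hypothetical key table; a real tool would derive this from the CDDL.
KEY_NAMES = {0: "query-name-index", 1: "client-address-index", 2: "query-type"}


def name_keys(value, key_names):
    """Recursively replace integer map keys with their text names.

    Keys missing from the table are kept, converted to strings, so that
    the result is still valid input for a JSON encoder.
    """
    if isinstance(value, dict):
        return {key_names.get(k, str(k)): name_keys(v, key_names)
                for k, v in value.items()}
    if isinstance(value, list):
        return [name_keys(v, key_names) for v in value]
    return value


# A decoded item with integer keys, including one unknown key (99).
decoded = {0: 7, 1: 3, 99: [{2: 1}]}
named = name_keys(decoded, KEY_NAMES)
print(json.dumps(named, indent=2))
```
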
--
Jim Hague - j...@sinodun.com          Never trust a computer you can't lift.

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
