On 02/18/2015 04:04 AM, janhein.vanderb...@gmail.com wrote:
> On Tuesday, February 17, 2015 at 3:35:16 PM UTC+1, Chris Angelico wrote:
>> Oh, incidentally: If you want a decent binary format for
>> variable-sized integer, check out the MIDI spec.
> I did some time ago, thanks, and it is indeed a decent format.
> I also looked at variations of that approach.
> None of them beats

Define "beats." You might mean beats in simplicity, or in elegance, or
in clarity of code. But you probably mean in space efficiency, or
"compression." But that's meaningless without a target distribution of
values that you expect to encode.
For example, if 99.9% of your values are going to be less than 255, then
the most efficient byte encoding would be one that simply stores a value
less than 255 in a single byte, and starts with an FF escape byte for
larger values. It's almost irrelevant how it encodes those larger values.
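
A minimal sketch of that escape-byte idea, in Python. The names
encode_escape and decode_escape are hypothetical, and the layout chosen
for the larger values (a one-byte length followed by the big-endian
magnitude) is only an assumption for illustration, since the point above
is that this part hardly matters:

```python
ESCAPE = 0xFF  # marker byte meaning "a larger value follows"


def encode_escape(n):
    """Encode a non-negative int; values below 255 take exactly one byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n < ESCAPE:
        return bytes([n])
    # Assumed layout for the rare large values:
    # ESCAPE, one length byte, then the magnitude in big-endian order.
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(body) > 255:
        raise OverflowError("value too large for a one-byte length")
    return bytes([ESCAPE, len(body)]) + body


def decode_escape(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    first = data[pos]
    if first != ESCAPE:
        return first, pos + 1
    length = data[pos + 1]
    start = pos + 2
    return int.from_bytes(data[start:start + length], "big"), start + length
```

With this layout, every value from 0 to 254 costs one byte, and 255
itself costs three (escape, length, one magnitude byte).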
On the other hand, if most values are going to be in the 10,000- to
20,000-bit size range, and a few will be much smaller, and a few will be
very much larger, then it would be very practical to start with a size
field, say 16 bits, followed by the raw packed data. Naturally, the
size field would need to have an escape value that indicates a larger
field is needed. In fact, the size field could be encoded in a
7-bits-per-byte manner, so it would encode an arbitrarily sized number as well.
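
A rough sketch of the size-field variant, in Python. It assumes the size
is counted in bytes rather than bits, and that the size field uses the
MIDI-style layout (7 payload bits per byte, most significant group
first, high bit set on every byte except the last). All function names
here are hypothetical:

```python
def encode_vlq(n):
    """Encode a non-negative int with 7 payload bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [n & 0x7F]              # last byte: continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: bit 7 set
        n >>= 7
    return bytes(reversed(groups))


def decode_vlq(data, pos=0):
    """Decode one 7-bits-per-byte value; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def encode_sized(n):
    """Size field (in bytes, an assumption) followed by raw packed data."""
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return encode_vlq(len(body)) + body


def decode_sized(data, pos=0):
    """Decode one size-prefixed value; return (value, next position)."""
    length, pos = decode_vlq(data, pos)
    return int.from_bytes(data[pos:pos + length], "big"), pos + length
```

Under these assumptions a 15,000-bit value occupies 1875 data bytes plus
a 2-byte size field, so the overhead is small for that distribution.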

> "my" concept of two counters that cooperatively specify field lengths and
> represented integer values.
>> I've tried to read through the original algorithm description, but I'm
>> not entirely sure: How many payload bits per transmitted byte does it
>> actually achieve?
> I don't think that payload bits per byte makes sense in this concept.

Correct. Presumably one means average payload bits per byte.
First one would have to define what the "standard" unencoded variable
length integer format was. Then one could call that size the payload
size. Then, in order to compute an average, one would have to specify
an expected, or target, distribution of values. One then compares the
payload size of each typical value with its encoded size, and averages
the result over that distribution.
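
The calculation above could be sketched like this in Python. Everything
here is an assumption for illustration: the payload size is taken as the
minimal bit length of the value, the distribution is a dict mapping
values to probabilities, the average is expected payload bits divided by
expected encoded bytes, and the 7-bits-per-byte encoder is only a
stand-in for whatever format is being measured:

```python
def payload_bits(n):
    """Assumed 'standard' payload size: minimal bit length, at least 1."""
    return max(1, n.bit_length())


def encode_vlq(n):
    """Stand-in encoder: 7 payload bits per byte, high bit = 'more follows'."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def average_payload_bits_per_byte(distribution, encode):
    """Expected payload bits divided by expected encoded bytes.

    distribution maps each value to its probability (summing to 1).
    """
    bits = sum(p * payload_bits(v) for v, p in distribution.items())
    size = sum(p * len(encode(v)) for v, p in distribution.items())
    return bits / size
```

For instance, average_payload_bits_per_byte({127: 1.0}, encode_vlq)
gives 7.0, while a distribution concentrated on 128 gives 4.0, which
shows how strongly the answer depends on the distribution chosen.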
--
DaveA