On 02/18/2015 02:55 PM, janhein.vanderb...@gmail.com wrote:
> On Wednesday, February 18, 2015 at 5:47:49 PM UTC+1, Dave Angel wrote:
>> On 02/18/2015 03:59 AM, janhein.vanderb...@gmail.com wrote:
>>> encoding individual integers optimally without any assumptions
>>> about their values.
>>
>> Contradiction in terms.
>
> Not.
>
> Jan-Hein.
Then you had better define your new word "optimal" for us. I decided to
try your algorithm for all the values between 0 and 999999. Of those one
million values, the 7bit encoding[1] beat yours for 950081; the rest
were tied. Yours was never shorter.
For a uniform distribution over those values, the average number of
bytes used by yours was 3.933568, while 7bit encoding averaged 2.983487.
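Under the footnoted 7bit scheme, the byte count for a value depends only
on its bit length, so averages like the one above are easy to reproduce.
A sketch (the helper name `len_7bit` is mine, not from the thread):

```python
def len_7bit(n):
    """Bytes needed for n >= 0 under the MIDI-style scheme in [1]:
    7 payload bits per byte, so ceil(bit_length / 7), minimum 1."""
    return max(1, -(-n.bit_length() // 7))

# Tally the uniform average over 0..999999, as in the test described.
total = sum(len_7bit(n) for n in range(10**6))
print(total / 10**6)   # about 2.98 bytes per value
```

The breakpoints fall at 128 (2 bytes), 16384 (3 bytes), and 2**21
(4 bytes).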
For the second and third million, yours all take 4 bytes, while 7bit
uses 3. From 2097152 (2**21) onward, 7bit uses 4 bytes, the same as
yours. Between 16 and 17 million, yours averages 4.156865, while 7bit
is a constant 4.0.
After that, I started spot-checking. I went up to 100 billion, and for
none of those I tried did your algorithm take fewer bytes than 7bit.
So how is yours optimal? Over what range of values?
I'm not necessarily doubting it, just challenging you to provide a data
sample that actually shows it. And of course, I'm not claiming that
7bit is in any way optimal. You cannot define optimal without first
defining the distribution.
[1] By "7bit" I'm referring to the scheme apparently used in MIDI
encoding: 7 bits of each byte carry the value, and the 8th bit is
zero, except in the last byte, where it is one. So 3 bytes can encode
21 bits, i.e. values up to 2**21 - 1.
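That footnote can be sketched as a small encoder. This is my reading of
the description, not code from the thread; in particular the footnote
doesn't say which byte comes first, so the most-significant-group-first
order here is an assumption:

```python
def encode_7bit(n):
    """Encode n >= 0 as in [1]: 7 value bits per byte, high bit
    clear on every byte except the last, where it is set.
    (Most-significant 7-bit group first -- an assumption, since
    the footnote doesn't specify byte order.)"""
    if n < 0:
        raise ValueError("non-negative integers only")
    groups = []
    while True:
        groups.append(n & 0x7F)   # take the low 7 bits
        n >>= 7
        if not n:
            break
    groups.reverse()              # most significant group first
    groups[-1] |= 0x80            # mark the final byte
    return bytes(groups)
```

For example, encode_7bit(2**21 - 1) gives the three bytes
b'\x7f\x7f\xff', while 2**21 itself spills into a fourth byte.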
--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list