On 02/17/2015 09:58 AM, Chris Angelico wrote:
> On Wed, Feb 18, 2015 at 1:50 AM, Dave Angel <da...@davea.name> wrote:
>> But the first thing I'd expect to see would be an estimate of the
>> anticipated distribution of number values/magnitudes.  For example, if a
>> typical integer is 1500 bits, plus or minus 200 bits, I'd probably try
>> encoding in base 65535, with 0xffff as a terminator word.
>
> Sure. Not sure how you'd cope with an interior FFFF in the stream
> without drastically losing efficiency, though.

That's why it was base 65535, not 65536.
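Something like this rough sketch is what I had in mind.  Purely
illustrative: the helper names and the little-endian digit order are
choices I made for the example, not anything fixed by the scheme.

import struct

TERMINATOR = 0xFFFF   # never a valid digit, since digits run 0..65534

def encode(n):
    """Pack a non-negative int as little-endian base-65535 digits,
    followed by the 0xffff terminator word."""
    digits = []
    while True:
        n, digit = divmod(n, 65535)
        digits.append(digit)
        if n == 0:
            break
    digits.append(TERMINATOR)
    return struct.pack('<%dH' % len(digits), *digits)

def decode(data, offset=0):
    """Unpack one number starting at offset; return (value, next_offset)."""
    value, place = 0, 1
    while True:
        (word,) = struct.unpack_from('<H', data, offset)
        offset += 2
        if word == TERMINATOR:
            return value, offset
        value += word * place
        place *= 65535

Since a digit can only be 0 through 65534, the 0xffff word can never
appear inside a number, so the decoder never mistakes data for the
terminator.  Round-tripping works as expected: decode(encode(12345))
gives (12345, 4), and you can keep decoding a stream of packed numbers
from the returned offset.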


>> An interesting point of history.  At the time of Huffman's paper, it was
>> provable that Huffman coding was the best possible lossless compression.
>> But there were some unwritten assumptions, such as that each character
>> would be encoded with a whole number of bits.  Changing that assumption
>> makes arithmetic coding possible.  The next assumption was that the
>> probability of each character didn't depend on context.  Changing that
>> assumption enables LZ.  And so on.
>
> Didn't LZ predate arithmetic coding?

I really don't know.  I looked up Huffman's article (before the Internet,
I found it in a library) in the 1952 IRE journal (I believe that's where
it was).  I probably read it around 1980.

But arithmetic coding and LZ both came later to me, and that's the order I saw them in.

> But yes, removing those assumptions has enabled some pretty amazing
> alternatives. Just describing arithmetic coding is enough to blow most
> people's minds. (Especially my sister. She hates infinity, mainly - I
> think - because it doesn't fit inside her brain. I troll her occasionally
> with Hilbert's Hotel or anything involving the notion of different sizes
> of infinity.)
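
For anyone who hasn't seen why the whole-number-of-bits assumption
matters, a lopsided two-symbol alphabet makes it obvious.  A quick
back-of-the-envelope (the 90/10 split is just a made-up example, not a
real data set):

from math import log2

# Hypothetical two-symbol source: 'a' 90% of the time, 'b' 10%.
probs = {'a': 0.9, 'b': 0.1}

# A Huffman code must give every symbol a whole number of bits, so the
# best it can possibly do here is one bit per symbol.
huffman_avg = sum(p * 1 for p in probs.values())

# The entropy (the true lower bound) is well under half a bit per symbol.
entropy = -sum(p * log2(p) for p in probs.values())

print(huffman_avg)   # 1.0
print(entropy)       # roughly 0.469

An arithmetic coder can get arbitrarily close to that 0.47 bits per
symbol, because a run of likely symbols can share a fraction of a bit,
which is exactly what a Huffman tree cannot express.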


--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list
