On Wed, 13 Jul 2016 03:35 am, jonas.thornv...@gmail.com wrote:

> No it is only compressible down to a limit given by the algorithm.
Right! Then there is data that you can't compress.

Suppose you have some data:

    data = "ABABABABABAB...ABAB"

And you compress it "down to a limit":

    x = compress(compress(compress(data)))
    print(x)
    => prints "@nx7%k!b"

Now let's try again with something else:

    data = "AABBBCCCCDDDDEEEE...ZZZZ"

And you compress it "down to a limit":

    x = compress(compress(compress(compress(data))))
    print(x)
    => prints "wu*$cS#k-pv32zx[&+r"

One more time:

    data = "AABBAABBAABBAABBAABB"
    x = compress(data)
    print(x)
    => prints "g^x3@"

We agree on this. Now you say, "Give me some random data, anything at all, and I'll compress it!", and I run a random number generator and out pops:

    data = "@nx7%k!b"

or possibly:

    data = "wu*$cS#k-pv32zx[&+r"

or:

    data = "g^x3@"

and I say "Compress that!"

But we've already agreed that this is as compressed as you can possibly make it. You can't compress it any more. So there's *at least some* random data that you can't compress. Surely you have to accept that.

You don't get to say "Oh, I don't mean *that* data, I mean only data that I can compress". Random data means it's random; you don't get to pick and choose between data you can compress and data that you can't.

Now the tricky part is to realise that it's not just short sequences of random data that can't be compressed. The same applies to LONG sequences too.

If I give you a gigabyte of raw video, you can probably compress that a fair bit. That's what things like x264 encoders do. The x265 encoder is even better. But they're lossy, so you can't reverse them.

But if I give you a gigabyte of random data, you'll be lucky to find *any* patterns or redundancies that allow compression. You might be able to shrink the file by a few KB. And if you take that already compressed file and try to compress it again, well, you've already hit the limit of compression. There's no more redundancy left to remove. It doesn't matter how clever you are, or what a "folding structure" is, or how many times you iterate over the data.
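You can watch this happen with the standard library's zlib. A quick sketch (variable names are mine, not from the thread): random bytes don't shrink at all, and compressing the already-compressed output a second time doesn't help either, while repetitive data collapses dramatically.

```python
import os
import zlib

random_data = os.urandom(1_000_000)          # 1 MB of random bytes
once = zlib.compress(random_data, 9)          # best compression level
twice = zlib.compress(once, 9)                # try compressing the output again

# Random data has no redundancy: the "compressed" output is actually
# a little LARGER than the input (zlib adds headers and block overhead),
# and a second pass gains nothing.
print(len(random_data), len(once), len(twice))

# Repetitive data, by contrast, compresses enormously.
repetitive = b"AB" * 500_000                  # also 1 MB
print(len(zlib.compress(repetitive, 9)))      # a few KB at most
```

The same thing happens with bzip2, lzma, or any other lossless compressor: the first pass removes whatever redundancy exists, leaving output that looks statistically random, so there is nothing left for a second pass to exploit.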
It's a matter of absolute simplicity: the pigeonhole principle. You can't argue with the numbers.

If you start with a 100 digit decimal number, there are 10**100 different pigeons. If you can compress down to a 6 digit decimal number, there are 10**6 pigeon holes. You cannot put 10**100 pigeons into 10**6 pigeon holes without doubling up (which makes your compression lossy).

So either some numbers cannot be compressed, or some numbers are compressed to the same result, and you can't tell which was the original. That's your choice: a lossless encoder means some numbers can't be compressed; a lossy encoder means you can't reverse the process exactly.


-- 
Steven

“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
-- 
https://mail.python.org/mailman/listinfo/python-list
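[For the curious, the counting argument is easy to check directly. A minimal sketch, using the numbers from the post above: with 10**100 possible inputs and only 10**6 possible outputs, some output must be shared by an astronomical number of inputs.]

```python
pigeons = 10 ** 100   # distinct 100-digit decimal inputs
holes = 10 ** 6       # distinct 6-digit decimal outputs

# Any function from inputs to outputs must send at least
# ceil(pigeons / holes) inputs to some single output, so the
# "compression" cannot be uniquely reversed for those inputs.
collisions = -(-pigeons // holes)   # ceiling division
print(collisions)                   # 10**94 inputs share one output
```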