On Wednesday 13 July 2016 at 04:29:48 UTC+2, Steven D'Aprano wrote:
> On Wed, 13 Jul 2016 03:35 am, jonas.thornv...@gmail.com wrote:
>
> > No it is only compressible down to a limit given by the algorithm.
>
> Right! Then there is data that you can't compress.
>
> Suppose you have some data:
>
> data = "ABABABABABAB...ABAB"
>
> And you compress it "down to a limit":
>
> x = compress(compress(compress(data)))
> print(x)
> => prints "@nx7%k!b"
>
> Now let's try again with something else:
>
> data = "AABBBCCCCDDDDEEEE...ZZZZ"
>
> And you compress it "down to a limit":
>
> x = compress(compress(compress(compress(data))))
> print(x)
> => prints "wu*$cS#k-pv32zx[&+r"
>
> One more time:
>
> data = "AABBAABBAABBAABBAABB"
> x = compress(data)
> print(x)
> => prints "g^x3@"
>
> We agree on this. Now you say, "Give me some random data, anything at all,
> and I'll compress it!", and I run a random number generator and out pops:
>
> data = "@nx7%k!b"
>
> or possibly:
>
> data = "wu*$cS#k-pv32zx[&+r"
>
> or:
>
> data = "g^x3@"
>
> and I say "Compress that!"
>
> But we've already agreed that this is as compressed as you can possibly
> make it. You can't compress it any more.
>
> So there's *at least some* random data that you can't compress. Surely you
> have to accept that. You don't get to say "Oh, I don't mean *that* data, I
> mean only data that I can compress". Random data means it's random; you
> don't get to pick and choose between data you can compress and data that
> you can't.
>
> Now the tricky part is to realise that it's not just short sequences of
> random data that can't be compressed. The same applies to LONG sequences
> too. If I give you a gigabyte of raw video, you can probably compress that
> a fair bit. That's what things like x264 encoders do. The x265 encoder is
> even better. But they're lossy, so you can't reverse them.
>
> But if I give you a gigabyte of random data, you'll be lucky to find *any*
> patterns or redundancies that allow compression. You might be able to
> shrink the file by a few KB. And if you take that already compressed file
> and try to compress it again, well, you've already hit the limit of
> compression. There is no more redundancy left to remove.
>
> It doesn't matter how clever you are, or what a "folding structure" is, or
> how many times you iterate over the data. It's a matter of absolute
> simplicity: the pigeonhole principle. You can't argue with the numbers.
>
> If you start with a 100-digit decimal number, there are 10**100 different
> pigeons. If you can compress down to a 6-digit decimal number, there are
> 10**6 pigeonholes. You cannot put 10**100 pigeons into 10**6 pigeonholes
> without doubling up (which makes your compression lossy).
>
> So either some numbers cannot be compressed, or some numbers are
> compressed to the same result, and you can't tell which was the original.
> That's your choice: a lossless encoder means some numbers can't be
> compressed, a lossy encoder means you can't reverse the process exactly.
>
> --
> Steven
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.
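As an aside, the effect described above is easy to observe with Python's
standard zlib module; this is just a minimal sketch, and the exact
compressed sizes will vary from run to run:

    import os
    import zlib

    # Highly repetitive data compresses dramatically.
    repetitive = b"AB" * 500_000              # 1 MB of "ABABAB..."
    print(len(zlib.compress(repetitive)))     # a few KB

    # Random data has essentially no redundancy to remove.
    random_data = os.urandom(1_000_000)       # 1 MB of random bytes
    once = zlib.compress(random_data)
    print(len(once))                          # roughly 1 MB, often slightly larger

    # Compressing the already-compressed output gains nothing further.
    print(len(zlib.compress(once)))           # about the same size, or larger
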
Ok, try to see it this way: ****very big**** numbers can be described as the
sum or difference of a sequence of a few polynomials. Unfortunately we lack
the computational skill/computing power to find them. That is not the case
using foldings/geometric series.
--
https://mail.python.org/mailman/listinfo/python-list
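For what it's worth, one hypothetical reading of "a sum or difference of a
few polynomial terms" is writing an integer as a sum of digit-times-power
terms of a geometric series. The function names below are illustrative
only, not anything defined in this thread, and the representation is only
short when the number happens to have few nonzero digits in the chosen
base:

    def to_polynomial_terms(n, base=10):
        """Decompose n so that n == sum(d * base**i for i, d in terms)."""
        terms = []
        i = 0
        while n:
            n, d = divmod(n, base)
            if d:
                terms.append((i, d))
            i += 1
        return terms

    def from_polynomial_terms(terms, base=10):
        return sum(d * base**i for i, d in terms)

    big = 10**100 + 7 * 10**42 + 3
    terms = to_polynomial_terms(big)
    print(terms)                                # [(0, 3), (42, 7), (100, 1)]
    print(from_polynomial_terms(terms) == big)  # True

The reconstruction is exact, but the list of terms is only shorter than the
number itself for numbers that are sparse in the chosen base; for an
arbitrary 100-digit number it needs about 100 terms.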