On Wednesday, 13 July 2016 at 12:05:03 UTC+2, jonas.t...@gmail.com wrote:
> On Wednesday, 13 July 2016 at 04:29:48 UTC+2, Steven D'Aprano wrote:
> > On Wed, 13 Jul 2016 03:35 am, jonas.thornv...@gmail.com wrote:
> >
> > > No, it is only compressible down to a limit given by the algorithm.
> >
> > Right! Then there is data that you can't compress.
> >
> > Suppose you have some data:
> >
> > data = "ABABABABABAB...ABAB"
> >
> > And you compress it "down to a limit":
> >
> > x = compress(compress(compress(data)))
> > print(x)
> > => prints "@nx7%k!b"
> >
> > Now let's try again with something else:
> >
> > data = "AABBBCCCCDDDDEEEE...ZZZZ"
> >
> > And you compress it "down to a limit":
> >
> > x = compress(compress(compress(compress(data))))
> > print(x)
> > => prints "wu*$cS#k-pv32zx[&+r"
> >
> > One more time:
> >
> > data = "AABBAABBAABBAABBAABB"
> > x = compress(data)
> > print(x)
> > => prints "g^x3@"
> >
> > We agree on this. Now you say, "Give me some random data, anything at
> > all, and I'll compress it!", and I run a random number generator and
> > out pops:
> >
> > data = "@nx7%k!b"
> >
> > or possibly:
> >
> > data = "wu*$cS#k-pv32zx[&+r"
> >
> > or:
> >
> > data = "g^x3@"
> >
> > and I say "Compress that!"
> >
> > But we've already agreed that this is as compressed as you can
> > possibly make it. You can't compress it any more.
> >
> > So there's *at least some* random data that you can't compress. Surely
> > you have to accept that. You don't get to say "Oh, I don't mean *that*
> > data, I mean only data that I can compress". Random data means it's
> > random; you don't get to pick and choose between data you can compress
> > and data that you can't.
> >
> > Now the tricky part is to realise that it's not just short sequences
> > of random data that can't be compressed. The same applies to LONG
> > sequences too. If I give you a gigabyte of raw video, you can probably
> > compress that a fair bit. That's what things like x264 encoders do.
> > The x265 encoder is even better. But they're lossy, so you can't
> > reverse them.
> >
> > But if I give you a gigabyte of random data, you'll be lucky to find
> > *any* patterns or redundancies that allow compression. You might be
> > able to shrink the file by a few KB. And if you take that already
> > compressed file and try to compress it again, well, you've already hit
> > the limit of compression. There's no more redundancy left to remove.
> >
> > It doesn't matter how clever you are, or what a "folding structure"
> > is, or how many times you iterate over the data. It's a matter of
> > absolute simplicity: the pigeonhole principle. You can't argue with
> > the numbers.
> >
> > If you start with a 100-digit decimal number, there are 10**100
> > different pigeons. If you can compress down to a 6-digit decimal
> > number, there are 10**6 pigeonholes. You cannot put 10**100 pigeons
> > into 10**6 pigeonholes without doubling up (which makes your
> > compression lossy).
> >
> > So either some numbers cannot be compressed, or some numbers are
> > compressed to the same result, and you can't tell which was the
> > original. That's your choice: a lossless encoder means some numbers
> > can't be compressed; a lossy encoder means you can't reverse the
> > process exactly.
> >
> > --
> > Steven
> > “Cheer up,” they said, “things could be worse.” So I cheered up, and
> > sure enough, things got worse.
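For anyone who wants to try the experiment above concretely, here is a
minimal sketch using zlib from the standard library. Here zlib merely
stands in for the hypothetical compress() (it is one particular
Lempel-Ziv-based compressor), and the exact byte counts will vary with
zlib version and compression level:

    import os
    import zlib

    repetitive = b"ABABABABABAB" * 80_000   # ~1 MB of highly regular data
    random_data = os.urandom(1_000_000)     # ~1 MB from the OS entropy pool

    for label, data in (("repetitive", repetitive), ("random", random_data)):
        once = zlib.compress(data, 9)    # first pass, maximum effort
        twice = zlib.compress(once, 9)   # compress the compressed output
        print(label, len(data), "->", len(once), "->", len(twice))

The shape of the result is what matters: the repetitive megabyte
collapses to a few kilobytes, and a second pass gains essentially
nothing; the random megabyte comes back slightly *larger* both times,
because there is no redundancy for zlib to remove, only the format's
own overhead to add.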
> Ok, try to see it this way: ****very big**** numbers can be described
> as the sum or difference of a sequence of a few polynomials.
> Unfortunately, we lack the computational skill/computing power to find
> them.
>
> That is not the case using foldings/geometric series.
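The pigeonhole count from the quoted post applies to polynomial
descriptions just as much as to any other encoding. A rough sketch of
the arithmetic (the 50-symbol alphabet and 20-symbol length limit are
illustrative assumptions of mine, not anything from the thread):

    # How many distinct short descriptions exist at all, versus how
    # many 100-digit numbers there are to describe?
    ALPHABET = 50   # digits, operators, parentheses, and so on
    MAX_LEN = 20    # allow descriptions of at most 20 symbols

    descriptions = sum(ALPHABET**k for k in range(MAX_LEN + 1))  # ~10**34
    numbers = 10**100

    print("descriptions available: ", descriptions)
    print("numbers per description:", numbers // descriptions)   # ~10**66

Even if every one of those strings were a valid formula, they could
name only about 10**34 values between them, so almost all 100-digit
numbers have no description that short, whatever the symbols mean.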
The sum or difference of a few small polynomials would, of course, still
be a polynomial. But as I said, it is a case of enriching the
interpretation of the symbol set you work with: you replace digits,
bits, or integers with the arithmetic expressions that describe them.
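To make "sum or difference of a few polynomials" concrete, here is one
toy reading of the idea (my own interpretation; nothing in the thread
specifies an algorithm): greedily peel off the perfect power closest to
the remaining value.

    def nearest_power(n):
        """Perfect power a**b (b >= 2) closest to the positive integer n."""
        best_a, best_b, best_val = 1, 1, 1
        for b in range(2, max(3, n.bit_length() + 1)):
            a = round(n ** (1.0 / b))
            for cand in (a - 1, a, a + 1):   # guard against float rounding
                if cand >= 1 and abs(cand**b - n) < abs(best_val - n):
                    best_a, best_b, best_val = cand, b, cand**b
        return best_a, best_b, best_val

    def power_sum(n, max_terms=6):
        """Greedily express n as a signed sum of powers plus a remainder."""
        terms, remainder = [], n
        while remainder != 0 and len(terms) < max_terms:
            sign = 1 if remainder > 0 else -1
            a, b, val = nearest_power(abs(remainder))
            terms.append((sign, a, b))
            remainder -= sign * val
        return terms, remainder

    terms, leftover = power_sum(123456789012345678901234567890)
    print(" ".join(f"{'+' if s > 0 else '-'}{a}**{b}" for s, a, b in terms),
          "remainder:", leftover)

The catch is visible in the output: for a 30-digit input the first base
alone already has about 15 digits, so the handful of terms takes more
symbols to write down than the number they describe. That is the
pigeonhole limit from the quoted post showing up in practice.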