On Mon, 23 Oct 2017 08:22 am, danceswithnumb...@gmail.com wrote: > In Short, I cannot find a single mathematical proof that says you cannot > compress random numbers.
Three seconds of googling finds this: http://www.faqs.org/faqs/compression-faq/part1/section-8.html which demonstrates a simple, trivial proof. It took me a little longer to find this: https://en.wikipedia.org/wiki/Kolmogorov_complexity#Compression but only because I couldn't remember how to spell Kolmogorov, and that lead me to this: https://en.wikipedia.org/wiki/Incompressible_string for a concrete example. Of course *occasionally* one will, by pure luck, compress a stream of random bits. I run my random number generator and by a fluke it generates a stream of 50 zeroes in a row: 00000000000000000000000000000000000000000000000000 which even a simple run-length encoding scheme can compress: (50*0) But flukes are not what people mean when they talk about compressing random data. They mean some algorithm which can: - transparently (no hiding data in the decompressor, or the file name, or in a secret file hidden on the disk, or anywhere else) - reversibly (there must be a decompressor which when given ONLY the compressed file, and NOTHING else, reconstruct the original) - losslessly (no throwing away data: we must reconstruct the EXACT original, not merely something "close enough") - and *consistently* (flukes don't count: every input must be compressed) compress random data of size N bits to at most N-1 bits. We can prove that this is impossible by simply *counting* how many random strings there are. Let us pick a concrete example, N=8. Then there are 2**8 = 256 possible strings. We need to compress down to at most 7 bits. If we allow the output to vary in length, there are: 2**7 = 128 seven bit strings; 2**6 = 64 six bit strings; 2**5 = 32 five bit strings; 2**4 = 16 four bit strings; 2**3 = 8 three bit strings; 2**2 = 4 two bit strings; 2**1 = 2 one bit strings; 2**0 = 1 zero bit strings which add up to 255 compressed strings in total. But there are 256 possible inputs, and only 255 possible outputs: so at least one input cannot be compressed, or the compression must be lossy (two inputs compress to the same output). By the way: here is a very clever trick for hiding information in the file system: http://www.patrickcraig.co.uk/other/compression.php but as people point out, the information in the file, plus the information in the file system, ends up being *larger* than the original. Well done to Patrick Craig for finding a clever loophole to "win" the challenge, but he did so without actually compressing the original data. -- Steve “Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse. -- https://mail.python.org/mailman/listinfo/python-list