I assume you are talking about the Hutter prize for compressing a 100 MB text file. http://prize.hutter1.net/
I suggest reading http://mattmahoney.net/dc/dce.html to understand why your method won't work. Sure, you can find the file embedded in the digits of pi or some other big number, but you need 100 MB just to record the starting point. You're proposing a universal compression algorithm, which is impossible. Marcus Hutter, James Bowery, and I also set the rules so that you can't cheat by hiding data in the decompressor.

The best text compressors model the lexical, semantic, and syntactic structure of natural language. The whole point is to encourage AI research. My large text benchmark rationale http://mattmahoney.net/dc/rationale.html explains why text compression is an AI problem. A language model that can predict text as well as a human, as measured by compression, can pass the Turing test with a simple modification.

On Sun, Oct 6, 2019, 7:01 PM <[email protected]> wrote:

> I was working on the compression prize a month or so ago, for many
> reasons. I failed every time, but I learned a lot more about trees, bin
> files, etc. Each time I'd come up with a unique solution that was
> innovative but was never verified to do the job well enough.
>
> One idea was that 9^9^9^9^9 creates a massive number in binary using just
> a few bits. It's most of the file. You can generate a huge movie - at
> least a few thousand for sure out of all possible movies. Better yet, you
> can manipulate it with high precision, e.g. 9^9^9+1 changes the binary
> data by a single bit increase. You could attempt to recreate small parts
> of the binary stream like this, instead of using 9^9 for the WHOLE bloody
> file.
>
> Another idea was a binary tree - stick parts of the 100 MB binary stream
> into a binary tree and remember the order the branches lie in. Both
> "10100001" and "101110101" share the first 3 bits. Binary already stores
> massive numbers, so this idea topped the cake: it stores binary code in a
> yet better code.
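The impossibility claim above is the standard counting argument: a lossless compressor is an injective map, and there are 2^n bit-strings of length n but only 2^n - 1 strings strictly shorter than n, so no compressor can shrink every input. The 100 MB needed for the pi offset follows the same way. A minimal sketch (my illustration, not code from either post):

```python
# Counting argument: there are 2**n bit-strings of length n, but only
# 2**0 + 2**1 + ... + 2**(n-1) = 2**n - 1 strings strictly shorter than n,
# so no lossless (injective) compressor maps every n-bit input to a
# shorter output.
n = 16
inputs = 2 ** n
shorter = sum(2 ** k for k in range(n))  # outputs of length 0 .. n-1
assert shorter == inputs - 1

# Pi-offset variant: a specific 100 MB (8 * 10**8-bit) string is expected
# to first occur in pi's binary expansion at an offset on the order of
# 2**(8 * 10**8), so writing the offset down takes about 8 * 10**8 bits --
# the same size as the file it was supposed to compress.
file_bits = 100 * 10**6 * 8
offset_bits = file_bits  # ~ceil(log2(2**file_bits))
print(offset_bits // 8 // 10**6, "MB just to record the starting point")
```

The same bound is why hiding data in the decompressor is ruled out: the prize counts compressed file plus decompressor, so moving bits between the two doesn't change the total.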
> Another idea was to store the binary bit length (or word length), the
> dictionary of all words, and add word by word based on a range working
> through all possible files. Last I remember, this would have worked if
> there wasn't SO much unstructured crap in the 100 MB :-)
>
> Another idea I was just pondering is taking the smallest text file and
> the smallest compressor and finding the maximal compression, then
> slightly larger files each time. You could possibly learn what max
> compression looks like.
>
> Also, the decompressor and compressed text file are smallest when both
> are similar in size - the decompressor could have ALL the data in it and
> cheat while the compressed file has 0 bits, therefore if both are evenly
> sized both might be smallest in size! Like this:
>
> A = compressed file size
> B = decompressor file size
>
> ----------
> N/A
>
> or
>
> N/A
> ----------
>
> or
>
> --
> --

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T2d0576044f01b0b1-Ma1fde5a06b468e8fbf2172fc
Delivery options: https://agi.topicbox.com/groups/agi/subscription
