I think I got the theoretical runtime down to O(n) if we skip sorting the 
literal lines in the output file (which doesn't make any difference to the core 
idea).

So the algorithm basically compresses a list of ints to a compact binary graph 
in O(n) time. It does so like this:

while true:
  (i, j) = most often (and at least twice!) occurring adjacent pair in list; 
end loop if none found
  k = index of new or existing table entry (i, j)
  replace all occurrences of (i, j) in list with (k)

Each step of the loop can be done in constant time by clever use of hash maps 
and linked lists. That almost sounds impossible, but there we are, just 
finished the code. In practical terms, for enwik8, it's just about as fast than 
the last version that was more O(n log n)-ish.

I'm truly wondering if this algorithm has been discovered before.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb2cf064c700f181c-Ma69c283a13a3bf51eb532bef
Delivery options: https://agi.topicbox.com/groups/agi/subscription

Reply via email to