Manning's SEQUITUR is one of those innovations that strikes me as fundamental. It's incredibly fast and low complexity for its effectiveness. This, to me, indicates nature probably somehow incorporates it as a neocortical hardware primitive. The important question I've not seen addressed in the grammar compression literature is the exact -- searching for a word -- "phase transition" boundary between SEQUITUR and backtracking.
On Thu, May 28, 2020 at 9:04 AM Matt Mahoney <[email protected]> wrote: > Your idea is similar to > Compression by induction of hierarchical grammars > by Craig G. Nevill-Manning, > Ian H . Witten & David L. Maulsby, > October, 1993 > > It looks promising but I don't know of any practical implementations. You > have to make multiple passes to find the most frequent pair of symbols, but > it is faster (but suboptimal) to find many frequent pairs at once in fewer > passes. > > On Thu, May 28, 2020, 8:58 AM stefan.reich.maker.of.eye via AGI < > [email protected]> wrote: > >> I think I got the theoretical runtime down to O(n) if we skip sorting the >> literal lines in the output file (which doesn't make any difference to the >> core idea). >> >> So the algorithm basically compresses a list of ints to a compact binary >> graph in O(n) time. It does so like this: >> >> while true: >> (i, j) = most often (and at least twice!) occurring adjacent pair in >> list; end loop if none found >> k = index of new or existing table entry (i, j) >> replace all occurrences of (i, j) in list with (k) >> >> Each step of the loop can be done in constant time by clever use of hash >> maps and linked lists. That almost sounds impossible, but there we are, >> just finished the code. In practical terms, for enwik8, it's just about as >> fast than the last version that was more O(n log n)-ish. >> >> I'm truly wondering if this algorithm has been discovered before. >> > *Artificial General Intelligence List <https://agi.topicbox.com/latest>* > / AGI / see discussions <https://agi.topicbox.com/groups/agi> + > participants <https://agi.topicbox.com/groups/agi/members> + delivery > options <https://agi.topicbox.com/groups/agi/subscription> Permalink > <https://agi.topicbox.com/groups/agi/Tb2cf064c700f181c-M8cdf4daa6a5627de0e2fb67e> > ------------------------------------------ Artificial General Intelligence List: AGI Permalink: https://agi.topicbox.com/groups/agi/Tb2cf064c700f181c-M920dc2a0fd0d32d63424f9fe Delivery options: https://agi.topicbox.com/groups/agi/subscription
