So my current code mixes 17 matches to compress 100MB to ~21.4MB; the world record is 14.8MB. 17 as in: "The cat ran", "he cat ran", "e cat ran", " cat ran", "cat ran", "at ran".... For each context I see what letter (by frequency) comes NEXT. This is Context Mixing and is SUPERIOR. But I only use EXACT matches at the moment. Imagine if I got 300 matches? I could mix them all! But can my CPU handle it?
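The mixing step above can be sketched roughly like this: collect next-letter counts for every suffix of the current context and blend them, weighting longer (more specific) contexts more heavily. Everything here — the quadratic weight, the max order of 8, the names — is an illustrative assumption, not the actual compressor.

```python
# Sketch of mixing next-letter predictions from suffix contexts of
# decreasing length ("The cat ran", "he cat ran", "e cat ran", ...).
from collections import Counter, defaultdict

def build_model(text, max_order=8):
    """Map every context suffix (up to max_order chars) to next-char counts."""
    model = defaultdict(Counter)
    for i in range(len(text) - 1):
        nxt = text[i + 1]
        for order in range(1, max_order + 1):
            if i + 1 - order < 0:
                break
            ctx = text[i + 1 - order : i + 1]
            model[ctx][nxt] += 1
    return model

def mix_predictions(model, context, max_order=8):
    """Blend next-char distributions from all matching suffix contexts."""
    mixed = Counter()
    for order in range(1, min(max_order, len(context)) + 1):
        ctx = context[-order:]
        counts = model.get(ctx)
        if not counts:
            continue
        total = sum(counts.values())
        weight = order * order  # assumption: longer context, higher weight
        for ch, c in counts.items():
            mixed[ch] += weight * c / total
    return mixed

text = "the cat ran into the barn and the cat ran into the yard"
model = build_model(text)
pred = mix_predictions(model, "the cat ran into the ")
print(pred.most_common(3))
```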

So if I have a window over the text and get "the cat ran into the", I want to find not just exact matches in my trie hierarchy.

I want matches like the following, and I want to know how similar they are too:
"the cat ran into the"
"the dog ran into the"
97.47% match

also positional similarity:
"the cat ran into the"
"cat the ran into the"
92.81% match
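One toy way to score matches like these (the formula behind 97.47% and 92.81% isn't specified here, so the numbers below won't reproduce them): give full credit for a shared token in the same slot, and discounted credit for a shared token in a different slot — which covers the positional-similarity case too. The 0.7 discount is an arbitrary assumption.

```python
# Token-level window similarity with partial credit for misplaced tokens.
def window_similarity(a, b, misplaced_credit=0.7):
    wa, wb = a.split(), b.split()
    n = max(len(wa), len(wb))
    score = 0.0
    used = [False] * len(wb)
    for i, w in enumerate(wa):
        if i < len(wb) and wb[i] == w and not used[i]:
            score += 1.0          # right token, right slot
            used[i] = True
        else:
            for j, v in enumerate(wb):
                if v == w and not used[j]:
                    score += misplaced_credit  # right token, wrong slot
                    used[j] = True
                    break
    return score / n

print(window_similarity("the cat ran into the", "the dog ran into the"))
print(window_similarity("the cat ran into the", "cat the ran into the"))
```

With this metric a substituted word costs a full slot (4 of 5 match), while two swapped words keep partial credit, so the swapped window scores higher — same ordering as the 97.47%/92.81% examples, different numbers.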

Doing the cat=dog matching itself requires positional shared-context similarity.
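That shared-context idea can be sketched like this: give each word a vector of counts over its (relative position, neighbor) pairs, then compare words by cosine — cat and dog come out similar because they sit between the same neighbors. The tiny corpus and the ±2-word window are made up for illustration.

```python
# Positional shared-context similarity: words are similar when they
# occur with the same neighbors at the same relative offsets.
from collections import Counter
import math

def context_vectors(text, window=2):
    toks = text.split()
    vecs = {}
    for i, w in enumerate(toks):
        vec = vecs.setdefault(w, Counter())
        for off in range(-window, window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(toks):
                vec[(off, toks[j])] += 1  # keyed by relative position
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = "the cat ran into the barn the dog ran into the barn the cat sat"
vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))  # high: cat and dog share contexts
```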

I'm thinking I should first convert my window to dimensional word vectors?
"the cat ran into the"
becomes:
"[4, 2, 7], [2, 9, 7], [1, 9, 8]...."
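Comparing two windows with such vectors could then look like this: embed each word and average the position-wise cosines. The 3-d vectors below just mirror the toy "[4, 2, 7]" illustration — they're invented for the demo, not learned; in practice they'd come from the shared-context counts or something like word2vec.

```python
# Average position-wise cosine between two windows of word vectors.
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# hypothetical embeddings: cat and dog deliberately close
emb = {
    "the":  [4, 2, 7],
    "cat":  [2, 9, 7],
    "dog":  [2, 8, 7],
    "ran":  [1, 9, 8],
    "into": [9, 1, 2],
}

def window_sim(a, b):
    """Average cosine between corresponding words of equal-length windows."""
    va = [emb[w] for w in a.split()]
    vb = [emb[w] for w in b.split()]
    return sum(cos(u, v) for u, v in zip(va, vb)) / len(va)

print(window_sim("the cat ran into the", "the dog ran into the"))
```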

If I did, I could start at the root of my trie and find the top 10 closest first-word matches by vector similarity, then check each of those paths' top 10.... bad on CPU. It's even worse when you look ahead naively and hope the word to match shows up later positionally.
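Beam search is the usual trick to keep that "top 10 of top 10" walk from exploding: at each trie depth, keep only the K best partial paths by accumulated similarity instead of every combination, so cost is K x branching per level rather than 10^depth. The trie layout and the toy sim() below are illustrative assumptions.

```python
# Beam search over a trie of words, scored by per-word similarity.
import heapq

def beam_search(trie, query_words, sim, beam_width=10):
    """trie: nested dict word -> subtrie; returns best (score, path) matches."""
    beam = [(0.0, [], trie)]            # (accumulated score, path, node)
    for q in query_words:               # one trie level per query word
        candidates = []
        for score, path, node in beam:
            for word, child in node.items():
                candidates.append((score + sim(q, word), path + [word], child))
        if not candidates:
            break
        # prune: keep only the beam_width best partial paths
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return [(s, " ".join(p)) for s, p, _ in beam]

# toy similarity: exact match scores 1.0, cat/dog score 0.9, else 0
def sim(a, b):
    if a == b:
        return 1.0
    if {a, b} == {"cat", "dog"}:
        return 0.9
    return 0.0

trie = {"the": {"cat": {"ran": {}}, "dog": {"sat": {}, "ran": {}}}}
print(beam_search(trie, ["the", "cat", "ran"], sim, beam_width=2))
```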

How am I supposed to do this....

I may need to ignore the letter level completely, idk....
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T7676168d17901a9c-Me785cf6c1cf9118e6629d5d0