Dan Nelson wrote:

You need to walk the entire index to make sure you have all the values.
There might be a single "AAB" in between those million "AAA"'s and
million "BBB"'s.

Another DBA and I once discussed how an index of index values would help with very large searches, such as those run by web search engines. A word index would list every indexed word along with its offset into the main index, and the entries starting at that offset would hold the word's positions in the document. If you needed to find AAB, it's the second entry (in alphabetical order) in the word index, and its list of positions within the document starts at the 10234th entry (or byte position) of the index file. To find out how many entries a word has, you simply grab the next word's offset (AAC in this case) and subtract: 10235 - 10234 = 1 entry for AAB.

AAA -> 1
AAB -> 10234
AAC -> 10235

1: 12
2: 25
[...]
10233: 4285
10234: 73
10235: 4123
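
For what it's worth, the two-level layout above could be sketched in Python roughly like this. It is only a minimal sketch, not the proof-of-concept itself: the names build_index, lookup and count are made up for illustration, and it uses 0-based offsets plus a sentinel at the end so the last word also has a "next" offset to subtract from.

```python
from bisect import bisect_left

def build_index(tokens):
    """Build (words, offsets, postings) from a list of tokens.

    words    : sorted distinct words (the "word index")
    offsets  : offsets[i] is where words[i]'s positions start in postings;
               a final sentinel equal to len(postings) closes the last word
    postings : document positions, grouped by word, ascending within a word
    """
    by_word = {}
    for pos, tok in enumerate(tokens):
        by_word.setdefault(tok, []).append(pos)
    words = sorted(by_word)
    offsets, postings = [], []
    for w in words:
        offsets.append(len(postings))
        postings.extend(by_word[w])
    offsets.append(len(postings))  # sentinel (an assumption, not in the example above)
    return words, offsets, postings

def _slot(words, word):
    """Binary-search the sorted word list; return its slot or None."""
    i = bisect_left(words, word)
    return i if i < len(words) and words[i] == word else None

def lookup(index, word):
    """Return the document positions recorded for word."""
    words, offsets, postings = index
    i = _slot(words, word)
    return [] if i is None else postings[offsets[i]:offsets[i + 1]]

def count(index, word):
    """Number of entries = next word's offset minus this word's offset."""
    words, offsets, _ = index
    i = _slot(words, word)
    return 0 if i is None else offsets[i + 1] - offsets[i]
```

With something like index = build_index(["AAA", "BBB", "AAA", "AAB", "AAC", "AAA"]), count(index, "AAB") is just one subtraction on the offsets and never touches the position list.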

To top that off, finding where AAA and AAC occur closest to each other (within a sentence, for example) would be simple: since you can locate the start of both position lists very quickly, you can walk the lists for AAA and AAC at the same time, advancing whichever one is behind according to the difference between the current positions (each list is sorted in position order).
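
A rough sketch of that parallel walk, assuming each word's positions are already available as an ascending Python list (the function name closest_pair is made up for illustration):

```python
def closest_pair(a_positions, b_positions):
    """Walk two ascending position lists together.

    Returns (distance, a_pos, b_pos) for the closest pair of positions,
    or None if either list is empty.
    """
    i = j = 0
    best = None
    while i < len(a_positions) and j < len(b_positions):
        a, b = a_positions[i], b_positions[j]
        gap = abs(a - b)
        if best is None or gap < best[0]:
            best = (gap, a, b)
        # Advance whichever list is behind; advancing the one that is
        # ahead could only widen the gap.
        if a < b:
            i += 1
        else:
            j += 1
    return best
```

Each list is read at most once, so the cost is proportional to the combined length of the two lists rather than their product.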

Just thinking out loud, and no, I've never benchmarked it, but I have played with the idea in Python a few times as a proof of concept.

--
Michael T. Babcock
C.T.O., FibreSpeed Ltd. SQL
http://www.fibrespeed.net/~mbabcock


