On Wed, Apr 18, 2012 at 1:15 PM, Danny Yoo <d...@cs.wpi.edu> wrote:
>>
>> I think the subfield you're looking for is called "information retrieval",
>> and there are textbooks on it.
>
> Managing Gigabytes, for example:
>
> http://ww2.cs.mu.oz.au/mg/
Another book that just came out and looks good:

Introduction to Information Retrieval
http://nlp.stanford.edu/IR-book/

It's awesome that they've put the book online.

Aside: the suffix-tree approach I proposed earlier might not suit the task you're exploring, because a suffix tree represents every suffix of the text. Just a heads-up to keep an eye on memory usage: suffix trees are heavy, typically taking many times more memory than the text they index.

____________________
Racket Users list:
http://lists.racket-lang.org/users