On Wed, Apr 18, 2012 at 1:15 PM, Danny Yoo <d...@cs.wpi.edu> wrote:
>>
>> I think the subfield you're looking for is called "information retrieval",
>> and there are textbooks on it.
>
> Managing Gigabytes, for example:
>
>    http://ww2.cs.mu.oz.au/mg/


Another book that just came out and looks good is Introduction to
Information Retrieval:

    http://nlp.stanford.edu/IR-book/

It's awesome that they've put the book online.


Aside: the suffix-tree approach I proposed earlier might be
inappropriate for the task you're exploring, because a suffix tree
represents every suffix of the text.  Just a heads up to keep an eye
on memory usage, because suffix trees are heavy.
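
To get a rough feel for the memory concern, here is a minimal sketch in
Python. It builds a naive, uncompressed suffix trie out of nested
dicts, which is not a real suffix tree: a proper suffix tree collapses
single-child paths and keeps the node count linear in the text length,
though each node still carries noticeable overhead. The names
build_suffix_trie, count_nodes, and contains are made up for this
illustration.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie (naive, uncompressed)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Reuse the child for this character, or create it.
            node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Count this node plus all nodes beneath it (recursive, so small inputs only)."""
    return 1 + sum(count_nodes(child) for child in node.values())


def contains(trie, pattern):
    """Return True if pattern occurs as a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    for text in ["banana", "abcdefgh", "the quick brown fox"]:
        trie = build_suffix_trie(text)
        # Compare the text length with the number of nodes needed to index it.
        print(len(text), count_nodes(trie))
```

Even on tiny inputs the node count runs well ahead of the text length,
so it is worth measuring on a sample of your real data before
committing to a suffix-based index.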

____________________
  Racket Users list:
  http://lists.racket-lang.org/users
