Pedro wrote at 04/18/2012 02:54 PM:
> So to put it in a simple way, I need to tokenize all my data and
> create an index which I load into memory...?

That's a simple way that might do everything you want.

If you do this and then find you want it to work better, I suggest hitting an IR (information retrieval) textbook.
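Untested sketch, but roughly the shape of it in Racket. The tokenizer, the (id . text) document format, and all the names are just for illustration:

#lang racket
;; Minimal in-memory inverted index: term -> set of document ids.

;; Split text into lowercase word tokens.
(define (tokenize text)
  (map string-downcase (regexp-match* #px"[A-Za-z0-9]+" text)))

;; Build the index from a list of (id . text) pairs.
(define (build-index docs)
  (for*/fold ([index (hash)])
             ([doc (in-list docs)]
              [term (in-list (tokenize (cdr doc)))])
    (hash-update index term
                 (lambda (ids) (set-add ids (car doc)))
                 (set))))

;; Look up the documents containing a term.
(define (lookup index term)
  (hash-ref index (string-downcase term) (set)))

;; Example:
(define idx
  (build-index '((1 . "Racket is a programming language")
                 (2 . "an index maps each term to documents"))))
(lookup idx "Racket")  ; => (set 1)
(lookup idx "term")    ; => (set 2)

Swapping the set of ids for a hash of id -> positions (or counts) is what gets you phrase queries and ranking later, which is where the IR textbook comes in.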

Regarding whether keeping everything in memory will work: you can do the arithmetic on how much memory you'll need once you know how many terms and documents you have to support. Then see whether you'll have enough free RAM for twice that number; if you exhaust RAM and the GC ends up thrashing virtual memory that's been swapped to disk, you're going to have a bad time.
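For example, with made-up numbers (the per-entry byte costs below are guesses, not measurements; plug in your own):

;; Back-of-envelope memory estimate for the index above.
(define n-terms       200000)  ; distinct terms expected
(define n-postings   5000000)  ; total (term, document) pairs
(define bytes/term        60)  ; term string plus hash-table overhead (guess)
(define bytes/posting     16)  ; one id plus set overhead (guess)

(define estimated-bytes
  (+ (* n-terms bytes/term) (* n-postings bytes/posting)))

(printf "~a MB estimated; want roughly ~a MB of free RAM\n"
        (quotient estimated-bytes (expt 2 20))
        (quotient (* 2 estimated-bytes) (expt 2 20)))
;; => 87 MB estimated; want roughly 175 MB of free RAM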

> Is this how it is usually done? For example, does my browser (firefox)
> keep an index of all the words present in urls and page titles on
> memory at any given time?

I would guess so, though perhaps indirectly, such as through an SQLite cache.

Neil V.

--
http://www.neilvandyke.org/
____________________
 Racket Users list:
 http://lists.racket-lang.org/users
