[EMAIL PROTECTED] writes:
> http://www.xfeedme.com/nucular/gut.py/go?FREETEXT=w
> (w for "web") we get 6294 entries which takes about 500ms on
> a cold index and about 150ms on a warm index.  This is on a very
> active shared hosting machine.
That's reasonable speed, but is that just to do the set intersections
and return the size of the result set, or does it retrieve the actual
result set?  It only showed 20 results on a page.  I notice that each
book in the result list has an ID number.  Say those are stored fields
in Nucular: how long does it take to add up all the ID numbers for the
results of that query?  I.e., the requirement is to actually access
every single record in order to compute the sum (roughly the first
sketch at the end of this message).  This is similar to what happens
with faceting.

> You are right that you might want to
> use more in-process memory for a really smart, multi-faceted relevance
> ordering or whatever, but you have to be willing to pay for it
> in terms of system resources, config/development time, etcetera.
> If you want cheap and easy, nucular might be good enough, afaik.

I used a cave-man approach with solr, which is to have an external
process keeping the indexes warm by simply reading something from each
page a few times an hour (second sketch below).  That is enough to pull
10k or so results a second from a query.  Without the warming, getting
that many results takes over a minute.  I do think much better
approaches are possible and solr/lucene is by no means the be-all and
end-all.  I don't know whether solr uses mmap or actual seek system
calls underneath.

> Regarding the 30 million number -- I bet google does
> estimations and culling of some kind (not really looking at all 10M).

Probably.

> I'm not interested in really addressing the "google" size of data set
> at the moment.

Right, me neither, but a few tens of GB of indexes is not all that
large these days.

> > http://www.newegg.com/Product/Product.aspx?Item=N82E16820147021
> holy rusty metal batman! way-cool!

Heh, check out the benchmark graphs:
http://www.tomshardware.com/2006/09/20/conventional_hard_drive_obsoletism/page7.html
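By "add up all the ID numbers" I mean something like the sketch below.
It's just pseudocode in Python form: search() and the "id" stored field
are placeholders for whatever the engine actually exposes, not the real
Nucular API.  The point is only that computing the sum forces every
matching record to be fetched, not just the first page of 20.

import time

def sum_of_ids(index, word):
    # Touch every hit, not just the first page, and read a stored field
    # from each one -- similar in spirit to what faceting has to do.
    t0 = time.time()
    total = 0
    count = 0
    for record in index.search(free_text=word):   # hypothetical API
        total += int(record["id"])                 # hypothetical stored field
        count += 1
    elapsed = time.time() - t0
    print("%d hits, sum of ids %d, %.0f ms" % (count, total, elapsed * 1000))
    return total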
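And the warming process is along these lines -- just a sketch, the
select URL, query terms and interval are made-up stand-ins rather than
what I actually run against my Solr instance:

import time
import urllib.parse
import urllib.request

SOLR_SELECT = "http://localhost:8983/solr/select"   # assumed handler URL
WARM_QUERIES = ["web", "python", "database"]         # assumed sample terms
INTERVAL = 15 * 60                                   # a few times an hour

def warm_once():
    # Issue a small query for each term so the relevant index blocks
    # stay in the OS page cache between real requests.
    for q in WARM_QUERIES:
        params = urllib.parse.urlencode({"q": q, "rows": 20, "wt": "json"})
        with urllib.request.urlopen(SOLR_SELECT + "?" + params) as resp:
            resp.read()   # just read something from each "page"

if __name__ == "__main__":
    while True:
        warm_once()
        time.sleep(INTERVAL)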