[EMAIL PROTECTED] writes:
> > ...but it looks a little more akin to Solr than to Lucene. ...
>
> I'm not sure but I think nucular has aspects of both since
> it implements both the search engine itself and also provides
> XML and HTTP interfaces

That sounds reasonable.

> As a test I built an index with 10's of millions of entries
> using nucular and most queries through CGI processes clocked
> in in 100's of milliseconds or better -- which is quite acceptable,
> for many purposes.

How many items did each query return?  When I refer to large result
sets, I mean you often get queries that return 10k items or more (a
pretty small number: typing "python" into google gets almost 30
million hits) and you need to actually examine each item, as opposed
to displaying ten at a time or something like that (e.g. you want to
present faceted results).

> > So we're back to the perennial topic of parallelism in Python...
>
> ...Which is not such a big problem if you rely on disk caching
> to provide the RAM access and use multiple processes to access
> the indices.

Right, another helpful strategy might be to use a solid state disk:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820147021
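
To make the point about large result sets concrete, here is a rough sketch of why faceting costs more than paging. It is only an illustration: the hits are plain dicts invented for the example, and `first_page` and `facet_counts` are hypothetical helpers, not part of nucular or any other search library.

```python
from collections import Counter
from itertools import islice


def first_page(hits, page_size=10):
    """Paging only has to pull page_size hits from the result stream."""
    return list(islice(hits, page_size))


def facet_counts(hits, field):
    """Faceting has to look at every hit to count the values of `field`."""
    return Counter(hit[field] for hit in hits if field in hit)


# Hypothetical result set of 10k hits (made-up data, not a real index).
languages = ("python", "java", "c")
hits = [{"id": i, "lang": languages[i % 3]} for i in range(10000)]

page = first_page(iter(hits))        # touches 10 hits
facets = facet_counts(hits, "lang")  # touches all 10000 hits
```

With ten-at-a-time display the per-query work stays roughly constant, while the facet count grows with the size of the result set, which is where per-item overhead in the engine starts to matter.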