Daniel Herlitz wrote:

Hi everybody,

We have been using Lucene for about one year now with great success. Recently though the index has grown noticeably and so has the number of searches. I was wondering if anyone would like to comment on these figures and say whether a setup like this works for them?

Index size: ~2.5 GB, on disk
Number of fields: ~30
Number of indexed fields: ~10
Server: Linux, Intel(R) Xeon(TM) CPU 3.00GHz, 3GB, dedicated to Lucene searches.
Java: Sun 1.5, -Xmx1200m

For perf tuning on 1.4+ VMs I always try these flags too:

-server
-XX:CompileThreshold=100
-Xverify:none

And also worth considering is giving a -Xms value equal to -Xmx.
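To confirm the flags actually reached the VM, a small check like the one below can help. This is only a sketch: the class name JvmFlagCheck and the helper hasFlag are made up for illustration. It relies on the java.lang.management API that ships with Java 1.5. Launcher-level options such as -server may not show up in the reported argument list, so the check only looks at -Xms and -XX:CompileThreshold.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {
    /** Returns true if any JVM input argument starts with the given prefix. */
    public static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the application's main method).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Upper bound on the heap the JVM will try to use, i.e. roughly the -Xmx value.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Max heap (MB): " + maxHeapMb);
        System.out.println("-Xms set: " + hasFlag(jvmArgs, "-Xms"));
        System.out.println("-XX:CompileThreshold set: " + hasFlag(jvmArgs, "-XX:CompileThreshold"));
    }
}
```

Running this with the same options as the search server shows which settings were picked up.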



Load: Approaching 2000 requests / hour.
Queries: The query strings are of highly differing complexity, from simple x:y to long queries involving conjunctions, disjunctions and wildcard queries.


90% of the queries run brilliantly. Problem is that 10% of the queries (simple or not) take a long time, on average more than 10 seconds, sometimes several minutes.

We have managed to track down these figures to the calls to IndexSearcher.search(Query). We have seen up to about 10 searches concurrently executing.

We have tried to run the server on different machines and with different versions of Java. We see no OutOfMemoryErrors.

I am curious about what to expect from Lucene when it comes to searching. There are lots of figures about the indexing speed (no question about that, it's incredibly fast!). But what about searching? And searching with the kind of load we have. Anyone in the same situation as we are? Comments? Suggestions?

Well, in a benchmark I was doing recently, fuzzy queries were the problem in the mix I had. To be fair, though, a fuzzy search is really just a big query, as it expands the query term into all "similar" terms in the index.
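To illustrate why that expansion gets expensive, here is a rough sketch of the idea. It is not Lucene's implementation: the class FuzzyExpansionSketch, the toy term list and the edit-distance cutoff are all assumptions for illustration. The point is only that one fuzzy term has to be compared against the term dictionary and can turn into many terms to search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FuzzyExpansionSketch {
    /** Classic dynamic-programming Levenshtein edit distance. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns every dictionary term within maxEdits of the query term. */
    public static List<String> expand(String term, List<String> dictionary, int maxEdits) {
        List<String> matches = new ArrayList<String>();
        for (String candidate : dictionary) { // one comparison per indexed term
            if (editDistance(term, candidate) <= maxEdits) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> dictionary =
                Arrays.asList("lucene", "lucent", "lucerne", "license", "search");
        // prints [lucene, lucent, lucerne]
        System.out.println(expand("lucene", dictionary, 1));
    }
}
```

With a real index the dictionary holds far more than five terms, so both the scan and the resulting multi-term query grow accordingly.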


Also of interest is what's the problem w/ the long running queries - are they slowing down the response time for the other users w/ shorter queries?

I've never done this, but you could consider a thread pool to execute the queries, and once a query takes more than, say, a second, you lower its priority.

Also, I'd have a rule like no more than "n" slow queries can run at once, so you queue up slow queries if there are lots of them executing.
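A minimal sketch of those two ideas, using java.util.concurrent from Java 1.5, might look like the following. Everything here is an assumption rather than tested production code: the class name QueryThrottleSketch, the expectedSlow flag (the caller has to guess up front which queries are likely to be slow, e.g. wildcard or fuzzy ones) and the pool sizes are all made up. Slow queries go to their own fixed-size pool, so at most maxSlow run at once and the rest wait in that pool's queue. A watchdog lowers the priority of any worker whose query runs past the threshold. Note that Java thread priorities are only a hint to the scheduler, so the effect depends on the platform.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class QueryThrottleSketch {
    private final ExecutorService fastPool;
    private final ExecutorService slowPool; // its size caps concurrent slow queries
    private final ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor();
    private final long demoteAfterMillis;

    public QueryThrottleSketch(int fastThreads, int maxSlow, long demoteAfterMillis) {
        this.fastPool = Executors.newFixedThreadPool(fastThreads);
        this.slowPool = Executors.newFixedThreadPool(maxSlow);
        this.demoteAfterMillis = demoteAfterMillis;
    }

    /** Submits a query; expectedSlow routes it to the size-limited slow pool. */
    public <T> Future<T> submit(final Callable<T> query, boolean expectedSlow) {
        ExecutorService pool = expectedSlow ? slowPool : fastPool;
        return pool.submit(new Callable<T>() {
            public T call() throws Exception {
                final Thread worker = Thread.currentThread();
                final Object lock = new Object();
                final boolean[] finished = { false };
                // Lower the worker's priority if the query is still running
                // after demoteAfterMillis.
                ScheduledFuture<?> demotion = watchdog.schedule(new Runnable() {
                    public void run() {
                        synchronized (lock) {
                            if (!finished[0]) {
                                worker.setPriority(Thread.MIN_PRIORITY);
                            }
                        }
                    }
                }, demoteAfterMillis, TimeUnit.MILLISECONDS);
                try {
                    return query.call();
                } finally {
                    // Restore priority before the thread is reused; the lock
                    // keeps a late watchdog run from demoting the next query.
                    synchronized (lock) {
                        finished[0] = true;
                        worker.setPriority(Thread.NORM_PRIORITY);
                    }
                    demotion.cancel(false);
                }
            }
        });
    }

    public void shutdown() {
        fastPool.shutdown();
        slowPool.shutdown();
        watchdog.shutdown();
    }
}
```

The Callable passed to submit would wrap the actual search call. Using a separate pool for slow queries, rather than a shared pool plus a semaphore, keeps waiting slow queries from tying up threads that short queries need.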




Thanks Daniel




--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]


