markharw00d wrote:
Chris Hostetter wrote:
this is something anyone using the Lucene API can do as long as they
use a
HitCollector ... the Nutch impl seems to ctually spin up a seperate
thread
I'm keen to understand the pros and cons of these two approaches.
With the HitCollector approach is this just engineering a fall at the
final hurdle? It could be that long running queries spend all their
time doing edit-distance comparisions for a a fuzzy boolean query,
say or reading TermDocs for a large range filter to create a BitSet
only to be aborted at the collection stage?
Another point - I noticed in some basic timing tests that calling
System.currentTimeMillis() in a tight loop like for *every* call to
HitCollector.collect(..) could add reasonable overhead so you probably
only want to call this for every nth document collected when testing
execution times.
That's why Nutch implementation doesn't do this (I know, I wrote it ;) ).
What it does is the following (please see the patch for details):
* it creates a single (static) timer thread, which counts the "ticks",
every couple hundred ms (configurable). It uses a volatile int counter,
therefore avoiding the need to synchronize.
* each HitColector records the start tick count in its constructor, and
then checks the current tick count in collect(...). If the difference is
too large then it throws a RuntimeException (NOTE: would someone
*please* refactor this API so that we can exit this loop more gracefully!).
This design has several benefits: it avoids creating too many timer
threads (there is just one per JVM), it avoids the need to synchronize
on the value being changed, and it avoids calling
System.currentTimeMillis().
Best regards,
Andrzej
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]