markharw00d wrote:
Chris Hostetter wrote:
this is something anyone using the Lucene API can do as long as they use a HitCollector ... the Nutch impl seems to actually spin up a separate thread

I'm keen to understand the pros and cons of these two approaches.

With the HitCollector approach, is this just engineering a fall at the final hurdle? A long-running query could spend all its time doing edit-distance comparisons for a fuzzy boolean query, say, or reading TermDocs for a large range filter to create a BitSet, only to be aborted at the collection stage. Another point: I noticed in some basic timing tests that calling System.currentTimeMillis() in a tight loop, such as on *every* call to HitCollector.collect(..), can add noticeable overhead, so you probably only want to call it for every nth document collected when testing execution times.

That's why the Nutch implementation doesn't do this (I know, I wrote it ;) ).

What it does is the following (please see the patch for details):

* it creates a single (static) timer thread, which counts "ticks" every couple of hundred ms (configurable). It uses a volatile int counter, thereby avoiding the need to synchronize.

* each HitCollector records the start tick count in its constructor, and then checks the current tick count in collect(...). If the difference is too large then it throws a RuntimeException (NOTE: would someone *please* refactor this API so that we can exit this loop more gracefully!).

This design has several benefits: it avoids creating too many timer threads (there is just one per JVM), it avoids the need to synchronize on the value being changed, and it avoids calling System.currentTimeMillis().
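To make the idea concrete, here is a rough, standalone sketch of the scheme described above. It is not the code from the Nutch patch: the names TickTimer, TimeLimitedCollector and TimeExceededException, the 100 ms tick length and the maxTicks parameter are all made up for illustration, and the collector deliberately does not extend Lucene's HitCollector so that it compiles without Lucene on the classpath.

```java
/** One daemon thread per JVM that bumps a volatile tick counter at a fixed interval. */
final class TickTimer {
    // Assumed tick length; the real implementation makes this configurable.
    static final long TICK_MILLIS = 100;

    // Written only by the timer thread, read by any number of collectors.
    private static volatile int ticks = 0;

    static {
        Thread timer = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(TICK_MILLIS);
                } catch (InterruptedException e) {
                    return; // stop counting if the thread is interrupted
                }
                ticks++; // single writer, so no synchronization is needed
            }
        }, "tick-timer");
        timer.setDaemon(true); // do not keep the JVM alive just for the timer
        timer.start();
    }

    private TickTimer() {}

    static int currentTick() {
        return ticks;
    }
}

/** Hypothetical unchecked exception used to break out of the collection loop. */
final class TimeExceededException extends RuntimeException {
    TimeExceededException(String message) {
        super(message);
    }
}

/** Collector-like class that aborts once more than maxTicks ticks have elapsed. */
final class TimeLimitedCollector {
    private final int startTick = TickTimer.currentTick(); // recorded at construction
    private final int maxTicks;
    private int collected = 0;

    TimeLimitedCollector(int maxTicks) {
        this.maxTicks = maxTicks;
    }

    /** Same shape as HitCollector.collect(int, float); only a volatile read per call. */
    void collect(int doc, float score) {
        if (TickTimer.currentTick() - startTick > maxTicks) {
            throw new TimeExceededException("aborted after " + collected + " hits");
        }
        collected++; // a real collector would record doc and score here
    }

    int collected() {
        return collected;
    }
}

final class TickTimeoutDemo {
    public static void main(String[] args) {
        TimeLimitedCollector collector = new TimeLimitedCollector(2);
        try {
            while (true) {
                collector.collect(0, 1.0f); // simulate an endless stream of hits
            }
        } catch (TimeExceededException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that the timeout is only as precise as the tick length, and, as with any check made in collect(...), time spent before the first hit is collected is not interrupted.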

Best regards,
Andrzej
