On 17 Mar 2007, at 10:07, markharw00d wrote:
Chris Hostetter wrote:
this is something anyone using the Lucene API can do as long as
they use a HitCollector ... the Nutch impl seems to actually spin
up a separate thread
I'm keen to understand the pros and cons of these two approaches.
With the HitCollector approach, is this just engineering a fall at
the final hurdle? It could be that long-running queries spend all
their time doing edit-distance comparisons for a fuzzy boolean
query, say, or reading TermDocs for a large range filter to create
a BitSet, only to be aborted at the collection stage.
Another point - I noticed in some basic timing tests that calling
System.currentTimeMillis() in a tight loop, e.g. for *every* call to
HitCollector.collect(..), could add noticeable overhead, so you
probably only want to call it for every nth document collected
when testing execution times.
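
A minimal sketch of the "every nth document" idea. To keep it self-contained it does not extend Lucene's HitCollector; it only mimics the collect(int doc, float score) shape. The class name, the exception type and the checkInterval parameter are all hypothetical, made up for illustration.

```java
// Standalone sketch: mimics the collect(int doc, float score) shape of a
// hit collector without depending on Lucene. All names are hypothetical.
class SampledTimeoutCollector {

    /** Hypothetical unchecked exception used to abort collection. */
    static class TimeExceededException extends RuntimeException {
        TimeExceededException(String message) {
            super(message);
        }
    }

    private final long deadlineMillis;
    private final int checkInterval;
    private int collected = 0;
    private int clockReads = 0;

    SampledTimeoutCollector(long timeoutMillis, int checkInterval) {
        if (checkInterval < 1) {
            throw new IllegalArgumentException("checkInterval must be >= 1");
        }
        this.deadlineMillis = System.currentTimeMillis() + timeoutMillis;
        this.checkInterval = checkInterval;
    }

    public void collect(int doc, float score) {
        collected++;
        // Read the clock only on every checkInterval-th hit, not on every call.
        if (collected % checkInterval == 0) {
            clockReads++;
            if (System.currentTimeMillis() > deadlineMillis) {
                throw new TimeExceededException(
                        "time limit exceeded after " + collected + " hits");
            }
        }
        // A real collector would record doc and score here.
    }

    int collected() {
        return collected;
    }

    int clockReads() {
        return clockReads;
    }
}
```

The trade-off is that the abort can come up to checkInterval - 1 hits late, and a query that collects fewer than checkInterval hits never reads the clock at all.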
I'd be on the look-out for complex queries that yield no or very
few results. If the search is not running in its own thread, the
timeout might not be triggered in reasonable time, since the check
only runs when a hit is collected.
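
A rough sketch of the separate-thread variant, using only java.util.concurrent. The search is passed in as a plain Callable, so nothing here is Lucene or Nutch API; TimedSearch, runWithTimeout and the fallback parameter are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Standalone sketch: all names are hypothetical, nothing here is Lucene API.
class TimedSearch {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns the task's result, or fallback if the wait timed out.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupts the worker; this only stops the task if it
                // checks for interruption or blocks in an interruptible call.
                future.cancel(true);
                return fallback;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("search task failed", e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The caller gets control back at the deadline no matter how few hits the query produces. The worker itself only stops early if the search code reacts to the interrupt, so a CPU-bound query may keep running in the background until it finishes.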
My guess is that most environments running on a J2SE JVM have no
problem with twice as many threads. Provided there is no extensive
use of stack memory (serializing huge object graphs,
introspection/reflection, etc.), one should be able to optimize
memory usage with -Xss.
--
karl