Hey.
I'm trying to figure out the FASTEST way to solve this problem.
We have a system where I'll be given 10 or 20 unique keys, which are
stored as non-tokenized fields within Lucene. Each key identifies a
unique document.
Internally I'm creating a new Term and then calling
IndexReader.termDocs() with it. If TermDocs.next() returns a match,
I return that document.
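Roughly, the per-key lookup looks like this (just a sketch; the method
name and the "key" field name are placeholders for our real ones):

    Document lookupOne(IndexReader reader, String key) throws IOException {
        Term term = new Term("key", key);           // placeholder field name
        TermDocs termDocs = reader.termDocs(term);  // postings for this exact term
        try {
            if (termDocs.next()) {                  // key matched a document
                return reader.document(termDocs.doc());
            }
        } finally {
            termDocs.close();
        }
        return null;                                // key not in this index
    }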
The problem is that this isn't very fast. This is not an academic
question either: I've run the system under a profiler and Lucene is the
top bottleneck (by far).
I don't think there's anything faster than this, right? Could I maybe
cache a TermEnum positioned at the FIRST of these keys and then reuse
it? That might let each lookup start its search closer to my terms.
Does Lucene internally do a binary search for my term?
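Something like the following is what I have in mind: sort the keys and
reuse a single TermDocs via seek() rather than opening a fresh
enumerator per key. Again just a sketch, with "key" as a placeholder
field name; I don't know yet whether the savings would be measurable:

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.*;

    Document[] lookup(IndexReader reader, String[] keys) throws IOException {
        Arrays.sort(keys);                      // walk the term dictionary in order
        Document[] hits = new Document[keys.length];
        TermDocs termDocs = reader.termDocs();  // one unpositioned enumerator, reused
        try {
            for (int i = 0; i < keys.length; i++) {
                termDocs.seek(new Term("key", keys[i]));
                if (termDocs.next()) {
                    hits[i] = reader.document(termDocs.doc());
                }
            }
        } finally {
            termDocs.close();
        }
        return hits;
    }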
I could of course merge all this content into one index, but that's a
separate problem. We have a lot of indexes, often more than 40, and
constantly merging them into a multi-gig index just takes FOREVER.
It seems that internally IO is the problem. I'm about as fast on IO as I
can get: I'm on a RAID 0 array of FAST SCSI disks... I also tried
tweaking InputStream.BUFFER_SIZE with no visible change in performance.
Kevin
--
Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
See irc.freenode.net #rojo if you want to chat.
Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
Kevin A. Burton, Location - San Francisco, CA
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412