Hey.
I'm trying to figure out the FASTEST way to solve this problem.
We have a system where I'll be given 10 or 20 unique keys, which are
stored as non-tokenized fields within Lucene. Each key identifies a
unique document.
Internally I'm creating a new Term and then calling
IndexReader.termDocs() with it. If TermDocs.next() returns a match,
I return that document.
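Roughly, the per-key lookup looks like this (just a sketch; the method
name and the "key" field name are placeholders for our real ones):

    Document lookupOne(IndexReader reader, String key) throws IOException {
        Term term = new Term("key", key);           // placeholder field name
        TermDocs termDocs = reader.termDocs(term);  // postings for this exact term
        try {
            if (termDocs.next()) {                  // key matched a document
                return reader.document(termDocs.doc());
            }
        } finally {
            termDocs.close();
        }
        return null;                                // key not in this index
    }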
The problem is that this isn't very fast. This is not an academic
question either: I've run the system under a profiler and Lucene is the
top bottleneck (by far).
I don't think there's anything faster than this, right? Could I maybe
cache a TermEnum positioned at the FIRST of these keys and then reuse
it? That might let each lookup start its search closer to my terms.
Does Lucene internally do a binary search for my term?
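Something like the following is what I have in mind: sort the keys and
reuse a single TermDocs via seek() rather than opening a fresh
enumerator per key. Again just a sketch, with "key" as a placeholder
field name; I don't know yet whether the savings would be measurable:

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.*;

    Document[] lookup(IndexReader reader, String[] keys) throws IOException {
        Arrays.sort(keys);                      // walk the term dictionary in order
        Document[] hits = new Document[keys.length];
        TermDocs termDocs = reader.termDocs();  // one unpositioned enumerator, reused
        try {
            for (int i = 0; i < keys.length; i++) {
                termDocs.seek(new Term("key", keys[i]));
                if (termDocs.next()) {
                    hits[i] = reader.document(termDocs.doc());
                }
            }
        } finally {
            termDocs.close();
        }
        return hits;
    }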
I could of course merge all this content into one index, but that's a
separate problem. We have a lot of indexes, often more than 40, and
constantly merging them into a multi-gig index just takes FOREVER.
It seems that internally IO is the problem. I'm about as fast on IO as I
can get: I'm on a RAID 0 array of FAST SCSI disks... I also tried
tweaking InputStream.BUFFER_SIZE with no visible change in performance.
Kevin
--
Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
See irc.freenode.net #rojo if you want to chat.
Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
Kevin A. Burton, Location - San Francisco, CA
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412