I'll take your word for it, though it seems odd. I'm wondering
if there's anything you can do to pre-process the documents
at index time to make the post-processing less painful, but
that's a wild shot in the dark...
Another possibility would be to fetch only the fields you need
to do the post-processing.
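Something along these lines (untested, against the 3.x API; "title" and
"url" are just placeholder field names, and the searcher/query are assumed
to exist already) is what I have in mind with a FieldSelector:

  import java.io.IOException;
  import java.util.Arrays;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.FieldSelector;
  import org.apache.lucene.document.MapFieldSelector;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;

  public class SelectiveFieldLoading {
      // "title" and "url" are made-up names -- list whatever fields your
      // post-processing actually reads.
      private static final FieldSelector SELECTOR =
          new MapFieldSelector(Arrays.asList("title", "url"));

      static void postProcess(IndexSearcher searcher, Query query) throws IOException {
          TopDocs topDocs = searcher.search(query, 3000);
          for (ScoreDoc hit : topDocs.scoreDocs) {
              // Only the selected stored fields are materialized; the
              // others are skipped when the document is loaded.
              Document doc = searcher.doc(hit.doc, SELECTOR);
              // ... post-processing on doc goes here ...
          }
      }
  }

If your unused stored fields are large, skipping them can cut the bytes
read per document quite a bit.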
Erick,
Thanks for your reply! You are probably right to question how many
Documents we are retrieving. We know it isn't ideal, but significantly
reducing that number would require us to completely rebuild our system.
Before we do that, we were just wondering if there was anything in the
Lucene API o
I call into question why you "retrieve and materialize as
many as 3,000 Documents from each index in order to
display a page of results to the user". You have to be
doing some post-processing because displaying
12,000 documents to the user is completely useless.
I wonder if this is an "XY" problem
Is each index optimized?
From my vague grasp of the Lucene file formats, I think you want to sort
the documents by segment document id, which is the order in which the
documents are laid out on disk. That lets you materialize them in
on-disk order.
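A rough, untested sketch of what I mean (the 3,000 hit count just mirrors
your numbers, and the class/method names are only for illustration):

  import java.io.IOException;
  import java.util.Arrays;
  import java.util.Comparator;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;

  public class DocIdOrderFetch {
      static void fetchInDiskOrder(IndexSearcher searcher, Query query) throws IOException {
          TopDocs topDocs = searcher.search(query, 3000);
          ScoreDoc[] hits = topDocs.scoreDocs.clone();

          // Sort by doc id, which roughly matches the on-disk order of the
          // stored fields, so the reads become mostly sequential.
          Arrays.sort(hits, new Comparator<ScoreDoc>() {
              public int compare(ScoreDoc a, ScoreDoc b) {
                  return a.doc - b.doc;
              }
          });

          for (ScoreDoc hit : hits) {
              Document doc = searcher.doc(hit.doc);
              // ... post-process; keep the original scoreDocs array if you
              // still need the score ordering for display ...
          }
      }
  }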
Solr (and other apps) generally use a separate thread p
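If the point is that each index should get its own thread so the four
disks are read concurrently (that is only my guess), a plain
ExecutorService sketch could look roughly like this; the searcher array,
class name, and hit count are only illustrative:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutionException;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TopDocs;

  public class PerIndexSearch {
      // One searcher per on-disk index, one thread per index, so the four
      // disks are read in parallel rather than one after another.
      static List<TopDocs> searchAll(final IndexSearcher[] searchers, final Query query)
              throws InterruptedException, ExecutionException {
          ExecutorService pool = Executors.newFixedThreadPool(searchers.length);
          try {
              List<Future<TopDocs>> futures = new ArrayList<Future<TopDocs>>();
              for (final IndexSearcher s : searchers) {
                  futures.add(pool.submit(new Callable<TopDocs>() {
                      public TopDocs call() throws IOException {
                          return s.search(query, 3000);
                      }
                  }));
              }
              List<TopDocs> results = new ArrayList<TopDocs>();
              for (Future<TopDocs> f : futures) {
                  results.add(f.get());
              }
              return results;
          } finally {
              pool.shutdown();
          }
      }
  }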
Michael,
from a physical point of view, it would seem that the order in which the
documents are read matters a great deal for reading speed (I suspect the
random-access seeks are the issue).
You could:
- move to a RAM disk or an SSD and see whether it makes a difference?
- use something different than a searcher w
Hi All,
I am running Lucene 3.4 in an application that indexes about 1 billion
factual assertions (Documents) from the web across four separate disks, so
that each disk holds a separate index of about 250 million documents. The
Documents are relatively small, less than 1KB each. These indexes provide