Thank you, Stuart. I got it working with:

    // sort by docIDs
    Arrays.sort(scoreDocs, new Comparator<ScoreDoc>() {
        @Override
        public int compare(ScoreDoc o1, ScoreDoc o2) {
            return Integer.compare(o1.doc, o2.doc);
        }
    });

On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <stuart.r...@pnnl.gov> wrote:

> Hi Vijay,
>
> ...sorting the documents you need to retrieve by docID order first...
>
> means sorting them by their 'document number', which is the value in the
> 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve'
> the document from the index. If you write a comparator to sort the elements
> in the ScoreDoc[] by their doc field, then that will put them in 'docID
> order' and the reader will always be skipping forward to the next doc, which
> will probably reduce its seek time.
>
> Regards,
> Stuart
>
>
> -----Original Message-----
> From: Vijay B [mailto:vijay.nip...@gmail.com]
> Sent: Monday, November 17, 2014 9:16 AM
> To: java-user@lucene.apache.org
> Subject: Order docIds to reduce disk seeks
>
> Could someone point me to how to order docIds as per
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> "Limit usage of stored fields and term vectors. Retrieving these from the
> index is quite costly. Typically you should only retrieve these for the
> current "page" the user will see, not for all documents in the full result
> set. For each document retrieved, Lucene must seek to a different location
> in various files. Try sorting the documents you need to retrieve by docID
> order first."
>
> To give some background:
>
> We are using plain vanilla Lucene (version 4.2.1) for our application. We
> index our documents using stored fields. We add two fields related to our
> documents: UUID, a 9-digit number that represents an internal id, and
> doc_text, the document text (approx. 7K to 20K in size). In our search
> code, we use a BooleanQuery to retrieve by UUID and fetch the document
> text to use it for other processing. We are noticing slow response times
> with the searches. I understand that stored field retrieval is slower and
> should be limited, but this is mandatory for our app.
>
> Current code:
>
> TopScoreDocCollector collector =
>     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
>
> dirReader = DirectoryReader.open(FSDirectory.open(......));
> IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> indexSearcher.search(query, collector);
> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
>
> for (ScoreDoc scoreDoc : scoreDocs) {
>     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
>     String text = luceneDoc.get("doc_text"); // these calls take a lot of time
>
>     // process text
> }
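
For anyone finding this thread later, here is a minimal sketch of the same idea that sorts a copy of the hits by doc and leaves the original array alone, so the relevance ranking is still available afterwards. So that it compiles without Lucene on the classpath, ScoreDoc below is a hypothetical stand-in with only the doc and score fields, and DocIdOrder and sortedByDocId are names made up for illustration. It assumes Java 8 or later for Comparator.comparingInt; the anonymous Comparator in my reply does the same thing on Java 7.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DocIdOrder {

    /** Hypothetical stand-in for Lucene's ScoreDoc: only the two fields used here. */
    static final class ScoreDoc {
        final int doc;      // document number the reader uses to load stored fields
        final float score;  // relevance score assigned by the search

        ScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns a copy of hits in ascending doc order; the input array is not modified. */
    static ScoreDoc[] sortedByDocId(ScoreDoc[] hits) {
        ScoreDoc[] byDocId = hits.clone();
        Arrays.sort(byDocId, Comparator.comparingInt((ScoreDoc sd) -> sd.doc));
        return byDocId;
    }

    public static void main(String[] args) {
        // Hits as a search might return them: ordered by score, not by doc.
        ScoreDoc[] hits = {
            new ScoreDoc(42, 1.5f),
            new ScoreDoc(7, 0.9f),
            new ScoreDoc(19, 0.4f)
        };

        // Visit the documents in ascending doc order, as you would when
        // loading stored fields.
        for (ScoreDoc sd : sortedByDocId(hits)) {
            System.out.println("fetch doc " + sd.doc + " (score " + sd.score + ")");
        }
    }
}
```

With real Lucene hits you would loop over the sorted copy and call indexSearcher.doc(scoreDoc.doc) for each one, as in the quoted code. How much this helps depends on the index and the disk, so it is worth measuring on your own data.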