Sekhar created LUCENE-6322:
------------------------------

             Summary: IndexSearcher.doc(int docID, Set<String> fieldsToLoad) is slower 
in Lucene 4.9 when compared to Lucene 2.9
                 Key: LUCENE-6322
                 URL: https://issues.apache.org/jira/browse/LUCENE-6322
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/codecs
    Affects Versions: 4.9
         Environment: Windows, JDK 7/8
            Reporter: Sekhar
             Fix For: 4.10.x


We use the IndexSearcher.doc(int docID, Set<String> fieldsToLoad) method to load 
a document with only selected stored fields. When the stored fields we leave 
out of fieldsToLoad hold more than 500KB of data, this call is slower in 
Lucene 4.9 than in Lucene 2.9.

I debugged this method in Lucene 4.9 and found that 
CompressingStoredFieldsReader#visitDocument(int docID, StoredFieldVisitor 
visitor) spends a lot of time loading file content and decompressing it in 
16KB chunks, even for fields that are only being skipped. The degradation is 
noticeable when a document's field is larger than 1MB and we call this method 
in a loop for more than 1000 such documents.

Lucene 2.9 had no compression, so skipping a field was just a file seek that 
moved the pointer to the next stored field. For example, see how 
Lucene3xStoredFieldsReader#skipField() skips a field in Lucene 2.9, which is 
much faster than what Lucene 4.9 does.
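
To illustrate the seek-based skipping described above, here is a minimal 
sketch. The record layout and the class name LengthPrefixedSkipSketch are 
assumptions made for illustration only; this is not Lucene's actual file 
format or the code of Lucene3xStoredFieldsReader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy record layout (an assumption for illustration, not Lucene's format):
// [int fieldNumber][int byteLength][payload bytes], repeated per field.
class LengthPrefixedSkipSketch {

    // Serialize the values in order; a field's number is its array index.
    static ByteBuffer write(String[] values) {
        byte[][] encoded = new byte[values.length][];
        int size = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 8 + encoded[i].length; // two int headers plus payload
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < values.length; i++) {
            buf.putInt(i).putInt(encoded[i].length).put(encoded[i]);
        }
        buf.flip();
        return buf;
    }

    // Load only the wanted field. Every other field is skipped by moving the
    // position forward, so its payload bytes are never read or copied.
    static String load(ByteBuffer buf, int wantedField) {
        buf.rewind();
        while (buf.remaining() >= 8) {
            int fieldNumber = buf.getInt();
            int length = buf.getInt();
            if (fieldNumber == wantedField) {
                byte[] payload = new byte[length];
                buf.get(payload);
                return new String(payload, StandardCharsets.UTF_8);
            }
            buf.position(buf.position() + length); // the "seek"
        }
        return null; // field not present in this record
    }
}
```

In this sketch the cost of skipping a field does not depend on the field's 
size, because only the two header ints are read before the position moves on.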

CompressingStoredFieldsReader should have a way to know each field's 
compressed length in the file, so that it can seek straight to the next field 
instead of loading the content from the file and decompressing it in 16KB 
chunks only to skip that field.
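
A rough sketch of that idea follows. It assumes each field is compressed on 
its own and that its compressed length is written in front of it. The layout, 
the class name CompressedLengthSkipSketch and the use of java.util.zip are all 
hypothetical choices for illustration; they are not how 
CompressingStoredFieldsReader stores data today.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy layout (hypothetical, not Lucene's format):
// [int fieldNumber][int rawLength][int compressedLength][compressed bytes]
class CompressedLengthSkipSketch {
    static int inflateCalls = 0; // how many fields were actually decompressed

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        while (!deflater.finished()) {
            out.write(tmp, 0, deflater.deflate(tmp));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Compress each field separately and record its compressed length.
    static ByteBuffer write(String[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            byte[] raw = values[i].getBytes(StandardCharsets.UTF_8);
            byte[] packed = deflate(raw);
            byte[] header = ByteBuffer.allocate(12)
                .putInt(i).putInt(raw.length).putInt(packed.length).array();
            out.write(header, 0, header.length);
            out.write(packed, 0, packed.length);
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    // Unwanted fields are skipped with a position change; only the wanted
    // field is inflated.
    static String load(ByteBuffer buf, int wantedField) {
        buf.rewind();
        while (buf.remaining() >= 12) {
            int fieldNumber = buf.getInt();
            int rawLength = buf.getInt();
            int compressedLength = buf.getInt();
            if (fieldNumber != wantedField) {
                buf.position(buf.position() + compressedLength); // seek past it
                continue;
            }
            byte[] packed = new byte[compressedLength];
            buf.get(packed);
            Inflater inflater = new Inflater();
            inflater.setInput(packed);
            byte[] raw = new byte[rawLength];
            try {
                int n = 0;
                while (n < rawLength && !inflater.finished()) {
                    n += inflater.inflate(raw, n, rawLength - n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt field data", e);
            } finally {
                inflater.end();
            }
            inflateCalls++;
            return new String(raw, StandardCharsets.UTF_8);
        }
        return null; // field not present in this record
    }
}
```

One trade-off to keep in mind: compressing fields independently usually gives 
a worse compression ratio than compressing several documents together in one 
block, so a real change would have to weigh skip speed against index size.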




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

