[
https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891984#action_12891984
]
Michael McCandless commented on LUCENE-2553:
--------------------------------------------
The IndexReader/Searcher.document call itself isn't very fast, regardless of
whether you call it inside a custom Collector or outside one. If you need
random access to certain field(s) across all docs, it's best to use
FieldCache.DEFAULT.getXXX (e.g. getInts or getStrings) instead.
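The example below is a minimal sketch of the access pattern behind that advice. It is a self-contained toy model, not Lucene code: `FieldCacheSketch`, `Segment`, `rawValue`, and the model's `getStrings` are hypothetical names. In Lucene 3.0 the real call would be along the lines of `FieldCache.DEFAULT.getStrings(segmentReader, "make")`, which returns a `String[]` indexed by segment-relative docId and is cached per reader. The real cache is built from the field's indexed terms (assuming one term per document), whereas this model simply copies values from an in-memory table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Lucene): contrast loading a whole stored document per hit
 * with reading one field from a docId-indexed array built once per segment.
 */
public class FieldCacheSketch {

    /** Hypothetical stand-in for one segment's stored documents. */
    public static final class Segment {
        private final List<Map<String, String>> docs = new ArrayList<>();
        private int documentLoads = 0; // counts full-document loads

        public Segment add(Map<String, String> doc) {
            docs.add(doc);
            return this;
        }

        public int maxDoc() {
            return docs.size();
        }

        public int documentLoads() {
            return documentLoads;
        }

        /** Analogous to reader.document(docId): copies every stored field. */
        public Map<String, String> document(int docId) {
            documentLoads++;
            return new HashMap<>(docs.get(docId));
        }

        /** Raw per-doc read used only while building the cached array. */
        String rawValue(int docId, String field) {
            return docs.get(docId).get(field);
        }
    }

    // Keyed by segment identity, then field name, mirroring a per-reader cache.
    // Not thread-safe; the real implementation also handles concurrency and reader lifetime.
    private static final Map<Segment, Map<String, String[]>> CACHE = new IdentityHashMap<>();

    /** Returns one field's values indexed by docId, building the array only once. */
    public static String[] getStrings(Segment segment, String field) {
        return CACHE
            .computeIfAbsent(segment, s -> new HashMap<>())
            .computeIfAbsent(field, f -> {
                String[] values = new String[segment.maxDoc()];
                for (int doc = 0; doc < values.length; doc++) {
                    values[doc] = segment.rawValue(doc, f);
                }
                return values;
            });
    }

    public static void main(String[] args) {
        Segment seg = new Segment()
            .add(Map.of("make", "Honda"))
            .add(Map.of("make", "Ford"));
        int[] hits = {1, 0, 1};

        // Per-hit loading: every hit materializes a whole stored document.
        for (int doc : hits) {
            System.out.println(seg.document(doc).get("make"));
        }
        // Field-cache style: one array per (segment, field), then plain array reads.
        String[] makes = getStrings(seg, "make");
        for (int doc : hits) {
            System.out.println(makes[doc]);
        }
        System.out.println("document loads: " + seg.documentLoads()); // document loads: 3
    }
}
```

The trade is memory for speed: one array per field per segment stays resident, but each hit then costs an array read instead of a stored-fields seek and decode.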
> IOException: read past EOF
> --------------------------
>
> Key: LUCENE-2553
> URL: https://issues.apache.org/jira/browse/LUCENE-2553
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Affects Versions: 3.0.2
> Reporter: Kyle L.
>
> We have been getting an {{IOException}} with the following stack trace:
> {noformat}
> java.io.IOException: read past EOF
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>     at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
>     at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:67)
>     ...
> {noformat}
> We have implemented a basic custom collector that collects all hits in an
> unordered manner:
> {code}
> private class AllHitsUnsortedCollector extends Collector {
>     private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class);
>     private IndexReader reader;
>     private int baselineDocumentId;
>     private List<Document> matchingDocuments = new ArrayList<Document>();
>
>     @Override
>     public boolean acceptsDocsOutOfOrder() {
>         return true;
>     }
>
>     @Override
>     public void collect(int docId) throws IOException {
>         int documentId = baselineDocumentId + docId;
>         Document document = reader.document(documentId, getFieldSelector());
>
>         if (document == null) {
>             logger.info("Null document from search results!");
>         } else {
>             matchingDocuments.add(document);
>         }
>     }
>
>     @Override
>     public void setNextReader(IndexReader segmentReader, int baseDocId)
>             throws IOException {
>         this.reader = segmentReader;
>         this.baselineDocumentId = baseDocId;
>     }
>
>     @Override
>     public void setScorer(Scorer scorer) throws IOException {
>         // do nothing
>     }
>
>     public List<Document> getMatchingDocuments() {
>         return matchingDocuments;
>     }
> }
> {code}
> The exception arises when users perform searches while indexing/optimization
> is occurring. Our {{IndexReader}} is read-only. From the documentation I have
> read, a read-only {{IndexReader}} instance should be immune from any
> uncommitted index changes and should return consistent results during
> indexing and optimization. As this exception occurs during
> indexing/optimization, it seems to me that the read-only {{IndexReader}} is
> somehow stumbling upon the uncommitted content?
> The problem is difficult to replicate as it is sporadic in nature and so far
> has only occurred in Production.
> We have rebuilt the indexes a number of times, but that does not seem to
> alleviate the issue.
> Is there any other information I can provide that would help isolate the issue?
> The most likely other possibility is that the {{Collector}} we have written
> is doing something it shouldn't. Any pointers?
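One thing worth checking in the quoted collector, based on the Lucene 3.x Collector javadoc: the docId passed to collect() is already relative to the reader most recently handed to setNextReader(), and the docBase argument is only needed to translate it into the top-level searcher's docId space. AllHitsUnsortedCollector adds baseDocId and then looks the sum up on the segment reader itself, so for any segment after the first it asks that segment for a document number shifted by docBase. That can run past the end of the segment's stored-fields index, which is one plausible source of the read past EOF in FieldsReader.doc. It would also be consistent with the failure showing up only while indexing is under way, when the index has more than one segment (an optimized single-segment index always gets docBase 0). The sketch below is a plain-Java model, not Lucene code: a List<String> stands in for a segment, and DocBaseSketch, SegmentCollector, BuggyCollector, and FixedCollector are hypothetical names. In the real collector the corresponding change would be to call reader.document(docId, getFieldSelector()) without adding baseDocId.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Lucene) of per-segment collection. Each List<String> stands
 * in for one segment's stored documents; docBase is that segment's offset in
 * the top-level docId space, as passed to Collector.setNextReader(reader, docBase).
 */
public class DocBaseSketch {

    /** Minimal stand-in for the two Collector callbacks that matter here. */
    public interface SegmentCollector {
        void setNextReader(List<String> segment, int docBase);
        void collect(int doc); // doc is relative to the current segment
    }

    /** Drives collection segment by segment, treating every doc as a hit. */
    public static void search(List<List<String>> segments, SegmentCollector collector) {
        int docBase = 0;
        for (List<String> segment : segments) {
            collector.setNextReader(segment, docBase);
            for (int doc = 0; doc < segment.size(); doc++) {
                collector.collect(doc);
            }
            docBase += segment.size();
        }
    }

    /** Same arithmetic as the quoted collector: looks up docBase + doc on the segment. */
    public static final class BuggyCollector implements SegmentCollector {
        public final List<String> matches = new ArrayList<>();
        private List<String> segment;
        private int docBase;

        public void setNextReader(List<String> segment, int docBase) {
            this.segment = segment;
            this.docBase = docBase;
        }

        public void collect(int doc) {
            // Wrong index for every segment after the first; may overrun the segment.
            matches.add(segment.get(docBase + doc));
        }
    }

    /** Looks up the segment-relative doc; uses docBase + doc only as a global id. */
    public static final class FixedCollector implements SegmentCollector {
        public final List<String> matches = new ArrayList<>();
        public final List<Integer> globalIds = new ArrayList<>();
        private List<String> segment;
        private int docBase;

        public void setNextReader(List<String> segment, int docBase) {
            this.segment = segment;
            this.docBase = docBase;
        }

        public void collect(int doc) {
            matches.add(segment.get(doc));
            globalIds.add(docBase + doc);
        }
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(List.of("a", "b"), List.of("c", "d", "e"));

        FixedCollector fixed = new FixedCollector();
        search(segments, fixed);
        System.out.println(fixed.matches + " " + fixed.globalIds); // [a, b, c, d, e] [0, 1, 2, 3, 4]

        try {
            search(segments, new BuggyCollector());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("buggy collector overran a segment");
        }
    }
}
```

In the model the buggy collector first returns the wrong entry ("e" for segment-relative doc 0 of the second segment) and only then overruns, so the same arithmetic can yield wrong results as well as exceptions.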
--
This message is automatically generated by JIRA.