Hi Tushar,

This is an interesting scenario!
The problem arises from the way the search() methods that return Hits work. Initially only 100 matching documents are collected, on the assumption that applications calling these methods will not be interested in more documents than that, and that applications traversing all matching documents (like yours) will use the HitCollector API and provide their own HitCollector (your HitCollector would then do the deletion).

However, if an application requests the 101st matching document, the query is resubmitted under the hood, this time fetching 200 docs, of which the first 100 are ignored and the rest are provided as results. If more than 200 are needed, the next resubmission fetches 400, then 800, and so on.

Now, in your scenario, you deleted every retrieved doc. The sequence of resubmitted fetch sizes is: 100, 200, 400, 800, 1,600, 3,200, 6,400, 12,800 (in practice capped by the 11,475 total matches). After the first 6,400 docs have been deleted and you ask for result number 6,401, the query is resubmitted, but only 11,475 - 6,400 = 5,075 matches are found. Since you asked for the 6,401st match, Hits attempts to skip the first 6,400 and of course fails, because there are not that many docs left.

This seems like a bug: although Hits is not recommended for this task for performance reasons, and you would be better off using a HitCollector, it still should have worked correctly. I tend to think this should just be documented rather than fixed, though I am not 100% sure which of the two is right. Could you file a Lucene JIRA issue for this?
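To make the arithmetic concrete, here is a small stand-alone simulation of the behaviour described above. It is only a simplified model and not Lucene's actual code: the class and method names (HitsResubmissionModel, firstFailingIndex) are made up for illustration, and it assumes each resubmission asks for twice the requested index and sees only the documents that have not been deleted yet.

```java
// Simplified model of the resubmission behaviour described above.
// NOT Lucene's code; HitsResubmissionModel and firstFailingIndex are hypothetical names.
final class HitsResubmissionModel {

    /**
     * Simulates looping over Hits while deleting every retrieved document.
     * Returns the index whose lookup fails, or -1 if the loop ends without failing.
     */
    static int firstFailingIndex(int totalMatches) {
        int live = totalMatches;           // matching docs not yet deleted
        int length = live;                 // what h.length() reports; refreshed on each resubmission
        int cached = Math.min(100, live);  // the first search collects at most 100 docs

        for (int i = 0; i < length; i++) {
            if (i >= cached) {
                int requested = 2 * i;                    // 200, 400, 800, ...
                length = live;                            // the re-run query only sees live docs
                int returned = Math.min(requested, live);
                if (returned > cached) {
                    cached = returned;                    // first `cached` results skipped, rest appended
                }
                if (i >= cached) {
                    return i;                             // nothing new was appended: lookup fails
                }
            }
            live--;                                       // the caller deletes doc i
        }
        return -1;
    }

    public static void main(String[] args) {
        int total = 11475;
        int failing = firstFailingIndex(total);
        // Every index before the failing one deleted exactly one doc.
        System.out.println("fails at i = " + failing
                + ", live matches = " + (total - failing));
        // prints: fails at i = 6400, live matches = 5075
    }
}
```

With 11,475 initial matches the model fails at index 6,400 with 5,075 live matches left, which agrees with the numbers in your report.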
Regards,
Doron

On Dec 19, 2007 12:10 PM, Tushar B <[EMAIL PROTECTED]> wrote:
> Hello All,
>
> I am seeing this issue and would like to understand whether it's a bug or
> whether I am missing something and doing this the wrong way:
>
> (Note that I am doing all exception handling, but deleted the exception
> handling code for the sake of brevity below)
>
> Hits h = m_indexSearcher.search(q); // Returns 11475 documents
> for(int i = 0; i < h.length(); i++)
> {
>     int doc = h.id(i);
>     m_indexSearcher.getIndexReader().deleteDocument(doc);
> }
>
> The above hits Vector::ArrayIndexOutOfBoundsException when i = 6400. The
> problem happens in Hits::getMoreDocs.
>
> By the time 6400 docs are deleted, the majority is gone and
> topDocs.totalHits becomes less than 6400 (in this case 5075), which finally
> causes an exception in the last line of Hits::hitDoc.
>
> I just took the example numbers which occurred in my case, but this happens
> for any hits > 200 (initial vector size is 100, I guess).
>
> Any insight on the logic here will be very helpful (note: I have a
> workaround too)
>
> thanks