Re: posting list traversal code

Denis Bazhenov Wed, 12 Jun 2013 23:25:16 -0700

At the index level, a document ID is the offset of the document in the index. It
can change over time for the same document, for example when several segments
are merged. Document IDs are also stored in increasing order in posting lists,
which is what makes posting list intersection fast. Some Lucene APIs explicitly
state that they operate on document IDs in order (like TermDocs); others allow
out-of-order processing (like Collector). So it really depends.
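To make the ordering point concrete, here is a minimal sketch in plain Java (no
Lucene dependency) that models two posting lists as sorted int arrays and
intersects them. It is not Lucene's ConjunctionScorer; the class and method
names are hypothetical. It shows why the value passed around during conjunctive
matching can simply be the docID itself: because every list is in increasing
docID order, the list that is behind can always be advanced to the other list's
current doc without missing a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of AND matching over posting lists (hypothetical, not Lucene code).
 * Each posting list is an int[] of doc IDs in strictly increasing order.
 */
public class PostingIntersection {

    /** Returns the index of the first entry >= target, scanning forward from 'from'. */
    public static int advance(int[] postings, int from, int target) {
        // A linear scan keeps the sketch short; a real index would use skip data.
        int i = from;
        while (i < postings.length && postings[i] < target) {
            i++;
        }
        return i;
    }

    /** Intersects two sorted posting lists by repeatedly advancing the one that is behind. */
    public static int[] intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]); // both lists contain this doc: it matches the conjunction
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]); // skip a ahead to b's current doc
            } else {
                j = advance(b, j, a[i]); // skip b ahead to a's current doc
            }
        }
        int[] result = new int[out.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = out.get(k);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] termA = {1, 3, 5, 7, 9};
        int[] termB = {3, 4, 5, 9, 10};
        System.out.println(Arrays.toString(intersect(termA, termB))); // prints [3, 5, 9]
    }
}
```

If the docIDs in either list were out of order, the "advance the one that is
behind" step could skip past a matching doc, which is why APIs like TermDocs
promise in-order iteration.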


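In the case of SortingAtomicReader, as far as I know, it computes a permutation
of the documents so that the docIDs in its output follow your sort order. So it
basically relabels the documents: posting lists are still traversed in
increasing docID order, but the docIDs themselves are reassigned.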
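The relabeling can be illustrated with a small plain-Java sketch. This is not
SortingAtomicReader's code: the class, its methods, and the long sort key are
hypothetical. It builds a new-to-old permutation from a per-document sort key,
inverts it into an old-to-new mapping, and rewrites a posting list into new
docIDs, re-sorting it so the list is still increasing. Code that expects
increasing docIDs therefore keeps working; it only ever sees the new labels.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy model of relabeling documents by a sort key (hypothetical, not Lucene code). */
public class DocRelabel {

    /** Returns newToOld, where newToOld[newId] = oldId, ordered by ascending sortKey. */
    public static int[] newToOld(long[] sortKey) {
        Integer[] order = new Integer[sortKey.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so equal keys keep their original relative order.
        Arrays.sort(order, Comparator.comparingLong(d -> sortKey[d]));
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    /** Inverts the permutation: oldToNew[oldId] = newId. */
    public static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /** Rewrites a posting list into new doc IDs and re-sorts it so it stays increasing. */
    public static int[] remapPostings(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        long[] price = {40, 10, 30, 20};  // hypothetical sort key for old docs 0..3
        int[] n2o = newToOld(price);       // [1, 3, 2, 0]
        int[] o2n = invert(n2o);           // [3, 0, 2, 1]
        int[] postings = {0, 2, 3};        // old doc IDs that contain some term
        System.out.println(Arrays.toString(remapPostings(postings, o2n))); // prints [1, 2, 3]
    }
}
```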

On Jun 13, 2013, at 4:38 PM, Sriram Sankar <san...@gmail.com> wrote:

> Thanks Denis.  I've been looking at the code in more detail now.  I'm
> interested in how the new SortingAtomicReader works.  Suppose I build an
> index and sort the documents using my own sorting function - as shown in
> the docs:
> 
> AtomicReader sortingReader = new SortingAtomicReader(reader, sorter);
> 
> writer.addIndexes(sortingReader);
> 
> When the docs are sorted using my function, I assume the docids are not
> going to be in order any more?  Unless the docids change to maintain the
> sorted order.
> 
> If you look at the code in (for example) ConjunctionScorer.doNext(doc),
> what is the "doc" that gets used here?  If it is the docid (and they are
> out of order), this method will not work.  So either the docids have to be
> in order, or the "doc" here is some other number that defines the position
> of the document in the posting list.
> 
> I'm trying to read the code to understand this - I'd really appreciate
> someone with more in-depth knowledge of this explaining this and also
> pointing me to somewhere in the code where the magic happens.
> 
> Thanks,
> 
> Sriram.
> 
> On Wed, Jun 12, 2013 at 9:33 PM, Denis Bazhenov <dot...@gmail.com> wrote:
> 
>> I'm not quite sure what you really need, but as far as I understand, you
>> want to get all document IDs for a given term. If so, the following code
>> will work for you:
>> 
>> Term term = new Term("fieldName", "fieldValue");
>> TermDocs termDocs = indexReader.termDocs(term);
>> while (termDocs.next()) {
>>        int docId = termDocs.doc();
>>        // work with the document...
>> }
>>
>> On Jun 13, 2013, at 1:56 PM, Sriram Sankar <san...@gmail.com> wrote:
>> 
>>> Can someone point me to the code that traverses the posting lists?  I'm
>>> trying to understand how it works.
>>> 
>>> Thanks,
>>> 
>>> Sriram
>> 
>> ---
>> Denis Bazhenov <dot...@gmail.com>
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 

---
Denis Bazhenov <dot...@gmail.com>