[ 
https://issues.apache.org/jira/browse/LUCENE-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7304:
---------------------------------
    Attachment: LUCENE-7304-20160606.patch

Patch of 6 June 2016.
This is the EliasFano code from LUCENE-5627, moved into core.

The patch implements EliasFanoSequence as EliasFanoBytes and as EliasFanoLongs,
with an encoder and a decoder for both.
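
For readers unfamiliar with the encoding, here is a minimal sketch of the
Elias-Fano idea in plain Java. It is not the code from the patch: the class
EliasFanoSketch and its fields are hypothetical, the low bits are kept one per
long for readability instead of being packed, and java.util.BitSet stands in
for the high-bits storage.

```java
import java.util.BitSet;

/** Toy Elias-Fano encoding of a non-decreasing sequence (hypothetical, not the patch's classes). */
final class EliasFanoSketch {
    final int numValues;
    final int numLowBits;
    final long[] lowBits;   // low part of each value, unpacked for clarity
    final BitSet highBits;  // high parts, unary coded: value i sets bit (high + i)

    EliasFanoSketch(long[] sorted, long upperBound) {
        numValues = sorted.length;
        // Number of low bits is floor(log2(upperBound / numValues)), or 0 for small ratios.
        long ratio = numValues == 0 ? 0 : upperBound / numValues;
        numLowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
        long lowMask = (1L << numLowBits) - 1;
        lowBits = new long[numValues];
        highBits = new BitSet();
        for (int i = 0; i < numValues; i++) {
            lowBits[i] = sorted[i] & lowMask;
            long high = sorted[i] >>> numLowBits;
            // Adding i keeps the positions strictly increasing, even for equal high parts.
            highBits.set((int) (high + i));
        }
    }

    /** Decodes all values by walking the set bits of the high-bits structure in order. */
    long[] decode() {
        long[] out = new long[numValues];
        int pos = -1;
        for (int i = 0; i < numValues; i++) {
            pos = highBits.nextSetBit(pos + 1);
            long high = pos - i; // undo the "+ i" applied while encoding
            out[i] = (high << numLowBits) | lowBits[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] docIds = {2, 3, 5, 7, 11, 13, 24};
        EliasFanoSketch ef = new EliasFanoSketch(docIds, 24);
        System.out.println(java.util.Arrays.toString(ef.decode()));
    }
}
```

A real implementation packs the low bits tightly and adds an index for fast
select and advance; the sketch only shows the split into low and high parts.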

The EliasFanoDocIdSet uses an EliasFanoLongs, except when the set is dense; in
that case it uses a FixedBitSet.
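
As an illustration of why a dense set is better served by a bitset, the sketch
below compares estimated sizes. It relies on the standard bound of about
2 + ceil(log2(u/n)) bits per value for Elias-Fano, against u bits for a bitset.
The class name, the method names and the exact threshold are assumptions made
for this example and are not taken from the patch.

```java
/** Hypothetical size comparison between an Elias-Fano encoding and a plain bitset. */
final class DocIdSetChoiceSketch {

    /** Approximate Elias-Fano size in bits: numValues * (2 + ceil(log2(upperBound / numValues))). */
    static long eliasFanoBits(long numValues, long upperBound) {
        if (numValues == 0) {
            return 0;
        }
        long ratio = Math.max(1, upperBound / numValues);
        // ceil(log2(ratio)) for ratio >= 1
        int lowBits = 64 - Long.numberOfLeadingZeros(ratio - 1);
        return numValues * (2 + lowBits);
    }

    /** True when a bitset over maxDoc bits is estimated to be no larger than Elias-Fano. */
    static boolean preferBitSet(long numValues, long maxDoc) {
        return eliasFanoBits(numValues, maxDoc) >= maxDoc;
    }

    public static void main(String[] args) {
        System.out.println(preferBitSet(500_000, 1_000_000)); // every second doc is set
        System.out.println(preferBitSet(1_000, 1_000_000));   // one doc in a thousand is set
    }
}
```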

I added a getBitSet() method to this EliasFanoDocIdSet.

I also added the test cases from LUCENE-5627, but I have not yet added a test
for the getBitSet() method. The set already works as a DocIdSet, so using it as
a BitSet should be no problem.

EliasFanoDocIdSet could also be implemented on EliasFanoBytes, and it should be
doable to put that in an index. In LUCENE-5627, EliasFanoBytes is already used
as a Payload.


> Doc values based block join implementation
> ------------------------------------------
>
>                 Key: LUCENE-7304
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7304
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Priority: Minor
>         Attachments: LUCENE-5092-20140313.patch, LUCENE-7304-20160531.patch, 
> LUCENE-7304-20160606.patch, LUCENE_7304.patch
>
>
> At query time the block join relies on a bitset for finding the previous
> parent doc while advancing the doc id iterator. On large indices these
> bitsets can consume large amounts of JVM heap space. Also, because of the
> way these bitsets are set, the 'FixedBitSet' implementation is typically used.
> The idea I had was to replace the bitset usage by a numeric doc values field 
> that stores offsets. Each child doc stores how many docids it is from its 
> parent doc and each parent stores how many docids it is apart from its first 
> child. At query time this information can be used to perform the block join.
> I think another benefit of this approach is that external tools can now 
> easily determine if a doc is part of a block of documents and perhaps this 
> also helps index time sorting?
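
The offset idea in the description above can be sketched in plain Java as
follows. An int array stands in for the numeric doc values field, and blocks
are laid out with the children first and the parent last. The sign convention
(positive for a child, zero or negative for a parent) and all names here are
assumptions made for this example; the issue does not specify how parents and
children are told apart.

```java
/** Hypothetical sketch of block join lookups based on per-document offsets. */
final class BlockJoinOffsetsSketch {

    /** Builds the offsets for consecutive blocks; each block is its children followed by one parent. */
    static int[] buildOffsets(int[] blockSizes) {
        int total = 0;
        for (int size : blockSizes) {
            total += size;
        }
        int[] offsets = new int[total];
        int start = 0;
        for (int size : blockSizes) {
            int parent = start + size - 1;
            for (int doc = start; doc < parent; doc++) {
                offsets[doc] = parent - doc;      // child: distance forward to its parent
            }
            offsets[parent] = -(parent - start);  // parent: distance back to its first child, negated
            start += size;
        }
        return offsets;
    }

    /** Returns the parent of doc, or doc itself when doc is a parent. */
    static int parentOf(int[] offsets, int doc) {
        return offsets[doc] > 0 ? doc + offsets[doc] : doc;
    }

    /** Returns the first child of parentDoc, or parentDoc itself when it has no children. */
    static int firstChildOf(int[] offsets, int parentDoc) {
        return parentDoc + offsets[parentDoc];
    }

    public static void main(String[] args) {
        int[] offsets = buildOffsets(new int[] {3, 1, 4});
        System.out.println(parentOf(offsets, 5));
        System.out.println(firstChildOf(offsets, 7));
    }
}
```

With this layout no bitset is consulted: one lookup per document is enough to
move from a child to its parent or from a parent to the start of its block.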


