Martijn van Groningen created LUCENE-7304:
---------------------------------------------
Summary: Doc values based block join implementation
Key: LUCENE-7304
URL: https://issues.apache.org/jira/browse/LUCENE-7304
Project: Lucene - Core
Issue Type: Improvement
Reporter: Martijn van Groningen
Priority: Minor
At query time, the block join relies on a bitset to find the previous parent
doc while advancing the doc id iterator. On large indices these bitsets can
consume large amounts of JVM heap space. Also, because of the way the bits in
these bitsets are typically set, the 'FixedBitSet' implementation is used.
The idea I had was to replace the bitset usage by a numeric doc values field
that stores offsets. Each child doc stores how many docids it is from its
parent doc and each parent stores how many docids it is apart from its first
child. At query time this information can be used to perform the block join.
I think another benefit of this approach is that external tools can now easily
determine whether a doc is part of a block of documents. Perhaps this also
helps with index-time sorting?