gf2121 commented on code in PR #14494:
URL: https://github.com/apache/lucene/pull/14494#discussion_r2074987659
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/TrieReader.java:
##########
@@ -74,14 +77,39 @@ IndexInput floorData(TrieReader r) throws IOException {
final RandomAccessInput access;
final IndexInput input;
final Node root;
+ final int[] labelMap;
- TrieReader(IndexInput input, long rootFP) throws IOException {
+ static IOSupplier<TrieReader> readerSupplier(DataInput metaIn, IndexInput indexIn)
+ throws IOException {
+ int[] labelMap = TrieReader.labelMap(metaIn);
+ long start = metaIn.readVLong();
+ long rootFP = metaIn.readVLong();
+ long end = metaIn.readVLong();
+ return () -> new TrieReader(indexIn.slice("outputs", start, end - start), rootFP, labelMap);
+ }
+
+ private TrieReader(IndexInput input, long rootFP, int[] labelMap) throws IOException {
this.access = input.randomAccessSlice(0, input.length());
+ this.labelMap = labelMap;
this.input = input;
this.root = new Node();
load(root, rootFP);
}
+ private static int[] labelMap(DataInput in) throws IOException {
+ int cnt = in.readVInt();
+ if (cnt == 0) {
+ return null;
+ } else {
+ int[] labelMap = new int[TrieBuilder.BYTE_RANGE];
Review Comment:
For now, we need a sentinel value such as `-1` to represent 'this label does
not exist in this trie', so the `int[]` cannot simply be replaced by a `byte[]`.
I personally think 256 * 4 bytes = 1KB of heap per field is OK. But we can
reduce the heap usage at the cost of some lookup overhead, e.g. with a `bitset`
recording whether each label exists and a `byte[]` holding the mapped values.
I can make the change if you think it is worth it :)
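
A rough sketch of that alternative, to make the trade-off concrete. This is
not code from the PR: `CompactLabelMap`, `put` and `get` are hypothetical
names, and it assumes every mapped value fits in an unsigned byte (0..255).
The payload would be 32 bytes of bitset plus 256 bytes of values, instead of
1024 bytes for the `int[]`, in exchange for an extra bit test on each lookup.

```java
// Hypothetical sketch only; names and layout are assumptions, not the PR's code.
final class CompactLabelMap {
  private static final int BYTE_RANGE = 256;

  // One bit per possible label: 256 bits = 4 longs = 32 bytes.
  private final long[] present = new long[BYTE_RANGE / Long.SIZE];
  // Mapped value per label, stored as an unsigned byte (assumed to be in 0..255).
  private final byte[] mapped = new byte[BYTE_RANGE];

  /** Records that {@code label} exists and maps to {@code value}. */
  void put(int label, int value) {
    if (label < 0 || label >= BYTE_RANGE || value < 0 || value >= BYTE_RANGE) {
      throw new IllegalArgumentException("label=" + label + " value=" + value);
    }
    present[label >>> 6] |= 1L << (label & 63);
    mapped[label] = (byte) value;
  }

  /** Returns the mapped value, or -1 if the label does not exist in this trie. */
  int get(int label) {
    if ((present[label >>> 6] & (1L << (label & 63))) == 0) {
      return -1; // same sentinel the int[] version stores directly
    }
    return mapped[label] & 0xFF; // undo sign extension of the stored byte
  }
}
```

With this layout, `get` still returns `-1` for missing labels, so callers
written against the `int[]` version would not need to change.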
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]