gsmiller commented on code in PR #13658:
URL: https://github.com/apache/lucene/pull/13658#discussion_r1718827005
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java:
##########
@@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws
IOException {
assert or != 0;
final int bitsPerValue = PackedInts.bitsRequired(or);
out.writeByte((byte) bitsPerValue);
- forUtil.encode(longs, bitsPerValue, out);
+ encodeDeltas(longs, bitsPerValue, out);
+ }
+ }
+
+ private void encodeDeltas(long[] longs, int bitsPerValue, DataOutput out)
throws IOException {
+ final int nextPrimitive;
+ final int numLongs;
+ if (bitsPerValue <= 4) {
+ nextPrimitive = 8;
+ numLongs = BLOCK_SIZE / 8;
+ collapse8(longs);
+ } else if (bitsPerValue <= 11) {
+ nextPrimitive = 16;
+ numLongs = BLOCK_SIZE / 4;
+ collapse16(longs);
+ } else {
+ nextPrimitive = 32;
+ numLongs = BLOCK_SIZE / 2;
+ collapse32(longs);
+ }
+
+ final int numLongsPerShift = bitsPerValue * 2;
+ int idx = 0;
+ int shift = nextPrimitive - bitsPerValue;
+ for (int i = 0; i < numLongsPerShift; ++i) {
+ tmp[i] = longs[idx++] << shift;
+ }
+ for (shift = shift - bitsPerValue; shift >= 0; shift -= bitsPerValue) {
+ for (int i = 0; i < numLongsPerShift; ++i) {
+ tmp[i] |= longs[idx++] << shift;
+ }
+ }
+
+ final int remainingBitsPerLong = shift + bitsPerValue;
+ final long maskRemainingBitsPerLong;
+ if (nextPrimitive == 8) {
+ maskRemainingBitsPerLong = MASKS8[remainingBitsPerLong];
+ } else if (nextPrimitive == 16) {
+ maskRemainingBitsPerLong = MASKS16[remainingBitsPerLong];
+ } else {
+ maskRemainingBitsPerLong = MASKS32[remainingBitsPerLong];
+ }
+
+ int tmpIdx = 0;
+ int remainingBitsPerValue = bitsPerValue;
+ while (idx < numLongs) {
+ if (remainingBitsPerValue >= remainingBitsPerLong) {
+ remainingBitsPerValue -= remainingBitsPerLong;
+ tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) &
maskRemainingBitsPerLong;
+ if (remainingBitsPerValue == 0) {
+ idx++;
+ remainingBitsPerValue = bitsPerValue;
+ }
+ } else {
+ final long mask1, mask2;
+ if (nextPrimitive == 8) {
+ mask1 = MASKS8[remainingBitsPerValue];
+ mask2 = MASKS8[remainingBitsPerLong - remainingBitsPerValue];
+ } else if (nextPrimitive == 16) {
+ mask1 = MASKS16[remainingBitsPerValue];
+ mask2 = MASKS16[remainingBitsPerLong - remainingBitsPerValue];
+ } else {
+ mask1 = MASKS32[remainingBitsPerValue];
+ mask2 = MASKS32[remainingBitsPerLong - remainingBitsPerValue];
+ }
+ tmp[tmpIdx] |= (longs[idx++] & mask1) << (remainingBitsPerLong -
remainingBitsPerValue);
+ remainingBitsPerValue = bitsPerValue - remainingBitsPerLong +
remainingBitsPerValue;
+ tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) & mask2;
+ }
+ }
+
+ for (int i = 0; i < numLongsPerShift; ++i) {
+ out.writeLong(tmp[i]);
+ }
+ }
+
+ /** Number of bytes required to encode 128 integers of {@code bitsPerValue}
bits per value. */
+ int numBytes(int bitsPerValue) {
+ return bitsPerValue << (BLOCK_SIZE_LOG2 - 3);
+ }
+
+ private static void decodeSlow(
Review Comment:
And one more method that I think is duplicated from `ForUtil`?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java:
##########
@@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws
IOException {
assert or != 0;
final int bitsPerValue = PackedInts.bitsRequired(or);
out.writeByte((byte) bitsPerValue);
- forUtil.encode(longs, bitsPerValue, out);
+ encodeDeltas(longs, bitsPerValue, out);
+ }
+ }
+
+ private void encodeDeltas(long[] longs, int bitsPerValue, DataOutput out)
throws IOException {
Review Comment:
I think this is identical to `ForDelta#encode` right? Can we leverage that
instead of duplicating it? I guess the issue is that `ForUtil` relies on its
`tmp` buffer so the method isn't static, but we could split out a helper static
method that takes the buffer as input that could be common I think.
##########
lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultPostingDecodingUtil.java:
##########
@@ -29,13 +29,14 @@ public DefaultPostingDecodingUtil(IndexInput in) {
@Override
public void splitLongs(
Review Comment:
What if we add the "slow" implementation as a static method in
`PostingDecodingUtil` (as sketched out below) and then delegate to it here and
in the "slow path" in the vectorized implementation? That we can avoid the code
duplication.
```
protected static void splitLongsSlow(IndexInput in, int count, long[] b,
int bShift, int dec, long bMask, long[] c, int cIndex, long cMask) throws
IOException {
in.readLongs(c, cIndex, count);
// The below loop is auto-vectorized
for (int i = 0; i < count; ++i) {
for (int j = 0, end = (bShift - 1) / dec; j <= end; ++j) {
b[count * j + i] = (c[cIndex + i] >>> (bShift - j * dec)) & bMask;
}
// can we determine cIndex as `count * ((bShift - 1) / dec + 1)`?
c[cIndex + i] &= cMask;
}
}
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##########
@@ -712,14 +465,14 @@ private static void decode7(IndexInput in,
PostingDecodingUtil pdu, long[] tmp,
}
}
- private static void decode8(IndexInput in, PostingDecodingUtil pdu, long[]
tmp, long[] longs)
+ static void decode8(IndexInput in, PostingDecodingUtil pdu, long[] tmp,
long[] longs)
Review Comment:
If I remember correctly, there was a version of the change where you always
read the longs through `PostingDecodingUtil` and didn't need `IndexInput` for
these edge-cases. It seems a little easier to understand that way (to me at
least). Since `PostingDecodingUtil` is always getting created with an
`IndexInput`, would it make sense to have a method in `PostingDecodingUtil`
that just does the read delegation? Or an alternative would be to expose
`IndexInput` (e.g., it could just be a public field in the abstract class)?
It's a minor detail but it seems like it would make the code wiring slightly
cleaner to not have to provide both `IndexInput` and `PostingDecodingUtil` when
initializing `ForUtil` / `ForDeltaUtil`.
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java:
##########
@@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws
IOException {
assert or != 0;
final int bitsPerValue = PackedInts.bitsRequired(or);
out.writeByte((byte) bitsPerValue);
- forUtil.encode(longs, bitsPerValue, out);
+ encodeDeltas(longs, bitsPerValue, out);
+ }
+ }
+
+ private void encodeDeltas(long[] longs, int bitsPerValue, DataOutput out)
throws IOException {
+ final int nextPrimitive;
+ final int numLongs;
+ if (bitsPerValue <= 4) {
+ nextPrimitive = 8;
+ numLongs = BLOCK_SIZE / 8;
+ collapse8(longs);
+ } else if (bitsPerValue <= 11) {
+ nextPrimitive = 16;
+ numLongs = BLOCK_SIZE / 4;
+ collapse16(longs);
+ } else {
+ nextPrimitive = 32;
+ numLongs = BLOCK_SIZE / 2;
+ collapse32(longs);
+ }
+
+ final int numLongsPerShift = bitsPerValue * 2;
+ int idx = 0;
+ int shift = nextPrimitive - bitsPerValue;
+ for (int i = 0; i < numLongsPerShift; ++i) {
+ tmp[i] = longs[idx++] << shift;
+ }
+ for (shift = shift - bitsPerValue; shift >= 0; shift -= bitsPerValue) {
+ for (int i = 0; i < numLongsPerShift; ++i) {
+ tmp[i] |= longs[idx++] << shift;
+ }
+ }
+
+ final int remainingBitsPerLong = shift + bitsPerValue;
+ final long maskRemainingBitsPerLong;
+ if (nextPrimitive == 8) {
+ maskRemainingBitsPerLong = MASKS8[remainingBitsPerLong];
+ } else if (nextPrimitive == 16) {
+ maskRemainingBitsPerLong = MASKS16[remainingBitsPerLong];
+ } else {
+ maskRemainingBitsPerLong = MASKS32[remainingBitsPerLong];
+ }
+
+ int tmpIdx = 0;
+ int remainingBitsPerValue = bitsPerValue;
+ while (idx < numLongs) {
+ if (remainingBitsPerValue >= remainingBitsPerLong) {
+ remainingBitsPerValue -= remainingBitsPerLong;
+ tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) &
maskRemainingBitsPerLong;
+ if (remainingBitsPerValue == 0) {
+ idx++;
+ remainingBitsPerValue = bitsPerValue;
+ }
+ } else {
+ final long mask1, mask2;
+ if (nextPrimitive == 8) {
+ mask1 = MASKS8[remainingBitsPerValue];
+ mask2 = MASKS8[remainingBitsPerLong - remainingBitsPerValue];
+ } else if (nextPrimitive == 16) {
+ mask1 = MASKS16[remainingBitsPerValue];
+ mask2 = MASKS16[remainingBitsPerLong - remainingBitsPerValue];
+ } else {
+ mask1 = MASKS32[remainingBitsPerValue];
+ mask2 = MASKS32[remainingBitsPerLong - remainingBitsPerValue];
+ }
+ tmp[tmpIdx] |= (longs[idx++] & mask1) << (remainingBitsPerLong -
remainingBitsPerValue);
+ remainingBitsPerValue = bitsPerValue - remainingBitsPerLong +
remainingBitsPerValue;
+ tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) & mask2;
+ }
+ }
+
+ for (int i = 0; i < numLongsPerShift; ++i) {
+ out.writeLong(tmp[i]);
+ }
+ }
+
+ /** Number of bytes required to encode 128 integers of {@code bitsPerValue}
bits per value. */
+ int numBytes(int bitsPerValue) {
Review Comment:
I think this is also identical to what's in `ForUtil`? Could we reference it
and avoid the duplication?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/PostingIndexInput.java:
##########
@@ -50,6 +53,6 @@ public void decode(int bitsPerValue, long[] longs) throws
IOException {
* and store results into {@code longs}.
*/
public void decodeAndPrefixSum(int bitsPerValue, long base, long[] longs)
throws IOException {
Review Comment:
Actually, I guess this whole class is now only used by benchmarks? Do we
actually need it or could benchmarks go directly against `ForUtil` /
`ForDeltaUtil`?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/PostingIndexInput.java:
##########
@@ -50,6 +53,6 @@ public void decode(int bitsPerValue, long[] longs) throws
IOException {
* and store results into {@code longs}.
*/
public void decodeAndPrefixSum(int bitsPerValue, long base, long[] longs)
throws IOException {
Review Comment:
It looks like this method is only around for benchmarking? Could benchmarks
directly call `ForDeltaUtil#decodeAndPrefixSum` so we could not need this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]