leerho commented on code in PR #734:
URL: https://github.com/apache/datasketches-java/pull/734#discussion_r3243523804
##########
src/main/java/org/apache/datasketches/req/ReqSerDe.java:
##########
@@ -204,11 +203,17 @@ static final Compactor extractCompactor(final
PositionalSegment posSeg, final bo
final int count = posSeg.getInt();
final float[] arr = new float[count];
posSeg.getFloatArray(arr, 0, count);
- float minItem = Float.MAX_VALUE;
- float maxItem = Float.MIN_VALUE;
+ float minItem = Float.NaN;
+ float maxItem = Float.NaN;
for (int i = 0; i < count; i++) {
- minItem = min(minItem, arr[i]);
- maxItem = max(maxItem, arr[i]);
+ final float item = arr[i];
+ if (Float.isNaN(minItem)) {
+ minItem = item;
+ maxItem = item;
+ } else {
+ if (item < minItem) { minItem = item; }
+ if (item > maxItem) { maxItem = item; }
+ }
Review Comment:
I have taken a closer look at this, and what you have here is not correct
either. If a NaN should happen to appear, it would reset both the minItem and
maxItem values to NaN, erasing the tracked minItem and maxItem values.
This extractCompactor is part of the deserialization process, and since NaNs
are rejected during sketch construction, they should not appear in the
serialization data. Nonetheless, this section of code is just trying to
reconstruct the min and max values of this specific compactor.
These specific min and max values here are only used in a sketch that is in
EXACT format, which has a single compactor. The min and max values are not
stored in the serialization structure itself and must be derived dynamically
from the compactor. This keeps the overall storage small for small data
streams.
Note that in the normal, ESTIMATION format the min and max values are stored
in the serialization structure itself, so the min and max values being computed
here are not used. (This means the error you observed where it looses negative
values probably occurred with a very small data set.)
I think we can trust that the values being examined from the serialized data
structure do not contain NaNs. This means that to correct the original error,
all we need to do is correct the initialization of the min and max values to
+/- infinity. So the corrected section should look like the following, which
will preserve any infinities if they exist:
```
posSeg.getFloatArray(arr, 0, count);
float minItem = Float.POSITIVE_INFINITY; // changed line
float maxItem = Float.NEGATIVE_INFINITY; // changed line
for (int i = 0; i < count; i++) {
minItem = min(minItem, arr[i]);
maxItem = max(maxItem, arr[i]);
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]