leerho commented on code in PR #734:
URL: https://github.com/apache/datasketches-java/pull/734#discussion_r3243523804


##########
src/main/java/org/apache/datasketches/req/ReqSerDe.java:
##########
@@ -204,11 +203,17 @@ static final Compactor extractCompactor(final 
PositionalSegment posSeg, final bo
     final int count = posSeg.getInt();
     final float[] arr = new float[count];
     posSeg.getFloatArray(arr, 0, count);
-    float minItem = Float.MAX_VALUE;
-    float maxItem = Float.MIN_VALUE;
+    float minItem = Float.NaN;
+    float maxItem = Float.NaN;
     for (int i = 0; i < count; i++) {
-      minItem = min(minItem, arr[i]);
-      maxItem = max(maxItem, arr[i]);
+      final float item = arr[i];
+      if (Float.isNaN(minItem)) {
+        minItem = item;
+        maxItem = item;
+      } else {
+        if (item < minItem) { minItem = item; }
+        if (item > maxItem) { maxItem = item; }
+      }

Review Comment:
   I have taken a closer look at this, and what you have here is not correct 
either.  If a NaN should happen to appear, it would reset both the minItem and 
maxItem values to NaN, erasing the tracked minItem and maxItem values.
   
   This extractCompactor is part of the deserialization process, and since NaNs 
are rejected during sketch construction, they should not appear in the 
serialization data.  Nonetheless, this section of code is just trying to 
reconstruct the min and max values of this specific compactor. 
   
   These specific min and max values here are only used in a sketch that is in 
EXACT format, which has a single compactor.  The min and max values are not 
stored in the serialization structure itself and must be derived dynamically 
from the compactor.  This keeps the overall storage small for small data 
streams.
   
   Note that in the normal, ESTIMATION format the min and max values are stored 
in the serialization structure itself, so the min and max values being computed 
here are not used. (This means the error you observed where it looses negative 
values probably occurred with a very small data set.)
   
   I think we can trust that the values being examined from the serialized data 
structure do not contain NaNs.   This means that to correct the original error, 
all we need to do is correct the initialization of the min and max values to 
+/- infinity.  So the corrected section should look like the following, which 
will preserve any infinities if they exist:
   
   ```
       posSeg.getFloatArray(arr, 0, count);
       float minItem = Float.POSITIVE_INFINITY;   // changed line
       float maxItem = Float.NEGATIVE_INFINITY; // changed line 
       for (int i = 0; i < count; i++) {
         minItem = min(minItem, arr[i]);
         maxItem = max(maxItem, arr[i]);
       }
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to