mkaravel commented on PR #2971:
URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2764619708

   @zhangfengcdt Thank you for addressing the comments!
   
   Regarding the bounding boxes, I believe I did not make myself clear enough 
as to what I was suggesting. I think that using +/- infinity as initialization 
is a great idea and it helps with the code logic and can simplify it. If I 
understood the logic correctly, what was strange for me was exposing bounding 
boxes with +/- infinity values (which in the previous implementation could 
happen if you had a file with just empty geometries (and possibly NULLs). 
Roughly speaking this is code flow I had in mind when computing boxes:
   * To compute the bounding box for a row group initialize this box to `(inf, 
-inf, inf, -inf)`.
   * While looking at rows: If the geometry is empty skip it. If the geometry 
is not empty then merge its bounding box with the current bounding box for the 
group. This is where the initialization with infinite values is handy **because 
you do not need to check for infinity explicitly**. For example if the current 
bbox for the row group is `(inf, -inf, inf, -inf)` and the bbox of the current 
geometry is `(1, 2, 3, 4)`, you can simply do (I am writing conceptual code 
here) `rowGroup.xMin = Math.Min(rowGroup.xMin, geo.XMin)`, `rowGroup.xMax = 
Math.Max(rowGroup.xMax, geo.XMax)` and similarly for the other dimension(s). 
These statements (and this was my point) work if the initial coordinates of the 
row group bbox are +/- infinity (because this is how IEEE 754 floating point 
arithmetic works).
   * If at the end of processing a row group you still see a bbox of `(inf, 
-inf, inf, -inf)` then expose it as `(NaN, NaN, NaN, NaN)`. This means you have 
not found any non-empty geometry...
   
   I will go over the code again, probably tomorrow and make specific 
suggestions, if needed, regarding what I am referring to above.
   
   @paleolimbot Could you please share a link to the C++ implementation? Would 
be very interested in taking a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to