paleolimbot commented on PR #2971: URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2778945139
> This is why I proposed to use NaNs for the case of entire column of empty geometries.

This works for me, although I'd like a +1 from @wgtmac before changing the C++ implementation + test files! This is also consistent with what R and numpy will give you if you try to take the `max()` of an empty range.

> In this case any NaN coordinates in the input will surface as NaNs in the resulting box.

This is only true for JTS (GEOS just ignores NaNs when computing a min/max for a dimension; lwgeom/PostGIS restarts the interval computation after it sees a NaN). I think your strategy is a good one for JTS, but I also think it's OK to do anything that won't result in accidentally excluding the entire row group. For example, a writer MAY choose to either include or exclude finite coordinates from geometries that contain NaN values when writing statistics, or we could say that non-points that contain NaN values have undefined behaviour but shouldn't affect valid geometries in the same row group.

> So for now I would assume that we do not want to fail in such cases, but rather compute the statistics in a safe manner.

Yes, I think this is best for geometries with NaNs.
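To make the two NaN strategies discussed above concrete, here is a rough Python sketch. It is not the JTS, GEOS, or parquet-java implementation; `bbox`, `propagate_nan`, and `may_intersect` are hypothetical names used only for illustration. With `propagate_nan=True` a NaN ordinate surfaces as NaN bounds for that dimension (the behaviour described for JTS above), and with `propagate_nan=False` NaN ordinates are skipped (the behaviour described for GEOS above). In both modes an empty input yields an all-NaN box, and the reader-side check treats any NaN bound as "cannot prune", which is an assumption about how a reader might avoid excluding a row group by accident.

```python
import math

NAN = float("nan")


def _bounds(values, propagate_nan):
    """Return (min, max) for one dimension; (NaN, NaN) if nothing usable."""
    values = list(values)
    # Propagating mode: a single NaN ordinate makes the whole dimension NaN.
    if propagate_nan and any(math.isnan(v) for v in values):
        return NAN, NAN
    # Ignoring mode: drop NaN ordinates before taking min/max.
    present = [v for v in values if not math.isnan(v)]
    if not present:
        # Empty input, or nothing but NaNs: report NaN bounds instead of failing.
        return NAN, NAN
    return min(present), max(present)


def bbox(coords, propagate_nan=False):
    """Compute (xmin, ymin, xmax, ymax) for an iterable of (x, y) pairs."""
    coords = list(coords)
    xmin, xmax = _bounds((x for x, _ in coords), propagate_nan)
    ymin, ymax = _bounds((y for _, y in coords), propagate_nan)
    return xmin, ymin, xmax, ymax


def may_intersect(box, query):
    """Hypothetical reader-side check: never prune on a box with NaN bounds."""
    if any(math.isnan(v) for v in box):
        return True
    xmin, ymin, xmax, ymax = box
    qxmin, qymin, qxmax, qymax = query
    return xmin <= qxmax and xmax >= qxmin and ymin <= qymax and ymax >= qymin
```

The explicit NaN check in `may_intersect` matters because every ordering comparison against NaN is false, so a naive overlap test would report "no intersection" and skip the row group.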
