mkaravel commented on PR #2971: URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2780822975
> I think your strategy is a good one for JTS, but I also think it's OK to do anything that won't result in accidentally excluding the entire row group (i.e., a writer MAY choose to either include or exclude finite coordinates from geometries that contain NaN values when writing statistics, or non-points that contain NaN values have undefined behaviour but shouldn't affect valid geometries in the same row group).

If you include coordinate values for geometries that contain unexpected/invalid NaN coordinates, the bounding boxes can only get bigger. Although it would depend on the engine, I would expect such a situation not to affect query results for valid geometries.

In general, if geometries with invalid coordinates are in the data, the behavior should really be considered undefined from the query engine's perspective, and to be honest, whatever this implementation does is okay as long as:

* It does not expose these NaN values in the output bounding box at the storage level.
* It does not skip valid geometries in the same row group (which was one of your comments, and I totally agree with it).

I think what I propose is a simple modification of the existing implementation and satisfies these requirements.
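To make the two requirements concrete, here is a minimal sketch of one possible policy, not the actual change proposed in this PR. `NanSafeBoundingBox` is a hypothetical class, not parquet-java's statistics API, and it assumes each geometry's coordinates arrive as parallel x/y arrays. Any vertex with a NaN ordinate is dropped before it reaches the bounds, so NaN never appears in the stored box. Finite vertices of an invalid geometry are still kept, which can only enlarge the box and therefore cannot cause valid geometries in the same row group to be skipped.

```java
/**
 * Hypothetical sketch (not parquet-java API): accumulates a 2D bounding box
 * for row-group statistics, skipping any vertex whose x or y is NaN.
 */
final class NanSafeBoundingBox {
  // Start "inverted" so the first accepted vertex initializes the bounds.
  private double xMin = Double.POSITIVE_INFINITY;
  private double xMax = Double.NEGATIVE_INFINITY;
  private double yMin = Double.POSITIVE_INFINITY;
  private double yMax = Double.NEGATIVE_INFINITY;

  /** Adds one geometry's vertices, given as parallel x/y arrays (an assumption of this sketch). */
  void update(double[] xs, double[] ys) {
    if (xs.length != ys.length) {
      throw new IllegalArgumentException("xs and ys must have the same length");
    }
    for (int i = 0; i < xs.length; i++) {
      double x = xs[i];
      double y = ys[i];
      // Math.min/Math.max return NaN if either argument is NaN, so NaN
      // vertices must be filtered out before being folded into the bounds.
      if (Double.isNaN(x) || Double.isNaN(y)) {
        continue;
      }
      xMin = Math.min(xMin, x);
      xMax = Math.max(xMax, x);
      yMin = Math.min(yMin, y);
      yMax = Math.max(yMax, y);
    }
  }

  /** True if no non-NaN vertex has been accepted yet. */
  boolean isEmpty() {
    return xMin > xMax;
  }

  double getXMin() { return xMin; }
  double getXMax() { return xMax; }
  double getYMin() { return yMin; }
  double getYMax() { return yMax; }

  public static void main(String[] args) {
    NanSafeBoundingBox box = new NanSafeBoundingBox();
    // A valid linestring.
    box.update(new double[] {0.0, 10.0}, new double[] {0.0, 5.0});
    // An invalid linestring with one NaN vertex: the finite vertex is kept
    // (it can only enlarge the box) and the NaN vertex is dropped.
    box.update(new double[] {20.0, Double.NaN}, new double[] {1.0, Double.NaN});
    // prints "0.0 20.0 0.0 5.0"
    System.out.println(box.getXMin() + " " + box.getXMax() + " "
        + box.getYMin() + " " + box.getYMax());
  }
}
```

Dropping a whole vertex when either ordinate is NaN is a deliberate simplification. A writer could instead track x and y independently or exclude the entire offending geometry; per the quoted comment, either choice is acceptable as long as NaN stays out of the stored bounds and valid geometries are not excluded.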
