mkaravel commented on PR #2971: URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2814278248
> Then I have some questions:
>
> What is an invalid value in a geometry feature? NaN? +/-Inf? Anything else?
> From the above discussion, it seems that +/-Inf are invalid values in terms of a bbox. If that's true, definitely we should not make it as the final bbox to persist in the file. Is NaN a valid value in a bbox? Is it a good way to use NaN values for an empty bbox?
> Is it a good approach to drop the entire bbox if any NaN or invalid value appears? In this way, we do not fail the writer at the cost of missing bbox. I'm in favor of this so we do not produce any confusing stats to users. It is really hard to downstream users to decide if the provided stats are reliable for predicate push down.

@wgtmac Just wanted to share my view/opinion on this:

* In a geometry feature, NaN or +/-inf values do not make sense. +/-inf values could make sense in a geometric (not geographic) bounding box, but this would be a convention.
* Given the above, +/-inf values in a bounding box could potentially be persisted, but this would be corrupt or invalid data. The engine/reader could choose how to handle them.
* I think using NaN values to represent empty boxes makes a lot of sense, and it provides more information than dropping a box of NaN values (by "drop" I mean write nothing instead). Specifically, if I see no bounding box, I am basically forced to believe that I know nothing about my data. If I see an empty box, then I know I can safely skip this piece of data for certain operations (like spatial predicates).

Hope this makes sense.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
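To make the distinction between a missing box and an empty box in the comment above concrete, here is a minimal sketch in Java. All names (`BBoxSketch`, `BBox`, `canSkip`) are hypothetical and are not part of the parquet-java API. It assumes an empty box is encoded as NaN in all four bounds, and it treats any other non-finite box as unreliable rather than prescribing what a reader must do.

```java
import java.util.Optional;

// Hypothetical illustration only; not the parquet-java API.
class BBoxSketch {

    /** 2D bounding box. Assumption: an empty box stores NaN in all four bounds. */
    static final class BBox {
        final double xMin, yMin, xMax, yMax;

        BBox(double xMin, double yMin, double xMax, double yMax) {
            this.xMin = xMin;
            this.yMin = yMin;
            this.xMax = xMax;
            this.yMax = yMax;
        }

        static BBox empty() {
            return new BBox(Double.NaN, Double.NaN, Double.NaN, Double.NaN);
        }

        /** True only when every bound is NaN (the assumed "empty" encoding). */
        boolean isEmpty() {
            return Double.isNaN(xMin) && Double.isNaN(yMin)
                && Double.isNaN(xMax) && Double.isNaN(yMax);
        }

        /** True when every bound is a finite number (no NaN, no +/-inf). */
        boolean isFinite() {
            return Double.isFinite(xMin) && Double.isFinite(yMin)
                && Double.isFinite(xMax) && Double.isFinite(yMax);
        }

        /** Closed-interval overlap test; callers should pass finite boxes. */
        boolean intersects(BBox other) {
            return xMin <= other.xMax && other.xMin <= xMax
                && yMin <= other.yMax && other.yMin <= yMax;
        }
    }

    /**
     * Decides whether a reader may skip a chunk for a spatial predicate.
     * stats is absent when the writer stored no bounding box at all.
     */
    static boolean canSkip(Optional<BBox> stats, BBox query) {
        if (!stats.isPresent()) {
            return false; // no box: nothing is known, so the data must be read
        }
        BBox box = stats.get();
        if (box.isEmpty()) {
            return true; // empty box: no geometry can match the predicate
        }
        if (!box.isFinite()) {
            return false; // +/-inf or partial NaN: treated here as unreliable
        }
        return !box.intersects(query);
    }
}
```

With this encoding, `canSkip(Optional.empty(), query)` has to read the data, while `canSkip(Optional.of(BBox.empty()), query)` can skip it. Reading the data when a box is non-finite is one conservative choice among several an engine could make.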
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
