paleolimbot commented on PR #240: URL: https://github.com/apache/parquet-format/pull/240#issuecomment-2546234738
> separate from the spec it might be good to start discussions on what the implementation for GeoParquet might look like (e.g. what new dependencies do we plan on taking on for reference implementation? What would APIs look like?)

@emkornfield I think this is a good idea!

The PoC implementations specifically may not handle writing statistics for non-planar edges. That depends on the final call between two options: either the statistics are always Cartesian min/max (i.e., they are wrong for spherical edges and readers should ignore them), or the statistics take curved edges into account in the non-planar case (which requires non-trivial computational effort and complexity on the writer's side, but removes that effort and complexity for the reader).

Discussions in Iceberg have converged on the latter, which means we may have to figure out how to plug in S2 and/or Boost::Geometry when writing statistics in C++ (I can't speak for Java). Off the top of my head, it could be any of:

- a Parquet-specific hook to override the stats for a column chunk,
- the name of an Arrow compute UDF that can compute the required box, or
- a willingness to take Boost or S2 as a dependency in that section of the code.

(I don't think that's required for the PoC, personally, but I'm also happy to prototype any of those if somebody thinks it is.)
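
To make the difference between the two statistics options concrete, here is a minimal Python sketch (not taken from the PR or any PoC) that compares the latitude range of a single edge under both interpretations. The function names are hypothetical, and the dense sampling along the great-circle arc is only an illustrative stand-in: it is not how S2 or Boost::Geometry compute bounds, and it assumes the two endpoints are not antipodal.

```python
import math


def to_unit_vector(lon_deg, lat_deg):
    """Convert a (longitude, latitude) pair in degrees to a 3D unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def latitude_range_cartesian(p, q):
    """Latitude min/max when the edge is treated as planar: vertices only."""
    return (min(p[1], q[1]), max(p[1], q[1]))


def latitude_range_spherical(p, q, samples=1000):
    """Approximate latitude min/max along the great-circle arc from p to q.

    Hypothetical helper: samples the arc with spherical linear interpolation.
    Assumes p and q are not antipodal (sin(omega) would be zero).
    """
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two endpoints
    if omega == 0.0:
        return (p[1], p[1])
    lats = []
    for i in range(samples + 1):
        t = i / samples
        wa = math.sin((1.0 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        # The interpolated point is a unit vector, so z equals sin(latitude).
        z = max(-1.0, min(1.0, wa * a[2] + wb * b[2]))
        lats.append(math.degrees(math.asin(z)))
    return (min(lats), max(lats))


# One edge between two points at 50 degrees north, 120 degrees of longitude apart.
p, q = (-60.0, 50.0), (60.0, 50.0)
print("cartesian:", latitude_range_cartesian(p, q))
print("spherical:", latitude_range_spherical(p, q))
```

For this edge, the vertex-only range stays at 50° while the great-circle arc reaches about 67.2° north, so a reader pruning on Cartesian min/max could skip a column chunk that actually intersects a query region at, say, 60° north.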