Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-06 Thread Denys Kuzmenko
Hi All, After reviewing Iceberg's proposals for stats, checking the code and reading the comments, I've created a DRAFT proposal for partition-level column stats. It would be great to continue the discussion on that topic and share ideas. https://docs.google.com/document/d/11Rp-irqb4L4Qpdxr6l8

Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread Denys Kuzmenko
Thanks, all, for the reactions. I want to emphasize that Hive's StatsObject was shared as an example, with the suggestion to adapt it for Iceberg as `PartitionColumnStats` (i.e. use column ids and drop name/type, etc). As was mentioned by Rayan, column upper/lower bounds, counts, null value and

Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread Denys Kuzmenko
Hi Gabor, Thanks for your feedback!
> In that use case however, we'd lose the stats we got previously from HMS
For Iceberg tables, Hive computes the same stats object that was previously persisted to HMS and stores it in a puffin file. So, there shouldn't be any changes for Impala other than changing the

Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread Denys Kuzmenko
There is an option to standardize Hive's ColStatistics object schema and use Iceberg:
class ColStatistics {
  static class Range {
    Number minValue;
    Number maxValue;
  }
  String colName;
  String colType;
  long countDistinct;
  long numNulls;
  double avgColLen;
  long numTrues;
  lo

Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread Denys Kuzmenko
Sorry, here is the valid Doc PR link: https://github.com/apache/iceberg-docs/pull/269

[DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread Denys Kuzmenko
Hi Everyone, We'd like to discuss an extension to the supported blob types in puffin spec. Hive-4 uses statistics auto-generation to optimize Iceberg query performance. Column statistics are written to puffin files per snapshot. The statistics calculated by Hive include histograms, NDV (Number o

Re: [DISCUSS] Hive Support

2025-01-07 Thread Denys Kuzmenko
Hi Peter, Re "Hive would provide a HMS client jar which only contains java code which is needed to connect and communicate using Thrift with a HMS instance (no internal HMS server code etc). We could use it as a dependency for our iceberg-hive-metastore module. Either setting a minimal version

Re: Building with JDK 21

2024-07-18 Thread Denys Kuzmenko
In the next 1-2 months we plan to release HIVE-4.0.1, which includes bug fixes, and then focus on the HIVE-4.1.0 release with JDK 17.

Re: Building with JDK 21

2024-07-18 Thread Denys Kuzmenko
Hi All, Let me chime in here and add some Hive perspective on that. The only reason Iceberg support has moved into Hive is that we didn't get enough support from the existing community. Our PRs got stuck pending review, and even when we got some +1s, those were not binding. Don't get me wrong, it