Re: [VOTE] Release Apache Iceberg Rust 0.5.1 RC1

2025-05-26 Thread Renjie Liu
+1 (binding) [x] Download links are valid. [x] Checksums and signatures. [x] LICENSE/NOTICE files exist [x] No unexpected binary files [x] Ran `make check` to verify linter, format [x] Ran `make test` to verify integration tests Thanks Kevin for driving this! On Tue, May 27, 2025 at 1:16 PM Kevi

Re: [VOTE] Release Apache Iceberg Rust 0.5.1 RC1

2025-05-26 Thread Kevin Liu
+1 (non-binding) [x] Download links are valid. [x] Checksums and signatures. [x] LICENSE/NOTICE files exist [x] No unexpected binary files [x] All source files have ASF headers [x] Can compile from source Ran `./scripts/verify.py` Built and tested, `make build`, `make test` Tested pyiceberg-core

[VOTE] Release Apache Iceberg Rust 0.5.1 RC1

2025-05-26 Thread Kevin Liu
Hello Apache Iceberg Rust Community, This is a call for a vote to release Apache Iceberg rust version 0.5.1. The tag to be voted on is v0.5.1-rc.1. The release candidate: https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-rust-0.5.1-rc.1/ Keys to verify the release candidate: https://do

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Kevin Liu
Hi everyone, Unfortunately, there was an issue during the publishing process for 0.5.0. This was caused by a circular dependency. See the GitHub thread for more details: https://github.com/apache/iceberg-rust/issues/1325#issuecomment-2910782152 To fix this, I have raised https://github.com/ap

Re: [VOTE] Release Apache Iceberg 1.9.1 RC1

2025-05-26 Thread Yuya Ebihara
+1 (non-binding) Confirmed that Trino and Starburst CI are green. It runs tests against several catalogs, including HMS, Glue, JDBC (PostgreSQL), REST (Polaris, Unity, S3 Tables, Tabular), Nessie, and Snowflake. Yuya On Sat, May 24, 2025 at 7:37 AM Steven Wu wrote: > +1 (binding) > > Checked s

Re: Wide tables in V4

2025-05-26 Thread Gang Wu
I agree with Steven that there are limitations in what Parquet can do. In addition to adding new columns by rewriting all files, files of wide tables may suffer from poor performance, for example: - Poor compression of row groups because there are too many columns and even a small number of rows can

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Kevin Liu
Hi everyone, Thank you for testing, verifying, and voting on 0.5.0 RC2. The 72-hour period has passed and we have the necessary number of binding votes to accept the release candidate as Apache Iceberg Rust v0.5.0. The vote PASSED with 4 +1 binding votes and 5 +1 non-binding votes, no +0 or -1 vo

Re: Wide tables in V4

2025-05-26 Thread Steven Wu
The Parquet metadata proposal (linked by Fokko) is mainly addressing the read performance due to bloated metadata. What Peter described in the description seems useful for some ML workload of feature engineering. A new set of features/columns are added to the table. Currently, Iceberg would requi

Re: Wide tables in V4

2025-05-26 Thread Péter Váry
Do you have the link at hand for the thread where this was discussed on the Parquet list? The docs seem quite old, and the PR stale, so I would like to understand the situation better. If it is possible to do this in Parquet, that would be great, but Avro, ORC would still suffer. Amogh Jahagirdar

Re: Wide tables in V4

2025-05-26 Thread Amogh Jahagirdar
Hey Peter, Thanks for bringing this issue up. I think I agree with Fokko; the issue of wide tables leading to Parquet metadata bloat and poor Thrift deserialization performance is a long standing issue that I believe there's motivation in the community to address. So to me it seems better to addre

Re: Wide tables in V4

2025-05-26 Thread Fokko Driesprong
Hi Peter, Thanks for bringing this up. Wouldn't it make more sense to fix this in Parquet itself? It has been a long-running issue on Parquet, and there is still active interest from the community. There is a PR to replace the footer with FlatBuffers, which dramatically improves performance

Re: Wide tables in V4

2025-05-26 Thread Pucheng Yang
Hi Peter, I am interested in this proposal. What's more, I am curious if there is a similar story on the write side as well (how to generate these split files) and specifically, are you targeting feature backfill use cases in ML? On Mon, May 26, 2025 at 6:29 AM Péter Váry wrote: > Hi Team

RE: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Jonathan Chen
+ 1 non-binding Thank you for this! On 2025/05/23 04:54:22 Kevin Liu wrote: > Hello Apache Iceberg Rust Community, > > This is a call for a vote to release Apache Iceberg rust version 0.5.0. > The tag to be voted on is v0.5.0-rc.2. > > The release candidate: > https://dist.apache.org/repos/dis

Re: Wide tables in V4

2025-05-26 Thread yun zou
+1, I am really interested in this topic. Performance has always been a problem when dealing with wide tables, not just read/write, but also during compilation. Most of the ML use cases typically exhibit a vectorized read/write pattern. I am also wondering if there is any way at the metadata level

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Fokko Driesprong
+1 (binding) Thanks for fixing the licenses, Kevin! Kind regards, Fokko On Mon, May 26, 2025 at 20:30, Jonathan Chen wrote: > + 1 non-binding > > Thank you for this! > > On 2025/05/23 04:54:22 Kevin Liu wrote: > > Hello Apache Iceberg Rust Community, > > > > This is a call for a vote to release

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Amogh Jahagirdar
+1 (binding) Verified checksums/signatures/license and ran build/tests Thanks, Amogh Jahagirdar On Mon, May 26, 2025 at 4:44 AM Renjie Liu wrote: > Yes, I've added an issue to track this: > https://github.com/apache/iceberg-rust/issues/1378 > > > On Mon, May 26, 2025 at 6:12 PM Eduard Tudenhöf

Wide tables in V4

2025-05-26 Thread Péter Váry
Hi Team, In machine learning use-cases, it's common to encounter tables with a very high number of columns - sometimes even in the range of several thousand. I've seen cases with up to 15,000 columns. Storing such wide tables in a single Parquet file is often suboptimal, as Parquet can become a bo

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Renjie Liu
Yes, I've added an issue to track this: https://github.com/apache/iceberg-rust/issues/1378 On Mon, May 26, 2025 at 6:12 PM Eduard Tudenhöfner wrote: > +1 (binding) > > I think it would be beneficial if the Rust Release docs would have > something similar to > https://iceberg.apache.org/how-to-r

Re: [VOTE][Go] Release Apache Iceberg Go v0.3.0 RC0

2025-05-26 Thread Eduard Tudenhöfner
+1 (binding) Thanks everyone! On Thu, May 22, 2025 at 12:19 AM Leon Lin wrote: > +1 (non-binding) > > Thank you Matt for running the release! > > On Wed, May 21, 2025 at 10:32 AM Kevin Liu wrote: > >> +1 (non-binding) >> >> [x] Download links are valid. >> [x] Checksums and signatures. >> [x]

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Eduard Tudenhöfner
+1 (binding) I think it would be beneficial if the Rust Release docs would have something similar to https://iceberg.apache.org/how-to-release/#how-to-verify-a-release on how to verify an RC. Thanks everyone! On Mon, May 26, 2025 at 9:54 AM Renjie Liu wrote: > +1 binding > > [x] Download links

Re: Discuss proposal - IRC APIs for Multi-Statement Multi-Table Transactions

2025-05-26 Thread Péter Váry
I'm interested but can't be there, so please record the meeting. Thanks, Peter On Sat, May 24, 2025 at 2:30, Maninderjit Singh wrote: > Hi dev community, > I was wondering if we could join a call next week for discussing the > multi-table transactions so we can make progress. I have

Re: [VOTE] Release Apache Iceberg Rust 0.5.0 RC2

2025-05-26 Thread Renjie Liu
+1 binding [x] Download links are valid. [x] Checksums and signatures. [x] No unexpected binary files [x] Ran `make check` to check formats [x] Ran `make test` to run tests Thanks Kevin! On Sun, May 25, 2025 at 11:58 PM NOTME ZE wrote: > +1 (non-binding) > > [x] Download links are valid. > [x]