Re: Scan column metrics

2023-10-09 Thread Péter Váry
The owner of the table wanted to keep the column stats for all of the columns, claiming that other users might be, or already are, using the statistics of the columns. Even if I am not sure that their case was defensible, I think the reader of the table is often not in the position to optimize the table for their

Re: Migration of PyIceberg to iceberg-python repository

2023-10-09 Thread Jean-Baptiste Onofré
+1 for removing to avoid misunderstanding :). It's cleaner/clearer now with the iceberg-python repo. Thanks Fokko & Ed! Regards JB On Sun, Oct 8, 2023 at 9:07 PM Fokko Driesprong wrote: > > Hey everyone, > > It has been a week since PyIceberg migrated to its own repository. Should we > move forwar

Re: Scan column metrics

2023-10-09 Thread Ryan Blue
For that use case, it sounds like you'd be much better off not storing all the stats rather than skipping them at read time. I understand the user wants to keep them, but it may still not be a great choice. I'm just worried that this is going to be a lot of effort for you that doesn't really genera

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-09 Thread Russell Spitzer
The main things I’m still interested in are alternative approaches. I think that some of the work that Anton is doing has shown some different bottlenecks in applying delete files that I’m not sure are addressed by this proposal. For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup t