Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-09 Thread Vignesh
Hi, I am reading about Iceberg and am quite new to this. This Puffin file would be an index from key to data file. Other use cases of Puffin, such as statistics, are at a per-file level, if I understand correctly. Where would the Puffin file holding the key->data file index be stored? It is a property of the entire table

Re: [VOTE] Release Apache PyIceberg 0.8.0rc1

2024-11-09 Thread Fokko Driesprong
+1 (binding) Thanks for running this release Kevin! - Verified signatures and checksum - Checked for licenses - Installed and ran tests - Did some local testing Kind regards, Fokko On Sat, 9 Nov 2024 at 00:01, Drew wrote: > +1 (non-binding) > > - verified signature and checksum > - verified RA

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-09 Thread Shani Elharrar
JB, this is what we do: we write Equality Deletes and periodically convert them to Positional Deletes. We could probably index the keys, maybe partially using bloom filters; the best option would be to put those bloom filters inside Puffin. Shani. On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré wrote:
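
To make the idea in the message above concrete, here is a minimal sketch of converting equality deletes into positional deletes, with a per-file bloom filter over the keys used to skip data files that cannot contain a deleted key. Everything in it is an assumption for illustration: the `BloomFilter` class, the `to_positional_deletes` function, and the in-memory `data_files` mapping are hypothetical, and none of it uses the Iceberg or Puffin APIs or models sequence numbers.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative bloom filter (hypothetical, not an Iceberg/Puffin type)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def to_positional_deletes(data_files, deleted_keys):
    """Turn equality-delete keys into (file path, row position) pairs.

    data_files: dict mapping a file path to its keys in row order
    (an assumed stand-in for reading the key column of each data file).
    """
    # Build one filter per data file; in the proposal these would be persisted.
    filters = {}
    for path, keys in data_files.items():
        bloom = BloomFilter()
        for key in keys:
            bloom.add(key)
        filters[path] = bloom

    positional = []
    for path, keys in data_files.items():
        candidates = {k for k in deleted_keys if filters[path].might_contain(k)}
        if not candidates:
            continue  # the filter rules this file out, so it is never scanned
        # Scan to confirm: bloom filters can report false positives.
        for position, key in enumerate(keys):
            if key in candidates:
                positional.append((path, position))
    return positional


data_files = {"a.parquet": [1, 2, 3], "b.parquet": [4, 5, 2], "c.parquet": [7, 8]}
deletes = to_positional_deletes(data_files, {2})
```

Because every candidate is confirmed by scanning the file, a false positive in the filter only costs an unnecessary scan and never produces a wrong positional delete.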

Re: [ANNOUNCE] Apache Iceberg release 1.7.0

2024-11-09 Thread Jean-Baptiste Onofré
Great! Thanks Russell for driving this release! Regards JB On Fri, Nov 8, 2024 at 4:33 PM Russell Spitzer wrote: > > I'm pleased to announce the release of Apache Iceberg 1.7.0! > > Apache Iceberg is an open table format for huge analytic datasets. Iceberg > delivers high query performance fo

Re: [DISCUSS] Add a implementation status page for iceberg

2024-11-09 Thread Jean-Baptiste Onofré
Hi, I like the idea. My only comment is probably to use versions instead of check marks, but all good :) Thanks! Regards JB On Fri, Nov 8, 2024 at 3:33 PM Russell Spitzer wrote: > > Sounds like a great idea to me > > On Fri, Nov 8, 2024 at 7:58 AM Renjie Liu wrote: >> >> Hi: >> >> As iceberg

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-09 Thread Jean-Baptiste Onofré
Hi, I agree with Peter here, and I would say that it would be an issue for multi-engine support. I think, as I already mentioned with others, we should explore an alternative. As the main issue is the datafile scan in streaming context, maybe we could find a way to "index"/correlate for positiona