Re: Changelog scan for table with delete files

2025-02-14 Thread Wing Yew Poon
Ok Anton. Please let me know. On Thu, Feb 13, 2025 at 9:28 PM Anton Okolnychyi wrote: > Hey Wing Yew, I am planning to focus on this after we get partition stats > readers/writers into main. I actually have ideas on how to implement > changelog scans for V2 tables efficiently. > > - Anton > > п

Re: [VOTE] Release Apache Iceberg 1.8.0 RC0

2025-02-14 Thread Anurag Mantripragada
I saw similar docker issues on Azure tests as well. I moved some tests to integration source sets in PR: https://github.com/apache/iceberg/pull/12274 Please take a look Thanks, Anurag Mantripragada > On Feb 13, 2025, at 1:36 AM, Péter Váry wrote: > > A late +1 - I just got to checking the s

Re: pre-proposal: schema_id on DataFile

2025-02-14 Thread rdb...@gmail.com
We've considered this in the past and I'm undecided on it. There is some benefit, like being able to prune files during planning if the file didn't contain a column that is used in a non-null filter (e.g. `new_data_column IN ("a", "b")`). On the other hand, we don't want data files that were writt

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-14 Thread Russell Spitzer
That sounds good to me: "Spec" or "Specification". On Fri, Feb 14, 2025 at 1:41 PM Steven Wu wrote: > It makes sense to separate out the technical parts (specs) from the > current "Project" group. > > For the new "technical" content (catalog, term, specs), "Concepts" is > probably not accurate.

Re: pre-proposal: schema_id on DataFile

2025-02-14 Thread Devin Smith
Thanks for the info, it is very helpful. I see it when debugging down through `org.apache.iceberg.ManifestReader#readMetadata`. It wasn't obvious to me that this sort of data would be in the Avro metadata as opposed to the `org.apache.iceberg.ManifestFile` object. I may have some questions later about the

Re: [DISCUSS] Consolidate docs under Concepts and Project/Terms

2025-02-14 Thread Steven Wu
It makes sense to separate out the technical parts (specs) from the current "Project" group. For the new "technical" content (catalog, term, specs), "Concepts" is probably not accurate. Maybe "Spec" is more appropriate. On Thu, Feb 13, 2025 at 9:41 AM Russell Spitzer wrote: > I think we should

Re: pre-proposal: schema_id on DataFile

2025-02-14 Thread Fokko Driesprong
Hi Devin, The schema-id is stored in the Manifest Avro header: https://iceberg.apache.org/spec/#manifests The schema itself is also stored there. Would that help your situation? I think this makes adding it to the data file redundant. Kind regards, Fokko On Fri, Feb 14, 2025 at 17:56 Devin

pre-proposal: schema_id on DataFile

2025-02-14 Thread Devin Smith
I want to make sure I'm not missing something that already exists; otherwise, hoping to get a quick thumbs up / thumbs down on a potential proposal before spending more time on it. It would be nice to know what Iceberg schema a writer used (/assumed) when writing a DataFile. Oftentimes, this infor

Re: [VOTE] Add overwriteRequested to RegisterTableRequest in REST spec

2025-02-14 Thread Eduard Tudenhöfner
+1 On Fri, Feb 14, 2025 at 12:57 AM Szehon Ho wrote: > +1 > > Thanks Steve! > Szehon > > On Thu, Feb 13, 2025 at 1:23 PM Yufei Gu wrote: > >> +1 (binding) >> Yufei >> >> >> On Thu, Feb 13, 2025 at 1:20 PM huaxin gao >> wrote: >> >>> +1 (non-binding) >>> >>> On Thu, Feb 13, 2025 at 11:51 AM Anu

Re: [DISCUSS] FileFormat API proposal

2025-02-14 Thread Péter Váry
Hi Renjie, Here is the WIP PR for the readers: https://github.com/apache/iceberg/pull/12069 Here is the WIP PR for the writers: https://github.com/apache/iceberg/pull/12164 If you want to concentrate on the proposed new API, maybe this is the best place to start: https://github.com/apache/iceberg/

Re: [DISCUSS] FileFormat API proposal

2025-02-14 Thread Renjie Liu
Hi Peter, Thanks for raising this; the proposal sounds quite interesting to me. I've reviewed the doc, but it still seems too abstract to understand. Would you mind submitting a PR so that it is clearer what has changed? On Wed, Feb 12, 2025 at 12:46 AM Péter Váry wrote: > Hi Team, > >

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-14 Thread Gabor Kaszab
Thanks everyone for taking a look! The vote has passed with the following results: - +1 binding votes: 9 - +1 non-binding votes: 5 (including mine) - 0 votes: none - -1 votes: none Regards, Gabor On Thu, Feb 13, 2025 at 2:45 AM Renjie Liu wrote: > +1 > > On Wed, Feb 12, 2025 at 1: