Re: [DISCUSS] Variant Spec Location

2024-08-21 Thread Gang Wu
It seems that we have, to some extent, reached a consensus that there should be a new home for the variant spec. The pending question is whether Parquet or Arrow is the better choice. As a committer in the Arrow, Parquet, and ORC communities, I am neutral between the two and happy to help with the move

Re: [VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-21 Thread Piotr Findeisen
Hi Eduard, JB wrote: For the record (maybe it helps users/reviewers), this release includes: > - ORC 1.9.4 update > - introduce memory limit on ParallelIterable I can confirm the ParallelIterable change, but I am not sure whether the ORC update was part of the release. Best Piotr On Wed, 21 Aug 20

[DISCUSS] Improving Position Deletes in V3

2024-08-21 Thread Anton Okolnychyi
Hey folks, As discussed during the sync, I've been working on a proposal to improve the handling of position deletes in V3. It builds on lessons learned from deploying the current approach at scale and addresses all unresolved questions from past community discussions and proposals around this top

Re: clarification on changelog behavior for equality deletes

2024-08-21 Thread Steven Wu
Agree with everyone that option (a) is the correct behavior. On Wed, Aug 21, 2024 at 11:57 AM Steve Zhang wrote: > I agree that option (a) is what user expects for row level changes. > > I feel the added deletes in given snapshots provides a PK of DELETED > entry, existing deletes are used to re

Re: clarification on changelog behavior for equality deletes

2024-08-21 Thread Steve Zhang
I agree that option (a) is what users expect for row-level changes. I feel the added deletes in the given snapshots provide the PK of the DELETED entry; existing deletes are read together with data files to find the DELETED value (V1b) and result of columns. Thanks, Steve Zhang > On Aug 20, 2024

Re: [VOTE] REST Endpoint discovery

2024-08-21 Thread Anurag Mantripragada
+1 (non-binding) Anurag Mantripragada > On Aug 21, 2024, at 3:11 AM, Sung Yun wrote: > > +1 (non-binding) > > Thank you Eduard! This is a great feature enhancement to the catalog. Thumbs > up! > > On Tue, Aug 20, 2024 at 5:27 PM Amogh Jahagirdar <2am...@gmail.com >

Re: clarification on changelog behavior for equality deletes

2024-08-21 Thread Shani Elharrar
+1 for option (a). Shani. On 21 Aug 2024, at 19:07, Péter Váry wrote: I think from the correctness perspective only option (a) is valid. The difference between snapshot2 and snapshot3 is one delete and one insertion. Jason Fine wrote (on Wed, Aug 21, 2024, 15:26): Great to see someone

Re: clarification on changelog behavior for equality deletes

2024-08-21 Thread Péter Váry
I think from the correctness perspective only option (a) is valid. The difference between snapshot2 and snapshot3 is one delete and one insertion. Jason Fine wrote (on Wed, Aug 21, 2024, 15:26): > Great to see someone is working on this feature! > > IMHO option (a) is preferred. M

Re: Table schema and partition spec update

2024-08-21 Thread Péter Váry
I hope to do even better. If the stream could provide information about the spec/specId for the record which we would like to write, then we could refresh and use a new writer immediately. Teaser - if the user could provide a converter which converts the input data to `DynamicData` then we can cre

Re: clarification on changelog behavior for equality deletes

2024-08-21 Thread Jason Fine
Great to see someone is working on this feature! IMHO option (a) is preferred. My (impulsive) reasoning is as follows: 1. I think in CDC you shouldn't be skipping snapshots so you would get the deleted event while processing snapshot 2 anyway. 2. If you consider d

Re: [DISCUSS] Release source and binary verification

2024-08-21 Thread Jean-Baptiste Onofré
Hi Justin, Thanks for clarifying. I was not clear in my previous email (I meant to say that even the gradle wrapper is *not* OK). We don't package any binary in the Iceberg source distribution though. Regards JB On Tue, Aug 20, 2024 at 11:55 PM Justin Mclean wrote: > > Hi, > > > I would add an additio

Re: [DISCUSS] Release source and binary verification

2024-08-21 Thread Justin Mclean
Hi, > Just to clarify that we're currently not adding the gradle wrapper in the > source distribution. We have some custom code > > for that exact reason that downloads the gradle wrapper if it's m

Re: [VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-21 Thread Fokko Driesprong
Hey Eduard, I think it relates to this PR. It contains a CVE and would be good to be backported. We wanted to include it in 1.6.1 if we needed another RC, but that didn't happen, so I think we didn't cherry-pick it to 1.6.x branch. Kind regards, Fokk

Re: [DISCUSS] Release source and binary verification

2024-08-21 Thread Eduard Tudenhöfner
> > It’s not OK to include the gradle wrapper in the source release. A source > release can't include any jars with compiled code in them. Just to clarify that we're currently not adding the gradle wrapper in the source distribution. We have some custom code

Re: [ANNOUNCE] Release Apache Iceberg Rust v0.3.0

2024-08-21 Thread Xuanwo
Thank you for pointing that out. I will update the announcement template. On Wed, Aug 21, 2024, at 15:32, Maxim Solodovnik wrote: > Hello, > > I believe there is a typo in announce: > > On Wed, 21 Aug 2024 at 14:30, Xuanwo wrote: >> >> Hi all, >> >> The Apache Iceberg Rust community is pleased to

Re: [ANNOUNCE] Release Apache Iceberg Rust v0.3.0

2024-08-21 Thread Maxim Solodovnik
Hello, I believe there is a typo in announce: On Wed, 21 Aug 2024 at 14:30, Xuanwo wrote: > > Hi all, > > The Apache Iceberg Rust community is pleased to announce > that Apache Iceberg Rust v0.3.0 has been released! > > Iceberg is an open table format for analytic datasets, and > Iceberg Rust is

Re: [VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-21 Thread Eduard Tudenhöfner
@Piotr can you please elaborate which ORC update you are referring to? Or did you mean the Avro update (which I think we were planning for 1.6.2)? On Tue, Aug 20, 2024 at 7:05 PM Piotr Findeisen wrote: > Hi > > -1 (non-binding) > > I verified source tarball matches the git tag (except it > lacks

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-21 Thread Walaa Eldin Moustafa
Hi Jan, I do not think this is feasible because it assumes the catalog identifiers do not collide across catalogs. Anyway, let us not over-engineer this use case. As I mentioned, it was for illustration purposes. Since the discussion moved from “UUIDs vs Sequence numbers” to “UUIDs vs catalog tab