Re: Table schema and partition spec update

2024-08-19 Thread Péter Váry
Hi Fokko, Xianjin, Thanks for both proposals, I will take a deeper look soon! Both seem promising at first glance. For the use cases, - I have seen requirements for converting incoming Avro records with evolving schema and writing them to a table. - I have seen requirements for creating new

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-19 Thread Benny Chow
Hi Walaa, I personally don't see a semantic issue with putting the table identifiers in the refresh state. The purpose of the refresh state is to basically take a snapshot of the table and view versions at the time of materialization. Directly using table identifiers seems pretty natural to me.

Re: [VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-19 Thread Renjie Liu
Hi, Carl: Thanks for driving this. I tried to run `./gradlew build` and one test failed: > Task :iceberg-core:test TestHadoopCommits > testConcurrentFastAppends(File) FAILED org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in org.apache.iceberg.hadoop.TestHadoo

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-19 Thread Walaa Eldin Moustafa
Hi Micah, it is mostly about the typical results of denormalization such as data consistency, management complexity, integrity, etc. However, as mentioned earlier, the main reason would be the semantic gap around using catalog table identifiers as a concept in the table (more specifically snapshot

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
Hi Xiangjin, Could you elaborate a bit more on how the Parquet manifest would fix the > type promotion problem? If the lower and upper bounds are still a map of > , I don't think we can perform column pruning on that, and the > type information of the stat column is still missing. I think the i

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
> > If we go with the approach that type promotion results in a change in the > field-id, what happens when a certain field has been changed > multiple times? Does it mean that we end up with tracking the lineage of > field change history? Yes, I was thinking it would be a recursive structure tha

[VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-19 Thread Carl Steinbach
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.6.1 release. The commit ID is e18a2fe10214f5f3ffa0a317a28af8b2a619817a * This corresponds to the tag: apache-iceberg-1.6.1-rc1 * https://github.com/apache/iceberg/commits/apache-iceberg-1.6.1-rc1 * https://gi

Re: Type promotion in v3

2024-08-19 Thread xianjin
Hey Ryan, Thanks for the reply, it clears most things up. Some responses inline: > This ends up being a little different because we can detect more cases when the bounds must have been strings — any time when the length of the upper and lower bound is different. Because strings tend to have longe

Re: Type promotion in v3

2024-08-19 Thread Gang Wu
Hi Micah, If we go with the approach that type promotion results in a change in the field-id, what happens when a certain field has been changed multiple times? Does it mean that we end up with tracking the lineage of field change history? Thanks, Gang On Tue, Aug 20, 2024 at 7:34 AM Micah Kornf

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread xianjin
+1 (non-binding) Sent from my iPhone On Aug 20, 2024, at 7:56 AM, Manu Zhang wrote: +1 (non-binding) On Tue, Aug 20, 2024 at 07:44, Micah Kornfield wrote: +1 (non-binding) On Mon, Aug 19, 2024 at 4:33 PM Steve Zhang wrote: +1 (non-binding) Thanks, Steve Zhang On Aug 19, 2024, at 1:47 PM, John

Re: [DISCUSS] Adding RemovePartitionSpecsUpdate update type to REST

2024-08-19 Thread xianjin
+1 from my side as well. Sent from my iPhone On Aug 20, 2024, at 9:09 AM, Yufei Gu wrote: +1, the new spec looks good to me. It seems like the client-side handling the heavy lifting of figuring out which spec to remove is a reasonable approach. Yufei On Mon, Aug 19, 2024 at 4:01 PM Anton Okolnychyi <

Re: [VOTE] Release Apache Iceberg 1.6.1 RC0

2024-08-19 Thread Carl Steinbach
Wow, that's embarrassing. Let me work on RC1. - Carl On Mon, Aug 19, 2024 at 6:11 PM Ajantha Bhat wrote: > -1 > > because the artifacts versions are incorrect. > It should be 1.6.1 instead of 0.6.1 > > - Ajantha > > On Tue, Aug 20, 2024 at 8:54 AM Carl Steinbach wrote: > >> Hi Everyone, >> >>

Re: [VOTE] Release Apache Iceberg 1.6.1 RC0

2024-08-19 Thread Ajantha Bhat
-1 because the artifacts versions are incorrect. It should be 1.6.1 instead of 0.6.1 - Ajantha On Tue, Aug 20, 2024 at 8:54 AM Carl Steinbach wrote: > Hi Everyone, > > I propose that we release the following RC as the official Apache Iceberg > 0.6.1 release. > > The commit ID is e18a2fe10214f5

Re: [DISCUSS] Adding RemovePartitionSpecsUpdate update type to REST

2024-08-19 Thread Yufei Gu
+1, the new spec looks good to me. It seems like the client-side handling the heavy lifting of figuring out which spec to remove is a reasonable approach. Yufei On Mon, Aug 19, 2024 at 4:01 PM Anton Okolnychyi wrote: > Seems reasonable to me. > > - Anton > > On Mon, Aug 19, 2024 at 15:19, Amogh

[VOTE] Release Apache Iceberg 1.6.1 RC0

2024-08-19 Thread Carl Steinbach
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 0.6.1 release. The commit ID is e18a2fe10214f5f3ffa0a317a28af8b2a619817a * This corresponds to the tag: apache-iceberg-0.6.1-rc0 * https://github.com/apache/iceberg/commits/apache-iceberg-0.6.1-rc0 * https://gi

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Manu Zhang
+1 (non-binding) On Tue, Aug 20, 2024 at 07:44, Micah Kornfield wrote: > +1 (non-binding) > > On Mon, Aug 19, 2024 at 4:33 PM Steve Zhang > wrote: > >> +1 (non-binding) >> >> Thanks, >> Steve Zhang >> >> >> >> On Aug 19, 2024, at 1:47 PM, John Zhuge wrote: >> >> +1 (non-binding) >> >> On Mon, Aug 19, 2024 at

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-19 Thread Micah Kornfield
> > Thanks Micah, for the latter, I meant the type of denormalization of > repeating a 3-part name as opposed to using an ID. Is the concern here just metadata size or something else? For size I think if this is really anticipated to be a problem that it is likely for the state map in general, a

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Micah Kornfield
+1 (non-binding) On Mon, Aug 19, 2024 at 4:33 PM Steve Zhang wrote: > +1 (non-binding) > > Thanks, > Steve Zhang > > > > On Aug 19, 2024, at 1:47 PM, John Zhuge wrote: > > +1 (non-binding) > > On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu wrote: > >> +1 >> Yufei >> >> >> On Mon, Aug 19, 2024 at 1:1

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
Hi Ryan, Thanks for the reply, responses inline > >- How do we keep track of the replaced column? Does it remain in the >schema? Either we would need to keep the old schemas or implement a new >“hidden” column state > > I don't think this is the case, the function metadata provides al

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Steve Zhang
+1 (non-binding) Thanks, Steve Zhang > On Aug 19, 2024, at 1:47 PM, John Zhuge wrote: > > +1 (non-binding) > > On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu > wrote: >> +1 >> Yufei >> >> >> On Mon, Aug 19, 2024 at 1:17 PM Fokko Driesprong >

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Amogh Jahagirdar
+1 (binding) On Mon, Aug 19, 2024 at 5:22 PM Daniel Weeks wrote: > +1 (binding) > > On Mon, Aug 19, 2024, 4:11 PM Steven Wu wrote: > >> +1 (binding) >> >> On Mon, Aug 19, 2024 at 4:06 PM Anton Okolnychyi >> wrote: >> >>> +1 (binding) >>> >>> - Anton >>> >>> On Mon, Aug 19, 2024 at 13:49, John Zh

Re: [DISCUSS] Guidelines for committing PRs

2024-08-19 Thread Anton Okolnychyi
The current state of the PR looks good to me. I feel it is a good starting point that we will update over time. - Anton On Fri, Aug 16, 2024 at 17:55, Walaa Eldin Moustafa wrote: > Thanks Micah. It is clearer now. I have left some comments. Let us > continue on the PR. > > On Fri, Aug 16, 2024 at

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Daniel Weeks
+1 (binding) On Mon, Aug 19, 2024, 4:11 PM Steven Wu wrote: > +1 (binding) > > On Mon, Aug 19, 2024 at 4:06 PM Anton Okolnychyi > wrote: > >> +1 (binding) >> >> - Anton >> >> On Mon, Aug 19, 2024 at 13:49, John Zhuge wrote: >> >>> +1 (non-binding) >>> >>> On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Steven Wu
+1 (binding) On Mon, Aug 19, 2024 at 4:06 PM Anton Okolnychyi wrote: > +1 (binding) > > - Anton > > On Mon, Aug 19, 2024 at 13:49, John Zhuge wrote: > >> +1 (non-binding) >> >> On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu wrote: >> >>> +1 >>> Yufei >>> >>> >>> On Mon, Aug 19, 2024 at 1:17 PM Fokko Dr

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Anton Okolnychyi
+1 (binding) - Anton On Mon, Aug 19, 2024 at 13:49, John Zhuge wrote: > +1 (non-binding) > > On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu wrote: > >> +1 >> Yufei >> >> >> On Mon, Aug 19, 2024 at 1:17 PM Fokko Driesprong >> wrote: >> >>> +1 >>> >>> On Mon, Aug 19, 2024 at 22:01, Russell Spitzer <

Re: [DISCUSS] Adding RemovePartitionSpecsUpdate update type to REST

2024-08-19 Thread Anton Okolnychyi
Seems reasonable to me. - Anton On Mon, Aug 19, 2024 at 15:19, Amogh Jahagirdar <2am...@gmail.com> wrote: > Hi all, > > There has been work [1] to enable users to remove historical partition > specs which are not referenced in manifests as a form of metadata cleanup. > As part of this, a new metada

[DISCUSS] Adding RemovePartitionSpecsUpdate update type to REST

2024-08-19 Thread Amogh Jahagirdar
Hi all, There has been work [1] to enable users to remove historical partition specs which are not referenced in manifests as a form of metadata cleanup. As part of this, a new metadata update type RemovePartitionSpecsUpdate needs to be added to enable REST Catalogs to be able to perform this oper

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread John Zhuge
+1 (non-binding) On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu wrote: > +1 > Yufei > > > On Mon, Aug 19, 2024 at 1:17 PM Fokko Driesprong wrote: > >> +1 >> >> On Mon, Aug 19, 2024 at 22:01, Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> +1 - Feels duplicative to vote here and approve on

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Yufei Gu
+1 Yufei On Mon, Aug 19, 2024 at 1:17 PM Fokko Driesprong wrote: > +1 > > On Mon, Aug 19, 2024 at 22:01, Russell Spitzer < > russell.spit...@gmail.com> wrote: > >> +1 - Feels duplicative to vote here and approve on the PR >> >> On Mon, Aug 19, 2024 at 2:41 PM Ryan Blue wrote: >> >>> Hi everyone,

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Ryan Blue
+1 I agree it's a bit duplicative, but we want to make sure that spec changes are highlighted on the dev list. On Mon, Aug 19, 2024 at 1:17 PM Fokko Driesprong wrote: > +1 > > On Mon, Aug 19, 2024 at 22:01, Russell Spitzer < > russell.spit...@gmail.com> wrote: > >> +1 - Feels duplicative to vote

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Fokko Driesprong
+1 On Mon, Aug 19, 2024 at 22:01, Russell Spitzer < russell.spit...@gmail.com> wrote: > +1 - Feels duplicative to vote here and approve on the PR > > On Mon, Aug 19, 2024 at 2:41 PM Ryan Blue wrote: > >> Hi everyone, >> >> I'd like to vote on PR #10948 >>

Re: Type promotion in v3

2024-08-19 Thread Ryan Blue
I don’t think that type promotion by replacing a column is a good direction to head. Right now we have a fairly narrow problem of not having the original type information for stats. That’s a problem with a fairly simple solution in the long term and it doesn’t require the added complexity of replac

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Russell Spitzer
+1 - Feels duplicative to vote here and approve on the PR On Mon, Aug 19, 2024 at 2:41 PM Ryan Blue wrote: > Hi everyone, > > I'd like to vote on PR #10948 > , which has some spec > changes to prepare for v3: > > * Add a high-level v3 summary (only c

[VOTE] Spec changes in preparation for v3

2024-08-19 Thread Ryan Blue
Hi everyone, I'd like to vote on PR #10948, which has some spec changes to prepare for v3: * Add a high-level v3 summary (only changes already in the spec) * Clarify the existing v3 requirement for handling specs with unknown transforms * Reset headi

Re: [DISCUSS] Row Lineage Proposal

2024-08-19 Thread Ryan Blue
The situation in which you would use equality deletes is when you do not want to read the existing table data. That seems at odds with a feature like row-level tracking where you want to keep track. To me, it would be a reasonable solution to just say that equality deletes can't be used in tables w

Re: [DISCUSS] Row Lineage Proposal

2024-08-19 Thread Russell Spitzer
As far as I know, Flink is actually the only engine we have at the moment that can produce equality deletes, and only equality deletes have this specific problem. Since an equality delete can be written without actually knowing whether rows are being updated or not, it is always ambiguous as to whethe

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-19 Thread Fokko Driesprong
Hey Piotr, I missed this email, but Carl's reply pushed it to the top of my mailbox. As you can read on the list, the release is a bit more challenging than expected. I got the docker image fixed last Friday, and I'm building the artifacts right now. So

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-19 Thread Carl Steinbach
I'm +1 on doing the 1.6.1 release now, followed by 1.6.2 with the Avro changes once they become available. I'm also available to help with the release, though Piotr has already volunteered to be the release manager, which is great. - Carl On Thu, Aug 15, 2024 at 10:24 AM Piotr Findeisen wrote:

Re: Type promotion in v3

2024-08-19 Thread Ryan Blue
If the reader logic depends on the length of data in bytes, will this prevent us from adding any type promotions to string? This ends up being a little different because we can detect more cases when the bounds must have been strings — any time when the length of the upper and lower bound is diffe
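The length heuristic mentioned in this message can be sketched in Python. This is only an illustration, not Iceberg's reader logic: the function name `classify_bounds` is hypothetical, and the rule that equal-length bounds of exactly 4 or 8 bytes remain ambiguous (while any other length is treated as a string) is an assumption based on the fixed 4-byte `int` and 8-byte `long` encodings discussed elsewhere in the thread.

```python
def classify_bounds(lower: bytes, upper: bytes) -> str:
    """Guess how a pair of serialized bounds was originally typed.

    Hypothetical helper for illustration only.
    """
    # From the thread: differing lengths mean the bounds must have been strings,
    # because int and long bounds are fixed-width.
    if len(lower) != len(upper):
        return "string"
    # Assumption: equal lengths of 4 or 8 bytes could be int/long or a short string.
    if len(lower) in (4, 8):
        return "ambiguous"
    # Assumption: any other length cannot be a fixed-width int or long.
    return "string"


print(classify_bounds(b"abc", b"abcdef"))
print(classify_bounds(b"1234", b"5678"))
```

The second call shows the case the heuristic cannot resolve: two 4-byte bounds that are equally valid as strings or as integers.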

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
I think continuing to define type promotion as something that happens implicitly from the reader perspective has a few issues: 1. It makes it difficult to reason about all additional features that might require stable types to interpret. Examples of existing filters: partition statistics file, e

Re: Type promotion in v3

2024-08-19 Thread Amogh Jahagirdar
Hey all, > There might be an easy/light way to add this new metadata: we can persist schema_id in the DataFile. It still adds some extra size to the manifest file but should be negligible? I do think it's probably negligible in terms of the size (at least in terms of the value that we get out of

Re: Type promotion in v3

2024-08-19 Thread Xianjin YE
Hey Fokko, > Distribute all the schemas to the executors, and we have to do the lookup and > comparison there. I don’t think this would be a problem: the schema id in the DataFile should be only used in driver’s planning phase to determine the lower/upper bounds, so no extra schema except the

Re: Type promotion in v3

2024-08-19 Thread Fokko Driesprong
Thanks Ryan for bringing this up, that's an interesting problem, let me think about this. we can persist schema_id in the DataFile This was also my first thought. The two drawbacks are: - Distribute all the schemas to the executors, and we have to do the lookup and comparison there. -

Re: Type promotion in v3

2024-08-19 Thread Xianjin YE
Thanks Ryan for bringing this up. > int and long to string Could you elaborate a bit on how we can support type promotion for `int` and `long` to `string` if the upper and lower bounds are already encoded in 4/8 bytes binary? It seems that we cannot add promotions to string as Piotr pointed o

Re: Type promotion in v3

2024-08-19 Thread Yujiang Zhong
Hi Ryan, I don't understand how the Parquet format manifests would resolve the type promotion issue here. Could you please provide more detailed information to help me understand it? Thank you. > That would also fix type promotion because the manifest file schema would > include full type info

Re: Table schema and partition spec update

2024-08-19 Thread Xianjin YE
Hey Péter, For evolving the schema, Spark has the ability to mergeSchema based into the new incoming Schema, you may want t

Re: Table schema and partition spec update

2024-08-19 Thread Fokko Driesprong
Hey Peter, Thanks for raising this since I recently ran into the same issue. The APIs that we have today nicely hide the field IDs from the user, which is great. I do think all the methods are in there to evolve the schema to the desired one, however, we don't have a way to control the field-IDs.

Table schema and partition spec update

2024-08-19 Thread Péter Váry
Hi Team, I'm playing around with creating a Flink Dynamic Sink which would allow schema changes without the need for a job restart. When a record with an unknown schema arrives, it would update the Iceberg table to the new schema and continue processing the records. Let's say, I have the `

[RESULT][VOTE] Release Apache Iceberg Rust 0.3.0 RC1

2024-08-19 Thread Xuanwo
Hello, Apache Iceberg Rust Community, The vote to release Apache Iceberg Rust 0.3.0-rc.1 has passed. The vote PASSED with 3 +1 binding and 1 +2 non-binding votes, no +0 or -1 votes: Binding votes: - Renjie Liu - Amogh Jahagirdar - Fokko Driesprong Non-Binding votes: - NOTME ZE - Christian Thi

Re: [VOTE] Release Apache Iceberg Rust 0.3.0 RC1

2024-08-19 Thread Fokko Driesprong
+1 (binding) Thanks Xuanwo for running this release, and sorry for the late vote, I was doing additional tests against Tabular and had to flex my tiny Rust muscle a bit. - Validated the signatures and checksums - Checked out the licenses

Re: Type promotion in v3

2024-08-19 Thread Piotr Findeisen
Hi, Lack of type information in lower/upper bounds is definitely an interesting problem. For example, the 4-byte value \x31\x32\x33\x34 can be interpreted as the string "1234" or as the integer 875770417 (stored little-endian). If the reader logic depends on the length of data in bytes, will this preve
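The ambiguity in this example can be reproduced with a short Python sketch. It is illustrative only and does not use any Iceberg API: it decodes the same four bytes once as UTF-8 text and once as a little-endian signed 32-bit integer using the standard `struct` module. The variable names are arbitrary.

```python
import struct

# The four bytes from the example in the message above.
raw = b"\x31\x32\x33\x34"

# Interpreted as UTF-8 text, the bytes spell the digits 1, 2, 3, 4.
as_string = raw.decode("utf-8")

# Interpreted as a little-endian signed 32-bit integer ("<i").
as_int = struct.unpack("<i", raw)[0]

print(as_string)  # 1234
print(as_int)     # 875770417
```

Nothing in the bytes themselves says which reading is correct, which is why the thread looks for type information outside the bound value.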