Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Fokko Driesprong
Nice! I fully agree with the above. I originally set up the stalebot for the issues because I noticed that there were many issues around old Spark versions that weren't even maintained anymore. I feel it is better to either close or take action on an issue. For me, it makes sense to extend

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Jean-Baptiste Onofré
Hi That's also the purpose of the reviewers file: having multiple reviewers per tag. Thanks guys for your feedback, I will move forward with the PR :) Regards JB On Thu, Jan 4, 2024 at 6:38 AM Ajantha Bhat wrote: > > +1, > > Some of my PRs have been open for a long time and sometimes it doesn'

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Ajantha Bhat
+1, Some of my PRs have been open for a long time and sometimes they don't get the attention they require. Notifying both the reviewer and the author can help expedite the review process and facilitate quicker handling of new contributions. I think having more than one committer assigned to a PR can

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Amogh Jahagirdar
+1, I think this is a step in the right direction. One other consideration I wanted to bring up was dependabot and if there's any unique handling we want to do there because I've noticed that PRs from dependabot tend to pile up. I think with the proposal we won't really need to do anything unique a

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Renjie Liu
This proposal sounds good to me. If we talk specifically about governance features, I am not sure if column > property is the best way though. Consider the case of having a column which > was not PII, but becomes PII because certain law has passed. The operation > a user would perform in this case

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread OpenInx
Hi Zoltan, Thanks for the issue, I think it's fair to wait for a new major release for this breaking change. Best Regards. On Wed, Jan 3, 2024 at 11:16 PM Zoltán Borók-Nagy wrote: > Hi, > > I created a IMPALA-12675 > about annotating > STRING

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Brian Olsen
+1 My team did an initial manual review of the Trino backlog and we found a lot of value there. 1) We found 3 PRs that were ready for merge but accidentally missed the boat for deployment. 2) We revived a few older PRs where there was actual interest from the developers. 3) With the PR count down

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Manu Zhang
+1. In the same spirit, our ISSUE flows can also be improved. There are over 900 open issues without proper tags, fix versions, etc, and many of them are no longer valid. This can be a separate proposal though. On Thu, Jan 4, 2024 at 9:18 AM John Zhuge wrote: > +1 good idea > > On Wed, Jan 3, 20

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread John Zhuge
+1 good idea On Wed, Jan 3, 2024 at 5:15 PM Renjie Liu wrote: > +1 for this enhancement. > > On Thu, Jan 4, 2024 at 2:19 AM Jack Ye wrote: > >> +1, sounds like a good idea to clean up stale PRs. >> >> -Jack >> >> On Wed, Jan 3, 2024 at 9:52 AM Russell Spitzer >> wrote: >> >>> I definitely need

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Renjie Liu
+1 for this enhancement. On Thu, Jan 4, 2024 at 2:19 AM Jack Ye wrote: > +1, sounds like a good idea to clean up stale PRs. > > -Jack > > On Wed, Jan 3, 2024 at 9:52 AM Russell Spitzer > wrote: > >> I definitely need something to keep emailing me, so I support this. >> >> On Wed, Jan 3, 2024 at

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread John Zhuge
Hi Walaa, Netflix internal Spark and Iceberg have supported column metadata in Iceberg tables since Spark 2.4. The Spark data type is `org.apache.spark.sql.types.Metadata` in StructType. The feature is used by ML teams. It'd be great for the feature to be adopted. On Wed, Jan 3, 2024 at 1:18 PM

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Walaa Eldin Moustafa
Thanks Jack! I think generic key value pairs are still valuable, even for data governance. Regarding schema versions and PII evolution over time, I actually think it is a good feature to keep PII and schema in sync across versions for data reproducibility. Consistency is key in time travel scenar

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Jack Ye
+1, sounds like a good idea to clean up stale PRs. -Jack On Wed, Jan 3, 2024 at 9:52 AM Russell Spitzer wrote: > I definitely need something to keep emailing me, so I support this. > > On Wed, Jan 3, 2024 at 7:52 AM Jean-Baptiste Onofré > wrote: > >> Hi guys, >> >> We have several examples whe

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Jack Ye
Thanks for bringing this topic up! I can provide some perspective about AWS Glue's related features. AWS Glue table definition also has a column parameters feature. This does not

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Russell Spitzer
I definitely need something to keep emailing me, so I support this. On Wed, Jan 3, 2024 at 7:52 AM Jean-Baptiste Onofré wrote: > Hi guys, > > We have several examples where we have some kind of "stale" PRs, > either because we are waiting for a review, or we are waiting for > changes from the c

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread Zoltán Borók-Nagy
Hi, I created IMPALA-12675 about annotating STRINGs with UTF8 by default. The code change should be trivial, but I'm afraid we will need to wait for a new major release with this (because users might store binary data in STRING columns, so it

[PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Jean-Baptiste Onofré
Hi guys, We have several examples where we have some kind of "stale" PRs, either because we are waiting for a review, or we are waiting for changes from the contributor. We are already using two jobs around issues/PRs: - labeler to label PRs depending on which Iceberg modules the change touches - stale

Re: Community Meeting Minutes ?

2024-01-03 Thread Jean-Baptiste Onofré
Hi Brian, Thanks! About when the meeting minutes should be posted, it's best effort, no need to rush. However, let's say that one week max after the meeting would be great. Regards JB On Wed, Jan 3, 2024 at 12:21 PM Brian Olsen wrote: > > Hey all, > > I am just about to push them this morning al

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread OpenInx
Thanks Zoltan and Ryan for your feedback. I think we all agree on adding an option to promote BINARY to STRING (Approach A) on the Flink/Spark/Hive reader side, so that historic datasets already written by Impala on Hive are read correctly. Besides that, applying approach B to future Apache Impala rel

01-03-2024 Community Sync is Cancelled for the holidays

2024-01-03 Thread Brian Olsen
Hello everyone, The community sync that would generally happen today has been cancelled as many are still on holiday this week. I hope you all had a fun holiday and New Year's! There are a lot of great things on the docket for this year and I can't wait to meet them all with you! Take care and safe

Meeting Minutes from 2023-12-13 Iceberg Sync

2024-01-03 Thread Brian Olsen
Hey all, Here are the meeting minutes from the meeting just before the holiday. https://www.youtube.com/watch?v=OwyBlUi2CRc - Highlights - Encryption: Added StandardEncryptionManager (https://github.com/apache/iceberg/pull/6884) (Thanks, Gidon!) - Views: Added support in REST catalog (Tha

Re: Community Meeting Minutes ?

2024-01-03 Thread Ajantha Bhat
Thanks for the update. If compiling notes is the bottleneck, we can skip this step. I think the highlights section from the sync notes should be enough to give a summary. We can paste that in the mail. If someone needs more details, they can watch the video. Thanks, Ajantha On Wed, Jan 3, 2024 a

Re: Community Meeting Minutes ?

2024-01-03 Thread Brian Olsen
Hey all, I am just about to push them this morning along with the announcement that today’s meeting is cancelled. I apologize for yet again having this be delayed. I started working on a script to automate the compilation of the AI summary + meeting minutes to a YouTube-friendly format for the li

Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Walaa Eldin Moustafa
Hi Iceberg Developers, I would like to start a discussion on a potential enhancement to Iceberg around the implementation of key-value style properties (tags) for individual columns or fields. I believe this feature could have significant applications, especially in the domain of data governance.