Re: Nested column types and equality delete files

2023-10-27 Thread Renjie Liu
You are right. Null always needs special treatment. I think allowing null values in equality ids is reasonable, but should we treat them as distinct? PG treats nulls as distinct by default, but allows configuration to treat them as not distinct: https://stackoverflow.com/questions/8289100/create-unique-co

Re: [VOTE] Release Apache PyIceberg 0.5.1 RC2

2023-10-27 Thread Daniel Weeks
+1 (binding) Verified sigs/sums/license/install/test (python 3.10) Ran extensive filter tests and everything worked as expected with Arrow/Pandas/DuckDB. -Dan On Fri, Oct 27, 2023 at 3:02 PM Hussein Awala wrote: > +1 (non-binding) I ran the example notebooks and tested some queries > with PyA

Re: [VOTE] Release Apache PyIceberg 0.5.1 RC2

2023-10-27 Thread Hussein Awala
+1 (non-binding) I ran the example notebooks and tested some queries with PyArrow and Pandas, all looks good. On Fri, Oct 27, 2023 at 11:46 AM Jean-Baptiste Onofré wrote: > +1 (non binding) > > I checked: > - hash and signatures are good > - I will check NOTICE (copyright is 2022 and I think som

Re: Nested column types and equality delete files

2023-10-27 Thread Micah Kornfield
> Iceberg spec has a clear definition of constraints about identifier id fields. I think it would make sense if equality id fields share similar constraints.
Makes sense, however it appears that for equality delete null values are int

[DISCUSS] Equality field IDs for data files?

2023-10-27 Thread Anton Okolnychyi
Hey folks, I see that our Java implementation does not follow the spec when it comes to storing equality field IDs. The spec says that column is for equality deletes only. However, our builder for DataFile allows setting and persisting that field in the metadata. To make things worse, the actio

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
Correct, we run check-license for each PR thanks to license_check.yml GH workflow. However, the contributor has to run the dev/check-license manually (it's not part of gradle build). I agree that the PR level is good enough. So, I propose to move forward on a new rat version without dot-directory

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Ryan Blue
We already run RAT checks on every PR, so I'm not sure there's a lot of value in moving the checks to gradle. That just means that we would need to use a different framework across the implementations. If there's a way to run license checks in CI that doesn't have the dot-file limitation, that seem

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
Thanks for the details! To be honest, I still prefer the "light build" approach with gradle, because it's pretty easy for contributors to check license headers in their contributed file (as with gradle plugin, it will be included in the check phase). I think it's good to have it in the regular "lo

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Xuanwo
Here are some quick notes on skywalking-eyes, hoping they will be helpful. Before using skywalking-eyes, we need to set up its config as described in [1]. Take iceberg-rust as an example [2]. For checking in CI, add the following content to the workflow [3]: - name: Check License Header uses: apache/skywal

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
Thanks for the heads up Xuanwo. It's the fourth option :) I will make a comparison with RAT. Regards JB On Fri, Oct 27, 2023 at 12:15 PM Xuanwo wrote: > > iceberg-rust is using apache/skywalking-eyes/header@v0.5.0 now. > > BTW, we found skywalking-eyes works really well. It's fast, correct and

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Xuanwo
iceberg-rust is using apache/skywalking-eyes/header@v0.5.0 now. BTW, we found skywalking-eyes works really well. It's fast, correct and well-maintained. Maybe worth taking a look. On Fri, Oct 27, 2023, at 17:48, Jean-Baptiste Onofré wrote: > By the way, as dev/check-license is also used in iceber

Re: [PROPOSAL] Improve dev/check-license

2023-10-27 Thread Jean-Baptiste Onofré
By the way, as dev/check-license is also used in the iceberg-python and iceberg-go repositories (iceberg-rust doesn't have it), maybe I can move forward on a new RAT release with the fix for hidden directories and update those repositories as well. Regards JB On Thu, Oct 26, 2023 at 5:19 PM Jean-Baptiste Onofré wro

Re: Feedback on Iceberg Materialized View Spec

2023-10-27 Thread Jan Kaul
Thank you Dan and the others for your helpful comments. I've added some sections to address the points that you mentioned. I'm not really sure what you mean by "fail after grace period". I've found a design document for the Trino materialized views and tried to incorporate some of the points. I'

Re: [VOTE] Release Apache PyIceberg 0.5.1 RC2

2023-10-27 Thread Jean-Baptiste Onofré
+1 (non binding) I checked: - hash and signatures are good - I will check NOTICE (copyright is 2022 and I think some deps are missing there), not release blocker - ASF headers are present - no binary file detected - very quick test Regards JB On Tue, Oct 24, 2023 at 8:48 PM Fokko Driesprong wro

Re: Community Meeting Minutes ?

2023-10-27 Thread Jean-Baptiste Onofré
Thanks Brian, much appreciated! Regards JB On Thu, Oct 26, 2023 at 10:29 PM Brian Olsen wrote: > > Thanks for the reminder here JB. I just created a list to follow for this > process so I don't forget. At some point, I'll add it to the documentation so > that anyone can run this over time. I w

Re: Meeting Minutes from 2023-10-11 Iceberg Sync

2023-10-27 Thread Brian Olsen
The spacing was after sending the email. If you click on the YouTube link, it splits the YouTube video into chapters and spaces them out. They are more legible there. I’ll make sure to add bullet points moving forward. On Thu, Oct 26, 2023 at 11:00 PM Xuanwo wrote: > Thanks for the meeting reco

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-27 Thread Renjie Liu
Hi, Anton: I've gone through the doc, and the Puffin Position Delete Files section shares some similarity with the deletion vector approach. Is there any conclusion about the discussion? On Thu, Oct 12, 2023 at 12:11 AM Anton Okolnychyi wrote: > I tried to summarize notes from our previous discu