Re: [DISCUSS] Multi-arg transforms

2025-03-25 Thread Russell Spitzer
1. This makes sense to me, it was the only one requested in the past. It
should allow "IN" as well.
2. What's the suggestion here? To allow both source-id and source-ids in V3
but error out if the two don't match? I'm trying to determine how the
validation would look in both cases.
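
To make the second question concrete, here is a minimal sketch of a lenient reader that accepts either key and rejects a partition field only when both keys are present and disagree. The helper name `resolve_source_ids` is hypothetical (it is not PyIceberg or Java API), and the behavior shown is just one reading of the question, not something the spec has settled.

```python
from typing import Any


def resolve_source_ids(field: dict[str, Any]) -> list[int]:
    """Return the source field IDs for one partition-field JSON object.

    Hypothetical helper: accepts `source-id`, `source-ids`, or both,
    and raises if both are present but do not describe the same field.
    """
    single = field.get("source-id")
    multi = field.get("source-ids")

    if single is None and multi is None:
        raise ValueError("partition field needs source-id or source-ids")
    if multi is not None and len(multi) == 0:
        raise ValueError("source-ids must not be empty")
    if single is not None and multi is not None:
        # Both keys present: consistent only if source-ids is exactly [source-id].
        if list(multi) != [single]:
            raise ValueError(
                f"source-id {single} does not match source-ids {list(multi)}"
            )
        return [single]
    # Exactly one key present: normalize to a list either way.
    return [single] if multi is None else list(multi)
```

With this shape the reader would not need the table version; only a writer would have to decide which key to emit.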

On Tue, Mar 25, 2025 at 2:04 PM Fokko Driesprong wrote:

> Hi everyone,
>
> I wanted to draw your attention to some small changes to the multi-arg
> transforms that I've bumped into while working on the V3 spec for PyIceberg.
>
>    1. Up for debate. The spec does not point out an actual implementation
>    of transforms that accept multiple arguments. Of the existing transforms,
>    the only contender is the bucket transform. Should we include this in the
>    V3 spec? It will only allow you to prune metadata if you do an equality
>    expression on all the fields that are part of the transform.
>    2. Along the way, we've removed something that we did not intend to.
>    Initially, we allowed writing either source-id or source-ids, depending
>    on the number of arguments. This has been changed to only allow
>    source-ids for V3 in a PR that introduces backward compatibility. I think
>    this makes the JSON parsers/producers more complex than needed
>    (specifically PyIceberg). Also, in Java, we would need to plumb down the
>    table version to the PartitionSpecParser.java. I think it would be great
>    to simplify this.
>
> Please let me know what you think so we can tie up the loose ends for V3.
>
> Kind regards,
> Fokko


[DISCUSS] Multi-arg transforms

2025-03-25 Thread Fokko Driesprong
Hi everyone,

I wanted to draw your attention to some small changes to the multi-arg
transforms that I've bumped into while working on the V3 spec for PyIceberg.

   1. Up for debate. The spec does not point out an actual implementation
   of transforms that accept multiple arguments. From the existing transforms,
   the only contender is the bucket transform. Should we include this in the
   V3 spec? It will only allow you to prune metadata if you do an equality
   expression on all the fields that are part of the transform.
   2. Along the way, we've removed something that we did not intend to.
   Initially, we allowed writing either source-id or source-ids, depending on
   the number of arguments. This has been changed to only allow source-ids for V3 in a PR
   that introduces backward compatibility. I think this makes the JSON
   parsers/producers more complex than needed (specifically PyIceberg). Also,
   in Java, we would need to plumb down the table version to the
   PartitionSpecParser.java. I think it would be great to simplify this.
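
The pruning limitation in point 1 can be sketched as follows. This is an illustration only: `zlib.crc32` over the tuple's `repr` is a stand-in hash (Iceberg's single-argument bucket transform uses a 32-bit Murmur3 hash, and how multiple arguments would be combined is exactly what is undefined here), and the function names are hypothetical.

```python
import zlib
from typing import Optional, Sequence


def bucket(values: Sequence[object], num_buckets: int) -> int:
    """Map a tuple of source values to a bucket.

    Stand-in hash for illustration only; NOT the hash Iceberg uses.
    """
    digest = zlib.crc32(repr(tuple(values)).encode("utf-8"))
    return digest % num_buckets


def prunable_bucket(
    source_fields: Sequence[str],
    equality_predicates: dict[str, object],
    num_buckets: int,
) -> Optional[int]:
    """Return the single bucket a query can match, or None if it cannot prune.

    The bucket depends on every source field at once, so a filter that
    pins only some of the fields could match any bucket.
    """
    if not all(name in equality_predicates for name in source_fields):
        return None  # at least one field is unconstrained: no pruning
    values = [equality_predicates[name] for name in source_fields]
    return bucket(values, num_buckets)
```

For example, `prunable_bucket(["a", "b"], {"a": 1}, 16)` returns `None`, while supplying equality values for both `a` and `b` narrows the scan to one bucket.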

Please let me know what you think so we can tie up the loose ends for V3.

Kind regards,
Fokko


Re: [VOTE][Go] Release Apache Iceberg Go v0.2.0 RC1

2025-03-25 Thread Fokko Driesprong
+1 (binding)

Thanks for running this release Matt, and for adding the additional tests!

Checked the V1/V2 metadata/manifests/manifest-list, all looks good. Ran
tests against the REST catalog and everything seems to work great! Checked
the signatures, checksums and licenses.

Kind regards,
Fokko

On Fri, Mar 21, 2025 at 9:48 PM Kevin Liu wrote:

> +1 (non-binding)
>
> Ran verification script `dev/release/verify_rc.sh 0.2.0 1`.
> Built and tested with the CLI against pyiceberg's integration tests
> catalog.
>
> Best,
> Kevin Liu
>
>
> On Thu, Mar 20, 2025 at 10:57 AM Matt Topol wrote:
>
>> Hi,
>>
>> I would like to propose the following release candidate (RC1) of
>> Apache Iceberg Go version v0.2.0.
>>
>> This release candidate is based on commit:
>> cfd2c3ba2b61106bbbfdd1c0d045cc467c42c4e0 [1]
>>
>> The source release rc1 is hosted at [2].
>>
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [3] for how to validate a release candidate.
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Iceberg Go v0.2.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Iceberg Go v0.2.0 because...
>>
>> [1]:
>> https://github.com/apache/iceberg-go/tree/cfd2c3ba2b61106bbbfdd1c0d045cc467c42c4e0
>> [2]:
>> https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-0.2.0-rc1
>> [3]:
>> https://github.com/apache/iceberg-go/blob/main/dev/release/README.md#verify
>>
>


Re: [VOTE] Minor simplifications for Geo Spec

2025-03-25 Thread Manish Malhotra
Late to the party :)

Thanks Szehon

+1 (non-binding)

On Sun, Mar 23, 2025 at 5:53 PM Szehon Ho wrote:

> Thanks all for voting!
>
> The vote result is:
>
> +1: 9 (binding: Renjie, Eduard, Fokko, Yufei, Ryan, Daniel, Amogh,
> Russell, Szehon), 9 (non-binding: Jia, Gang, Bryan, Huang-Hsiang, Matt,
> Jean-Baptiste, Steve, Prashant, Huaxin)
> +0: 0 (binding), 0 (non-binding)
> -1: 0 (binding), 0 (non-binding)
>
> Therefore, the vote passes.
> Szehon
>
> On Sun, Mar 23, 2025 at 5:47 PM Szehon Ho wrote:
>
>> +1
>>
>> On Sat, Mar 22, 2025 at 10:42 PM huaxin gao wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Sat, Mar 22, 2025 at 6:32 PM Prashant Singh wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Best,
>>>> Prashant
>>>>
>>>> On Fri, Mar 21, 2025 at 10:03 AM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> On Fri, Mar 21, 2025 at 11:53 AM Steve Zhang wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Thanks,
>>>>>> Steve Zhang
>>>>>>
>>>>>> On Mar 18, 2025, at 6:29 PM, Gang Wu wrote:
>>>>>>
>>>>>> +1 (non-binding)


Re: [DISCUSS] Row lineage required for v3

2025-03-25 Thread Ryan Blue
Okay, it sounds like we have consensus that it's a good idea to make row
lineage required in v3 and that it's a good idea to signal to engines when
they can write delete-and-insert changes. I think we need a bit more
discussion on how to signal to engines, but in the meantime we can move
forward with the row lineage changes. Thanks for the discussion, and I'll
come up with a proposal for the property that Peter is suggesting.
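
As a rough illustration of the engine-side check discussed in this thread, the sketch below rejects delete-and-insert updates on a v3 table when a table property asks for it. The property name is a placeholder (the actual property has not been proposed yet), and the function and exception names are hypothetical.

```python
# Placeholder name: the real property is still to be proposed.
HYPOTHETICAL_PROPERTY = "hypothetical.reject-delete-and-insert-updates"


class UpdateRejectedError(Exception):
    """Raised when an engine must not write a delete-and-insert update."""


def check_update_allowed(
    format_version: int,
    properties: dict[str, str],
    writes_delete_and_insert: bool,
) -> None:
    """Engine-side guard, run before planning an update.

    Iceberg itself does not enforce anything here; the engine reads the
    property and decides whether its write strategy is acceptable.
    """
    if format_version < 3:
        return  # row lineage only exists from v3 on
    flag = properties.get(HYPOTHETICAL_PROPERTY, "false").lower() == "true"
    if flag and writes_delete_and_insert:
        raise UpdateRejectedError(
            "table requests accurate row lineage; "
            "delete-and-insert updates are not allowed"
        )
```

Defaulting the flag to false in this sketch means existing streaming writers keep working until a user opts in; whether that is the right default is part of the open discussion.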

Ryan

On Mon, Mar 24, 2025 at 7:02 AM Péter Váry wrote:

> > Would this property cause streaming writes using equality deletes to
> fail until the table is updated? I’m open to this solution since I think
> people should definitely be aware of the trade-offs they’re making in their
> tables.
>
> I don't think we can do such a check on the Iceberg side. As discussed
> there could be perfectly valid reasons to write equality deletes and keep
> row lineage. Even positional deletes written by an outdated engine could
> mess up the lineage information.
>
> I would push the responsibility to the engine to check the property when
> writing updates to a V3 table. If the property is set then queries which
> would write delete-and-insert type updates should be rejected.
>
> Thanks,
> Peter
>
> On Fri, Mar 21, 2025 at 7:08 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>> I support enabling row lineage by default primarily because of the
>> ecosystem benefit that enables engines to rely on lineage without requiring
>> users to opt in explicitly. This should generally apply to most engines and
>> integrations.
>>
>>
>> However, as we know, there are specific cases in the ecosystem, such as
>> streaming engines producing equality deletes, where row lineage cannot be
>> preserved because of the expensive reads and state management that would
>> be involved.
>>
>>
>> So that said, I think I agree with Peter that having a table property to
>> indicate whether row lineage is accurate or not would be beneficial. I
>> think this is preferable to expecting users to understand the nuances of
>> different engines regarding lineage preservation. It provides users with a
>> clear indication of what is happening in their table.
>>
>>
>> Thanks,
>>
>>
>> Amogh Jahagirdar
>>
>