Re: [VOTE] Simplify multi-arg table metadata

2025-02-09 Thread xianjin
+1 (non-binding) On Mon, Feb 10, 2025 at 2:03 AM Hussein Awala wrote: > +1 (non-binding) > > On Sun, Feb 9, 2025 at 6:15 PM Matt Topol wrote: > >> +1 (non-binding) >> >> Will definitely make it easier for iceberg-go to support v3 :) >> >> On Sun, Feb 9, 2025, 12:14 PM Szehon Ho wrote: >> >>> +

Re: [DISCUSS] Simplify multi-arg table metadata

2025-02-07 Thread Xianjin Ye
+1. I think it's good timing to allow multi-arg transforms for V3 and onward only. On 2025/02/03 18:26:00 "Driesprong, Fokko" wrote: > Hi everyone, > > While I was looking to add the V3 partition-spec (de/en)coder to PyIceberg, > I noticed that it allows for backporting the multi-arg transforms

Re: Welcome Huaxin Gao as a committer!

2025-02-06 Thread xianjin
Congrats Huaxin! Sent from my iPhone On Feb 6, 2025, at 7:35 PM, Fokko Driesprong wrote: Congratulations Huaxin! On Thu, Feb 6, 2025 at 12:21, Russell Spitzer wrote: Congratulations! On Thu, Feb 6, 2025 at 11:35 AM Péter Váry wrote: Congratulations

Re: [VOTE] Update partition stats spec for V3

2025-02-04 Thread xianjin
+1, the spec change makes sense. > Make delete counts required to avoid ambiguity w.r.t NULL vs unknown. If we want to make this change, I think we need to unlink all the partition stats files in old snapshots (if they were already calculated with optional delete counts) when upgrading to V3 table fr

Re: [VOTE] Deletion Vectors in V3

2024-10-30 Thread xianjin
+1 (non binding) On Wed, Oct 30, 2024 at 2:28 PM Jean-Baptiste Onofré wrote: > +1 (non binding) > > Regards > JB > > On Tue, Oct 29, 2024 at 10:45 PM Anton Okolnychyi > wrote: > > > > Hi folks, > > > > We have been discussing the new layout for position deletes in V3 for a > while now. It seems

Re: [DISCUSS] iceberg rust 0.4.0 and iceberg pyiceberg_core 0.1.0 release

2024-09-05 Thread xianjin
+1 for this pyiceberg_core as well. Two cents about the iceberg-rust release schedule: it seems too aggressive to release every 2 weeks; a monthly (4-week) release would be a nice fit. Sent from my iPhone On Sep 5, 2024, at 8:25 PM, Sung Yun wrote: Thank you for driving this Xuanwo! +1 as well, as noted

Re: [VOTE] Merge REST Spec change to add RemovePartitionSpecsUpdate update type

2024-08-26 Thread xianjin
+1 (non-binding) Sent from my iPhone On Aug 27, 2024, at 4:22 AM, Fokko Driesprong wrote: +1 On Mon, Aug 26, 2024 at 22:00, Yufei Gu wrote: +1 Yufei On Mon, Aug 26, 2024 at 11:06 AM Ryan Blue wrote: +1 On Mon, Aug 26, 2024 at 11:04 AM Amogh Jahagirdar <2am...@gmail.com> wrote: I've op

Re: Table schema and partition spec update

2024-08-20 Thread Xianjin YE
s some extra effort though. It would be great that we can support that in the Flink Dynamic Sink. > On Aug 20, 2024, at 14:26, Péter Váry wrote: > > Hi Fokko, Xianjin, > > Thanks for both proposals, I will take a deeper look soon! Both seems > promising at the first glance

Re: Type promotion in v3

2024-08-20 Thread Xianjin YE
t col1; >string col2; > etc > } > > This is similar to how partition values are stored today in Avro. And I > don't think there is anything stopping from doing this in Avro either, except > it is potentially less useful because you can't save much by selecting

Re: Type promotion in v3

2024-08-19 Thread xianjin
new one? >> >> Yes, an important part of type promotion is validation that whatever >> evolution is being attempted can actually happen if the column being >> evolved is part of a partition transform! I was working on an >> implementation for this and so far it's

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread xianjin
+1 (non-binding) Sent from my iPhone On Aug 20, 2024, at 7:56 AM, Manu Zhang wrote: +1 (non-binding) On Tue, Aug 20, 2024 at 07:44, Micah Kornfield wrote: +1 (non-binding) On Mon, Aug 19, 2024 at 4:33 PM Steve Zhang wrote: +1 (non-binding) Thanks, Steve Zhang On Aug 19, 2024, at 1:47 PM, John

Re: [DISCUSS] Adding RemovePartitionSpecsUpdate update type to REST

2024-08-19 Thread xianjin
+1 from my side as well. Sent from my iPhone On Aug 20, 2024, at 9:09 AM, Yufei Gu wrote: +1, the new spec looks good to me. It seems like the client-side handling the heavy lifting of figuring out which spec to remove is a reasonable approach. Yufei On Mon, Aug 19, 2024 at 4:01 PM Anton Okolnychyi <

Re: Type promotion in v3

2024-08-19 Thread Xianjin YE
ed on > the partition-spec-id. Evolving the partition spec would fix it. When we > decide to include the schema-id, we would be able to create the evaluator > based on the (partition-spec-id, schema-id) tuple when evaluating the > partitions. > > Kind regards, > Fokko > >

Re: Type promotion in v3

2024-08-19 Thread Xianjin YE
Thanks Ryan for bringing this up. > int and long to string Could you elaborate a bit on how we can support type promotion for `int` and `long` to `string` if the upper and lower bounds are already encoded as 4/8-byte binary? It seems that we cannot add promotions to string as Piotr pointed o
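
To illustrate the concern raised in this message, here is a minimal sketch, not taken from the thread. It assumes int bounds are stored as 4-byte little-endian values (the single-value serialization described in the Iceberg spec) and that string bounds would be compared lexicographically. The helper names `encode_int_bound` and `decode_int_bound` are hypothetical.

```python
import struct


def encode_int_bound(value: int) -> bytes:
    # Assumption: int bounds are serialized as 4-byte little-endian binary.
    return struct.pack("<i", value)


def decode_int_bound(raw: bytes) -> int:
    # Reverse of encode_int_bound: read a 4-byte little-endian signed int.
    return struct.unpack("<i", raw)[0]


# Bounds written while the column was still an int column.
lower_raw = encode_int_bound(9)
upper_raw = encode_int_bound(10)

lower = decode_int_bound(lower_raw)
upper = decode_int_bound(upper_raw)

# As integers, the stored range is valid: lower <= upper.
numeric_ok = lower <= upper

# After a hypothetical promotion to string, the same values are ordered
# lexicographically, so the old lower bound sorts after the old upper bound.
lower_str, upper_str = str(lower), str(upper)
string_ok = lower_str <= upper_str

print(numeric_ok, string_ok)
```

The point of the sketch is that the stored bytes cannot simply be reinterpreted as text, and even after decoding, numeric order and string order disagree, so old bounds would not be usable for pruning on the promoted column without extra handling.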

Re: Table schema and partition spec update

2024-08-19 Thread Xianjin YE
Hey Péter, For evolving the schema, Spark has the ability to mergeSchema based on the new incoming schema; you may want t

Re: [VOTE] Merge REST spec clarification on how servers should handle unknown updates/requirements

2024-08-13 Thread xianjin
+1 On Aug 14, 2024, at 2:24 AM, Ryan Blue wrote: +1 On Tue, Aug 13, 2024 at 8:59 AM Yufei Gu wrote: +1 Yufei On Tue, Aug 13, 2024 at 8:57 AM Eduard Tudenhöfner wrote: +1 On Tue, Aug 13, 2024 at 5:09 PM Amogh Jahagirdar <2am...@gmail.com> wrote: I've opened

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-06 Thread Xianjin YE
Thanks for raising the discussion Huaxin. I also think partition-level statistics file(s) are more useful and have advantages over table-level stats. For instance: 1. It would be straightforward to support incremental stats computing for large tables: by recalculating new or updated partitions on

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Xianjin YE
> DataFile rewrite will create a new manifest file. This means if a DataFile > rewrite task is finished and committed, and there is a concurrent > ManifestFile rewrite then the ManifestFile rewrite will fail. I have played > around with serializing the Maintenance Tasks (resulted in a very ugly/

Re: [DISCUSS] DROP PARTITION in Spark

2024-08-06 Thread Xianjin YE
at should happen, not > how to do it. > > On Fri, Aug 2, 2024 at 10:20 AM Xianjin YE <mailto:xian...@apache.org>> wrote: >> > we would instead add support for pushing down `CAST` expressions from Spark >> >> Supporting pushing down more expressions is de

Re: [DISCUSS] Clarify in REST spec expected implementation behavior for unknown updates or requirements

2024-08-06 Thread Xianjin YE
Thanks Amogh for driving this discussion. I’m also +1 for the 400 status code; as others pointed out, the server is unable to determine whether the request is well formed. > On Aug 6, 2024, at 05:28, Amogh Jahagirdar <2am...@gmail.com> wrote: > > I also went back and forth on 400 vs 422 but ult

Re: [DISCUSS] DROP PARTITION in Spark

2024-08-02 Thread Xianjin YE
os-delete files or rewrite the whole data files. > On Aug 2, 2024, at 23:27, Ryan Blue wrote: > > There's a potential solution that's similar to what Xianjin suggested. Rather > than adding a new SQL keyword (which is a lot of work and specific to > Iceberg) we would in

Re: [DISCUSS] DROP PARTITION in Spark

2024-08-02 Thread Xianjin YE
> b) they have a concern that with getting the WHERE filter of the DELETE not > aligned with partition boundaries they might end up having pos-deletes that > could have an impact on their read perf I think this is a legit concern and currently `DELETE FROM` cannot guarantee that. It would be va

Re: [ANNOUNCE] Welcoming new committers and PMC members

2024-07-24 Thread Ye Xianjin
Congrats all, well done! Sent from my iPhone On Jul 24, 2024, at 11:33 PM, Péter Váry wrote: Congratulations all! On Wed, Jul 24, 2024 at 16:21, Bryan Keller wrote: Congrats all! On Jul 24, 2024, at 3:14 AM, Eduard Tudenhöfner wrote: Congrats eve

Re: Support Flink SQL Upsert a Spark table

2024-01-10 Thread xianjin
You can create an Iceberg table with a required field, for example: `create table test_table (id bigint not null, data string) using iceberg`. However, you cannot change an optional field to required after creation. See this issue for more details: https://github.com/apache/iceberg/issues/3617 Manu