Re: [DISCUSS] Clarify in REST spec expected implementation behavior for unknown updates or requirements

2024-08-07 Thread Yufei Gu
Thanks Amogh for starting this discussion. I agree that using 400 makes sense, especially since the server might not fully recognize the request. It’s a straightforward way to handle these situations and avoid potential misunderstanding. Yufei On Tue, Aug 6, 2024 at 5:53 AM Xianjin YE wrote: >

Re: [DISCUSS] Iceberg-rust based Ruby bindings

2024-08-07 Thread Chris Atkins
> Do you know how big the Ruby data community is? I think the most important part is that it gets some traction and will continue to be maintained. It's a great question, Fokko! I'd say that the data community in Ruby is nascent, but definitely exists. There are some prolific folks like Andrew Kane

Re: [DISCUSS] Changing namespace separator in REST spec

2024-08-07 Thread Dmitri Bourlatchkov
The idea of client-chosen separator char (?delim=.) sounds pretty reasonable to me. Nonetheless, I do not think this covers all the issues in putting namespaces in URI paths for servers running under the new Servlet spec. In particular, there are other chars that are considered "suspicious" by the

Re: [DISCUSS] Changing namespace separator in REST spec

2024-08-07 Thread Jack Ye
Sorry a bit late to this thread. I would personally prefer the client side separator solution (query param with `?delim=.`) a bit more than the server side (config override), just given the experience of handling similar situations for Glue data catalog which allows any name for database (namespac

Re: [DISCUSS] adoption of format version 3

2024-08-07 Thread Ryan Blue
we can still discuss the remaining items in the Iceberg geometry proposal: expression, partition transform For these two, I think that we can make them backward-compatible. If we update the spec to allow adding transforms between major releases, then we don’t need to have the transform done by v3.

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-07 Thread Walaa Eldin Moustafa
Piotr, what do you mean by making user-created functions shareable between engines? Do you mean UDFs written in imperative code? On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen wrote: > > Hi, > > Thank you Ajantha for creating this thread. The Iceberg UDFs are an > interesting idea! > Is there a

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-07 Thread Piotr Findeisen
Hi, Thank you Ajantha for creating this thread. The Iceberg UDFs are an interesting idea! Is there a plan to make the user-created functions sharable between the engines? If so, what would a CREATE FUNCTION statement look like in e.g. Spark or Trino? Meanwhile, added a few comments in the doc. Be

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-07 Thread Piotr Findeisen
Hey Fokko, thanks, that makes sense! Do you maybe know the timeline for the Avro release? Trino awaits the 1.6.1 release, so it would be great if we could get this rolling sooner rather than later. Best Piotr On Wed, 7 Aug 2024 at 16:33, Driesprong, Fokko wrote: > Hey Piotr, > > The Avro rel

Re: Flink Table Maintenance - Tag based locking

2024-08-07 Thread Ryan Blue
If this is specific to solving the problem that there is no notification when a task finishes in Flink, then I think it makes sense to use a JDBC lock. I'd prefer that this not add the tag-based locking strategy because I think that has the potential to be misunderstood by people using the library

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-07 Thread Steven Wu
I also like the middle ground of partition level stats, which also makes incremental refresh easier to perform (at partition level). If the roll-up of partition level stats turns out to be slow, I don't mind adding table level stats aggregated from partition level stats. Having partition level stats

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-07 Thread Driesprong, Fokko
Hey Piotr, The Avro release still has to be done. We have 1.12.0 which has been released, but that also drops Java 8 support, so we can't backport it. We still have to run the 1.11.4 Avro release to backport the CVE fix. Kind regards, Fokko On Wed 7

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-07 Thread Manish Malhotra
First of all, thanks a lot Huaxin for starting an important proposal and thread! A lot of important points are already discussed. For me, my thoughts were also tilting towards the partition level stats, which Piotr, Alex, Anton and a few others have mentioned as well. IMO, partition level stats mi

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-07 Thread Piotr Findeisen
Hi, thank you JB and Eduard for commenting! JB, which Avro version would we be updating to for the CVE fix? Best Piotr On Mon, 29 Jul 2024 at 13:36, Jean-Baptiste Onofré wrote: > That's fair (and I agree), but as these coming Avro releases include > CVE fix, I think it's worth considering. >

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-07 Thread Piotr Findeisen
Hi All, Thank you for the interesting discussion so far, and the many viewpoints shared! > Not all tables have partition definition and table-level stats would benefit these tables Agreed that tables do not always have partitions. Current partition stats are appropriate for partitioned tables only mainly

Re: Flink Table Maintenance - Tag based locking

2024-08-07 Thread Péter Váry
Hi Anton, nice to hear from you! Thanks Ryan for your continued interest! You can find my answers below: > Am I right to say the proposal has the following high-level goals: > - Perform cheap maintenance actions periodically after commits using the same cluster (suitable for things like rewriting