Re: [PROPOSAL] Refactore use of Guava Lists.*

2024-10-25 Thread Alex Dutra
Hi all, But aren't we now building on Java 11+? I think we could go one step ahead and replace most of these Guava factory methods by List.of(), List.copyOf() and the like – as long as the collection is not modified after. It's more concise and saves us a Guava import. Thanks, Alex On Thu, Oct
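A minimal sketch of the JDK factories mentioned above, assuming Java 11+ and a collection that is not modified after creation. The class and method names are hypothetical and not taken from the Iceberg code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Iceberg.
class ImmutableListSketch {
  // List.of builds an unmodifiable list directly from its arguments.
  static List<String> fixedColumns() {
    return List.of("id", "data", "ts");
  }

  // List.copyOf takes an unmodifiable snapshot of an existing collection.
  static List<String> snapshotOf(List<String> source) {
    return List.copyOf(source);
  }

  public static void main(String[] args) {
    List<String> mutable = new ArrayList<>(List.of("a", "b"));
    List<String> copy = snapshotOf(mutable);
    mutable.add("c"); // does not affect the snapshot
    System.out.println(copy); // prints [a, b]
    try {
      copy.add("d");
    } catch (UnsupportedOperationException e) {
      System.out.println("copy is unmodifiable");
    }
  }
}
```

One caveat when swapping call sites: List.of() and List.copyOf() reject null elements and return unmodifiable lists, whereas Lists.newArrayList() returns a mutable ArrayList that accepts nulls, so the replacement is only safe where neither property is relied on.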

Re: [PROPOSAL] Refactore use of Guava Lists.*

2024-10-25 Thread Jean-Baptiste Onofré
Hi Eduard Yeah, I mean checkstyle (not spotless). AFAIR, I saw a couple of locations without the diamond syntax. Let me find it out. Maybe we can start with fixing there. Thanks ! Regards JB On Thu, Oct 24, 2024 at 5:07 PM Eduard Tudenhöfner wrote: > > Hey JB, > > I don't think we're ever usin

[VOTE] Iceberg Rust Sync Meeting Time

2024-10-25 Thread Renjie Liu
Hi: Following discussion on this thread, we want to start a vote for Iceberg Rust Sync Meeting Time, and here are the options gathered: 1. From Xuanwo: One week before Iceberg Sync Meeting, from 00:00 to 01:00 GMT+8 2. From Renji

Re: [VOTE] Iceberg Rust Sync Meeting Time

2024-10-25 Thread Xuanwo
Hi, My time is somewhat flexible from 19:00 to 01:00 at UTC+8, so I'm okay with all those options. I'm fine with the time as long as more contributors can participate. On Fri, Oct 25, 2024, at 15:56, Renjie Liu wrote: > Hi: > > Following discussion on this thread >

[Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread Gabor Kaszab
Hey Iceberg Community, I read this article the other day and there is this part that caught my attention (amongst others): "For high-throughput streaming ingestion, ... durably store recently ing

Re: [DISCUSS] REST: OAuth2 Authentication Guide

2024-10-25 Thread Christian Thiel
Thanks everyone for your feedback in the Catalog Sync and afterwards! I tried to address most of the feedback and updated the document. * The updated document can be found here [1]: https://docs.google.com/document/d/1buW9PCNoHPeP7Br5_vZRTU-_3TExwLx6bs075gi94xc/edit?usp=sharing * It is li

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread Péter Váry
I was playing around with Flink ingestion performance testing and found that the compression codec is also an important factor. Using zstd gives much higher write performance, while gzip gives a higher compression ratio. So I would argue that there are more factors which could be optimized for writing

Re: [PROPOSAL] Refactore use of Guava Lists.*

2024-10-25 Thread rdb...@gmail.com
It’s correct that these methods aren’t strictly needed. We could translate every case into a slightly different form: Lists.newArrayList() -> new ArrayList<>() Lists.newArrayList(iter) -> new ArrayList(); Iterators.addAll(list, iter) Lists.newArrayList(iterable) -> new ArrayList<>(); Iterators.add
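The kind of translation described above can be sketched with plain JDK calls. Note the substitution: this uses Iterator.forEachRemaining and Iterable.forEach rather than the Guava Iterators.addAll helper named in the message, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class showing JDK-only stand-ins for Lists.newArrayList(...).
class NewArrayListSketch {
  // Stand-in for Lists.newArrayList(): the diamond operator infers the element type.
  static <T> List<T> empty() {
    return new ArrayList<>();
  }

  // Stand-in for Lists.newArrayList(iterator): drain the iterator into a new list.
  static <T> List<T> fromIterator(Iterator<T> iter) {
    List<T> list = new ArrayList<>();
    iter.forEachRemaining(list::add);
    return list;
  }

  // Stand-in for Lists.newArrayList(iterable): copy every element into a new list.
  static <T> List<T> fromIterable(Iterable<T> iterable) {
    List<T> list = new ArrayList<>();
    iterable.forEach(list::add);
    return list;
  }

  public static void main(String[] args) {
    List<Integer> source = Arrays.asList(1, 2, 3);
    System.out.println(fromIterator(source.iterator())); // prints [1, 2, 3]
    System.out.println(fromIterable(source)); // prints [1, 2, 3]
  }
}
```

Each replacement takes two statements where the Guava factory takes one, which illustrates the "slightly different form" the message refers to.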

Re: [VOTE] Iceberg Rust Sync Meeting Time

2024-10-25 Thread Christian Thiel
Hi Renji, thanks for picking it up! I don't have a preference either. Both times work for me. From: Xuanwo Sent: Friday, October 25, 2024 2:13:49 AM To: dev@iceberg.apache.org Subject: Re: [VOTE] Iceberg Rust Sync Meeting Time Hi, My time is somewhat flexibl

Re: [VOTE] Iceberg Rust Sync Meeting Time

2024-10-25 Thread NOTME ZE
Hi, I prefer from 23:00 to 00:00 GMT+8, last Thursday of each month. But both times work for me. On Fri, Oct 25, 2024 at 19:28, Christian Thiel wrote: > Hi Renji, thanks for picking it up! I don't have a preference either. Both > times work for me. > -- > *From:* Xuanwo > *Sent

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread Szehon Ho
+1 on this. Should we add an IR field / type to the View spec then? Sorry if it's been discussed already, I'm catching up a bit on the context. Thanks Szehon On Fri, Oct 25, 2024 at 11:51 AM rdb...@gmail.com wrote: > Substrait i

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread Steven Wu
agree with Ryan. Engines usually provide override capability that allows users to choose a different write format (than table default) if needed. There are many production use cases that write columnar formats (like Parquet) in streaming ingestion. I don't necessarily agree that it will be common

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread Szehon Ho
I don't have hands-on experience with Substrait, but I'm wondering: is a Substrait representation possible today with the existing Iceberg view spec? I.e., can engines today store the text-serialized Substrait representation with SQL dialect 'substrait'? Or is it an abuse of the spec and we should make a proper f

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread rdb...@gmail.com
Substrait is one of the reasons why we designed views with the ability to have different representations. I think that SQL translation is not a great solution. I'd like to see more focus on a portable intermediate representation like Substrait. That would solve a lot of the limitations with the SQL

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread rdb...@gmail.com
Gabor, The reason why the write format is a "default" is that I intended for it to be something that engines could override. For cases where it doesn't make sense to use the default because of memory pressure (as you might see in ingestion processes) you could choose to override and use a format t
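A rough sketch of the "default that engines can override" idea, assuming the table property name write.format.default with Parquet as the fallback. The resolveWriteFormat helper and its engineOverride parameter are hypothetical and only illustrate the precedence order, not how any particular engine implements it.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; shows precedence only: engine override, then table default, then fallback.
class WriteFormatSketch {
  static final String WRITE_FORMAT_DEFAULT = "write.format.default";
  static final String FALLBACK_FORMAT = "parquet";

  // engineOverride may be null or empty when the engine does not request a specific format.
  static String resolveWriteFormat(Map<String, String> tableProperties, String engineOverride) {
    if (engineOverride != null && !engineOverride.isEmpty()) {
      return engineOverride.toLowerCase(Locale.ROOT);
    }
    return tableProperties
        .getOrDefault(WRITE_FORMAT_DEFAULT, FALLBACK_FORMAT)
        .toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    Map<String, String> props = Map.of(WRITE_FORMAT_DEFAULT, "parquet");
    // A streaming ingestion job under memory pressure might ask for a row-oriented format.
    System.out.println(resolveWriteFormat(props, "avro")); // prints avro
    // Compaction can fall back to the table default.
    System.out.println(resolveWriteFormat(props, null)); // prints parquet
  }
}
```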

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-25 Thread Kevin Liu
Thanks, Ryan! That makes sense. I want to follow up on the original issue. I've made a PR [1] to enforce that the Snapshot `summary` map must have an `operation` key. Please take a look. Thank you @nastra for the comments and reviews. Best, Kevin Liu [1] https://github.com/apache/iceberg/pull/11
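The check described above could look roughly like the following. This is a sketch of the idea only, not the code from the PR; the class name, method name, and error message are hypothetical.

```java
import java.util.Map;

// Hypothetical validation helper; not the actual Iceberg implementation.
class SnapshotSummaryCheck {
  static final String OPERATION = "operation";

  // Returns the operation value, or fails if the summary is missing the required key.
  static String requireOperation(Map<String, String> summary) {
    if (summary == null || !summary.containsKey(OPERATION)) {
      throw new IllegalArgumentException("Snapshot summary must contain an 'operation' key");
    }
    return summary.get(OPERATION);
  }

  public static void main(String[] args) {
    System.out.println(requireOperation(Map.of("operation", "append"))); // prints append
    try {
      requireOperation(Map.of("added-data-files", "1"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```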

Re: [Discuss] Iceberg View Interoperability

2024-10-25 Thread Walaa Eldin Moustafa
I think this may need some more discussion. To me, a "serialized IR" is another form of a "dialect". In this case, this dialect will be mostly specific to Iceberg, and compute engines will still support reading views in their native SQL. There are some data points on this from the Trino community

Re: [DISCUSS] Discrepancy Between Iceberg Spec and Java Implementation for Snapshot summary's 'operation' key

2024-10-25 Thread Kevin Liu
Thanks, everyone! The PR [1] has been merged. Best, Kevin Liu [1] https://github.com/apache/iceberg/pull/11354 On Fri, Oct 25, 2024 at 1:02 PM Kevin Liu wrote: > Thanks, Ryan! That makes sense. > > I want to follow up on the original issue. I've made a PR [1] to enforce > that the Snapshot `sum