Re: [DISCUSS] Iceberg roadmap

2022-02-17 Thread OpenInx
>>> I'm not entirely sure what that collaboration would look like just yet though. For most processing engines, it is people joining the Apache Iceberg community. No matter what the license of the downstream project, we always welcome

[DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

2022-02-20 Thread OpenInx
Hi everyone, The current Spark 2.4 and Spark 3.0 builds have the following unaligned runtime artifact names:
# Spark 2.4: iceberg-spark-runtime-0.13.1.jar
# Spark 3.0: iceberg-spark3-runtime-0.13.1.jar
# Spark 3.1: iceberg-spark-runtime-3.1_2.12-0.13.1.jar
# Spark 3.2: iceberg-spark-runtime-3.2_2.12-0.13.1.jar

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

2022-02-24 Thread OpenInx
>>> convention. On Mon, Feb 21, 2022 at 12:36 PM Jack Ye wrote: > I think option 2 is ideal, but I don't know if there is any hard requirement from ASF/Maven Central side for us to keep backwards

Re: Review request

2022-03-02 Thread OpenInx
Thanks Peter for the great work. Just added my comments. On Wed, Mar 2, 2022 at 4:20 PM Peter Vary wrote: > Hi Team, > > I have a PR (https://github.com/apache/iceberg/pull/4218) waiting for > review where with basically a 1 liner change we can improve the performance > of the GenericReader clas

[DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-03 Thread OpenInx
Hi Iceberg dev, As we all know, in our current Apache Iceberg write path, the ORC file writer cannot simply roll over to a new file once its byte size reaches the expected threshold. The core reason we haven't supported this before is the lack of a correct approach to estimate the byte size from

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-03 Thread OpenInx
> Thanks to openinx for opening this discussion. > One thing to note: the current approach faces a problem. Because of some optimization mechanisms, when writing a large amount of duplicate data, there will be some deviation between the estimated and the actual size.

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-07 Thread OpenInx
gt;> >> ORC-1123 Add `estimationMemory` method for writer >> >> According to the Apache ORC milestone, it will be released on May 15th. >> >> https://github.com/apache/orc/milestones >> >> Bests, >> Dongjoon. >> >> On 2022/03/0

Review Request

2022-03-09 Thread OpenInx
Hi Iceberg dev, I've recently revisited the Flink write path to use the newly introduced writers (which are partition-specific writers). All future performance & stability optimizations will be made on top of the revisited Flink write path. I've just published the PR here: https://github.com/ap

Re: Welcome Szehon Ho as a committer!

2022-03-11 Thread OpenInx
Congrats Szehon! On Sat, Mar 12, 2022 at 7:55 AM Steve Zhang wrote: > Congratulations Szehon, Well done! > > Thanks, > Steve Zhang > > > > On Mar 11, 2022, at 3:51 PM, Jack Ye wrote: > > Congratulations Szehon!! > > -Jack > > On Fri, Mar 11, 2022 at 3:45 PM Wing Yew Poon > wrote: > >> Congratu

Re: Iceberg Delete Compaction Interface Design

2022-04-20 Thread OpenInx
Hi Yufei There was a proposed PR for this : https://github.com/apache/iceberg/pull/4522 On Thu, Apr 21, 2022 at 5:42 AM Yufei Gu wrote: > Hi team, > > Do we have a PR for this type of delete compaction? > >> Merge: the changes specified in delete files are applied to data files >> and then over

Re: 【Feature】Request support for c++ sdk

2022-06-08 Thread OpenInx
As a cloud-native table format standard for the big-data ecosystem, I believe supporting multiple languages is the correct direction, so that different languages can connect to the Apache Iceberg table format. But I can also get Kyle's point about lacking enough resources (developers and reviewers

Re: 【Feature】Request support for c++ sdk

2022-06-12 Thread OpenInx
of JVM-based query engines out there taking charge of data maintenance. We don't have to rewrite every corner of Iceberg in Rust. That means less engineering work. > On 2022/06/08 10:16:05 OpenInx wrote: >> As a cloud-native table format standard for the big-data eco

Re: 【Feature】Request support for c++ sdk

2022-06-12 Thread OpenInx
intain it etc. > But in general I think this is an exciting opportunity, and results have shown time and time again that native readers / writers are much more performant. > +1 to using Rust as well (which is a language I know more of than C++ these days - though bo

Re: [VOTE] Release Apache Iceberg 0.14.1 RC3

2022-09-05 Thread OpenInx
+1 (binding).
1. Download the source tarball, signature (.asc), and checksum (.sha512): OK
2. Import gpg keys: download KEYS and run (optional if this hasn’t changed):
```bash
$ gpg --import /path/to/downloaded/KEYS
```
It's OK
3. Verify the signature by running:
```bash
$ gpg --verify apach
```
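The checksum step of a release-verification checklist like the one above can be sketched with a self-contained example. The file names below are placeholders (not real release artifacts), and GNU coreutils `sha512sum` is assumed:

```shell
# Sketch of the checksum-verification step, assuming GNU coreutils.
# File names are placeholders; substitute the real release tarball and
# the .sha512 file published alongside it in the distribution area.
printf 'placeholder release contents' > apache-iceberg-example.tar.gz
sha512sum apache-iceberg-example.tar.gz > apache-iceberg-example.tar.gz.sha512
# The actual check a verifier runs; prints "<file>: OK" on success.
sha512sum -c apache-iceberg-example.tar.gz.sha512
```

For a real release candidate the `.sha512` file is downloaded rather than generated locally, and the `gpg --verify` signature check is run in addition to, not instead of, the checksum check.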

Re: [Discuss]- Donate Iceberg Flink Connector

2022-11-08 Thread OpenInx
Hi Sorry for the late reply. I'm one of the core flink iceberg connector maintainers at the early stage (flink 1.12, flink 1.13, flink 1.14). For the later flink releases, I've had some adjustments in my work and had less interactions with apache flink+iceberg, thanks Ryan, Steven, Kyle, hililiwe

Re: [VOTE] Release Apache Iceberg 1.1.0 RC4

2022-11-27 Thread OpenInx
+1 (binding)
1. Download the source tarball, signature (.asc), and checksum (.sha512): OK
2. Import gpg keys: download KEYS and run gpg --import /path/to/downloaded/KEYS.txt (optional if this hasn’t changed): OK
3. Verify the signature by running gpg --verify apache-iceberg-1.1.0.tar.gz.asc :

Re: In Remembrance of Kyle

2022-12-07 Thread OpenInx
So sad to get this news...I lost such a great, kind, passionate friend. On Thu, Dec 8, 2022 at 1:36 AM Ryan Blue wrote: > I'm going to miss Kyle and I'm sad to lose him. > > He was amazing at making everyone feel welcome here. I think he commented > on nearly every pull request for the last few
