Re: [DISCUSS] Community code reviews

2019-02-26 Thread RD
+1 On Tue, Feb 26, 2019 at 5:49 PM Jacques Nadeau wrote: > I'm +1 (non-binding) if you allow a window for review (for example, I > think others have suggested 1-2 business day before self+1). The post, > self +1, merge in two minutes is not great situation for anyone. > -- > Jacques Nadeau > CTO

Re: [VOTE] Community code reviews

2019-02-27 Thread RD
+1 On Wed, Feb 27, 2019 at 1:40 PM Matt Cheah wrote: > +1 > > > > *From: *Ryan Blue > *Reply-To: *"dev@iceberg.apache.org" , " > rb...@netflix.com" > *Date: *Wednesday, February 27, 2019 at 1:36 PM > *To: *Iceberg Dev List > *Subject: *Re: [VOTE] Community code reviews > > > > +1 > > > > On W

Re: [VOTE] Add the python implementation

2019-03-05 Thread RD
+1 On Tue, Mar 5, 2019 at 5:01 PM John Zhuge wrote: > +1 > > On Tue, Mar 5, 2019 at 4:59 PM Xabriel Collazo Mojica > wrote: > >> +1 >> >> >> >> *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | >> xcoll...@adobe.com >> >> >> >> *From: *Ted Gooch >> *Reply-To: *"dev@iceberg.a

Re: Spark SQL error - Unsupported data source type for direct query on files: iceberg

2019-03-30 Thread RD
Hi Sandeep, One way would be to register the dataframe you loaded as a temporary view. See https://spark.apache.org/docs/latest/sql-getting-started.html#global-temporary-view -R On Fri, Mar 29, 2019 at 5:12 PM Sandeep Sagar wrote: > Hi, > > Exploring Iceberg here. Trying a hello-world in j

Re: Style guidelines proposal for Iceberg

2019-04-10 Thread RD
+1. I can help out too. -R On Wed, Apr 10, 2019 at 9:35 AM Matt Cheah wrote: > I opened a list of Github Issues that all have *[Baseline]* in the issue > description. We can use that to keep track of the modules we have yet to > apply Baseline to, as well as have contributors take ownership of

Reader schema does not project union but data has non-optional unions

2019-05-13 Thread RD
Iceberg today does not support non-optional unions and that is the right behaviour, but we do have a lot of datasets which have non-optional union fields. I'm wondering whether Iceberg should allow reading these datasets as long as the user does not project the union field. I tried it out and toda

Re: Reader schema does not project union but data has non-optional unions

2019-05-13 Thread RD
To add to this, I'm not suggesting that we change Iceberg writers to support writing non-optional unions. The motivation for this is to support legacy datasets [not written by Iceberg]. On Mon, May 13, 2019 at 10:05 AM RD wrote: > Iceberg today does not support non-optional unions and tha

Re: Reader schema does not project union but data has non-optional unions

2019-05-15 Thread RD
nals. > > I think this makes sense. Maybe we could also allow projecting the > contents of unions by representing them as structs of optionals and > materializing them that way. I'd be up for reviewing this. > > On Mon, May 13, 2019 at 1:48 PM RD wrote: > >> To add to

Re: Vanilla Spark Readers on Iceberg written data..

2019-05-15 Thread RD
Is backporting relevant datasource patches to Spark 2.3 a non-starter? If it is doable, I believe it would be much simpler than bypassing Iceberg metadata to read files directly. -R On Wed, May 15, 2019 at 3:02 PM Gautam wrote: > Just wanted to add, from what I have tested so far I see this work

Understanding Iceberg's dependency configuration

2019-06-22 Thread RD
Hi Iceberg devs, I see that guava and slf4j-api are compileOnly dependencies. This implies that they are not required at runtime and will not be resolved when resolving Iceberg artifacts. So it might very well be the case that, say for example, for iceberg-spark, the guava dependency that could be

Re: Getting delta of data changes between 2 Snapshots

2019-07-17 Thread RD
Hi Iceberg devs, We are starting work on a somewhat similar project. The idea is that users can ask for incremental data since the last snapshot they processed, i.e., the delta that was added since the last snapshot. Do you think this can be a general feature that can be benefici
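The incremental read described above can be sketched roughly as follows. This is a minimal illustration only, not Iceberg's API: the `Snapshot` record, the `files_added_since` helper, and the file names are all hypothetical, and the sketch assumes each snapshot records its parent and the files it appended. Deletes and overwrites are ignored here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified snapshot record (not Iceberg's Snapshot class)."""
    snapshot_id: int
    parent_id: Optional[int]
    added_files: Tuple[str, ...] = ()


def files_added_since(
    snapshots: Dict[int, Snapshot], current_id: int, last_processed_id: int
) -> List[str]:
    """Return files added after last_processed_id, up to and including current_id."""
    chain = []
    cursor: Optional[int] = current_id
    # Walk parent pointers from the current snapshot back to the last processed one.
    while cursor is not None and cursor != last_processed_id:
        if cursor not in snapshots:
            raise ValueError(f"snapshot {cursor} is missing from the history")
        snap = snapshots[cursor]
        chain.append(snap)
        cursor = snap.parent_id
    if cursor is None:
        # Reached the root without meeting the last processed snapshot,
        # e.g. it is on another branch or is no longer in the history.
        raise ValueError("last processed snapshot is not an ancestor of the current one")
    # Emit files oldest snapshot first.
    files: List[str] = []
    for snap in reversed(chain):
        files.extend(snap.added_files)
    return files


# Hypothetical three-snapshot history.
history = {
    1: Snapshot(1, None, ("a.parquet",)),
    2: Snapshot(2, 1, ("b.parquet",)),
    3: Snapshot(3, 2, ("c.parquet", "d.parquet")),
}
delta = files_added_since(history, current_id=3, last_processed_id=1)
```

A consumer that last processed snapshot 1 would pick up only the files appended by snapshots 2 and 3. The sketch raises an error rather than silently returning a partial delta when the last processed snapshot cannot be reached from the current one.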

Re: [DISCUSS] Write-audit-publish support

2019-07-19 Thread RD
I think this could be useful. When we ingest data from Kafka, we do a predefined set of checks on the data. We can potentially utilize something like this to check for sanity before publishing. How is the auditing process supposed to find the new snapshot, since it is not accessible from the table

Re: [DISCUSS] Implementation strategies for supporting Iceberg tables in Hive

2019-07-25 Thread RD
Hi Adrien, We at LinkedIn went through a similar thought process, but given our Hive deployment is not that large, we are in the process of considering deprecating Hive and asking our users to move to Spark [since Spark supports Hive ql]. I'm guessing we'd have to invest in Spark's catalog AFAI

Re: Getting delta of data changes between 2 Snapshots

2019-07-25 Thread RD
would also be nice if such a system had a way to block > snapshot expiration until all downstream incremental processes have > completed. > > rb > > On Wed, Jul 17, 2019 at 12:46 PM RD wrote: > >> Hi Iceberg devs, >>We are starting work on a somewhat similar

Re: Getting delta of data changes between 2 Snapshots

2019-07-29 Thread RD
s interface? > Should those be ignored? (Maybe the answer to this question is a good > reason to have a separate IncrementalScan API.) > > On Thu, Jul 25, 2019 at 4:03 PM RD wrote: > >> Thanks Ryan, >> >> Iceberg can give you the data files that were added or delete

Spark version

2019-08-22 Thread RD
Hi Iceberg devs, We are in the process of upgrading our Spark version to support Iceberg. We wanted to know whether Iceberg will move to Spark 2.4.3 in the near term. If that's the case, we will do the work once and migrate to Spark 2.4.3 instead of Spark 2.4.0. -Best,

Re: New committer and PPMC member, Anton Okolnychyi

2019-09-06 Thread RD
Congratulations Anton! Regards On Tue, Sep 3, 2019 at 9:03 AM Xabriel Collazo Mojica wrote: > Congrats Anton! > > > > *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | > xcoll...@adobe.com > > > > *From: *Anjali Norwood > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Tuesd

Re: [VOTE] Release Apache Iceberg 0.7.0-incubating RC1

2019-10-12 Thread RD
Successfully executed steps 1 - 7. +1 Release this as Apache Parquet 0.7.0-incubating -R On Sat, Oct 12, 2019 at 8:00 AM Arina Yelchiyeva wrote: > Not sure, if this is related to the release vote but after "Update build > for Apache releases" commit [1], we are not longer able to build Iceberg

Re: [VOTE] Release Apache Iceberg 0.7.0-incubating RC1

2019-10-12 Thread RD
+1 On Sat, Oct 12, 2019 at 9:11 PM RD wrote: > Successfully executed steps 1 - 7. > > +1 Release this as Apache Parquet 0.7.0-incubating > > -R > > On Sat, Oct 12, 2019 at 8:00 AM Arina Yelchiyeva < > arina.yelchiy...@gmail.com> wrote: > >> Not sure, if

Re: [VOTE] Release Apache Iceberg 0.7.0-incubating RC2

2019-10-17 Thread RD
+1 Validated all steps from previous email. -R On Thu, Oct 17, 2019 at 10:42 AM Julien Le Dem wrote: > +1 > I validated the signatures, checked the licences, ran the build > > (typo in the vote options. It's "+1 Release this as Apache *Iceberg* > 0.7.0-incubating") > > On Tue, Oct 15, 2019 at 9

Re: [VOTE] Release Apache Iceberg 0.7.0-incubating RC4

2019-10-22 Thread RD
All checks passed. Verified the spark runtime with a Spark job. -R On Mon, Oct 21, 2019 at 10:42 PM parth brahmbhatt < brahmbhatt.pa...@gmail.com> wrote: > +1(binding) > > All checks passed and presto smoke tests pass as well. > > > On Mon, Oct 21, 2019 at 3:13 PM Daniel Weeks wrote: > >> +1 >>

Re: Welcome new committer and PPMC member Ratandeep Ratti

2020-02-16 Thread RD
Thanks everyone! -Best, R. On Sun, Feb 16, 2020 at 7:39 PM David Christle wrote: > Congrats!!! > > > > *From: *Jacques Nadeau > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Sunday, February 16, 2020 at 7:20 PM > *To: *Iceberg Dev List > *Subject: *Re: Welcome new committer and PPMC member

Re: Welcome new committer and PPMC member Ratandeep Ratti

2020-02-18 Thread RD
Thank you again, everyone! -R On Mon, Feb 17, 2020 at 10:06 PM Gautam wrote: > Congratulations and thanks for your work. > > On Sun, Feb 16, 2020 at 8:37 PM RD wrote: > >> Thanks everyone! >> >> -Best, >> R. >> >> On Sun, Feb 16, 2020 at 7

Re: [Discuss] Merge spark-3 branch into master

2020-03-07 Thread RD
I'm +1 to separate modules for spark-2 and spark-3, after the 0.8 release. I think it would be a big change for organizations to adopt Spark-3 since that brings in Scala-2.12, which is binary-incompatible with previous Scala versions. Hence this adoption could take a lot of time. I know in our company

Re: Shall we start a regular community sync up?

2020-03-18 Thread RD
+1 On Wed, Mar 18, 2020 at 10:49 AM Ryan Blue wrote: > No problem, we can alternate times to include everyone. How about the next > sync at 5 PM UTC+7 and then the one after that at a time that works for > people in UTC+0/+1? > > On Wed, Mar 18, 2020 at 10:21 AM Mass Dosage wrote: > >> We're in

Re: Shall we start a regular community sync up?

2020-03-19 Thread RD
Same time works for me too! On Thu, Mar 19, 2020 at 4:45 PM Xabriel Collazo Mojica wrote: > 5pm or 5:30pm PT any day next week would work for me. > > Thanks for restoring the community sync up! > > Xabriel J Collazo Mojica | Sr Computer Scientist II | Adobe > > On 3/18/20, 6:45 PM, "justin

Re: Iceberg community sync notes - 15 April 2020

2020-04-17 Thread RD
Thanks for the correction, Adrian. I've filed a GitHub ticket here: https://github.com/apache/incubator-iceberg/issues/934 . There are 2 approaches mentioned there with pros/cons. It would be good to get the community's feedback on how to proceed. -best, R. On Fri, Apr 17, 2020 at 6:28 AM Mass

Re: [VOTE] Release Apache Iceberg 0.8.0-incubating RC1

2020-04-29 Thread RD
+1. Verified the outlined steps. -R On Wed, Apr 29, 2020 at 8:09 AM tison wrote: > Hi Junjie, > > It is kind of GPG scope issue that you should manually trust Ryan's KEYS. > > FYI https://www.gnupg.org/gph/en/manual/x334.html > > Best, > tison. > > > Junjie Chen wrote on Wed, Apr 29, 2020 at 9:47 PM: > >> R

Re: [VOTE] Release Apache Iceberg 0.8.0-incubating RC2

2020-05-01 Thread RD
+1 Validated all the steps mentioned. -R On Fri, May 1, 2020 at 9:31 AM Ryan Blue wrote: > +1 (binding) > > Ran rat, validated checksums and signature, and ran the build. > > I noticed that the iceberg-spark-runtime Jar is about 22MB larger and it > looks like the problem is mainly that parquet

Re: [VOTE] Graduate to a top-level project

2020-05-12 Thread RD
+1 for graduation! On Tue, May 12, 2020 at 3:50 PM John Zhuge wrote: > +1 > > On Tue, May 12, 2020 at 3:33 PM parth brahmbhatt < > brahmbhatt.pa...@gmail.com> wrote: > >> +1 >> >> On Tue, May 12, 2020 at 3:31 PM Anton Okolnychyi >> wrote: >> >>> +1 for graduation >>> >>> On 12 May 2020, at 15:3

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-13 Thread RD
+1 - Verified signatures and checksum - Ran RAT checks - Built src and ran all tests - Ran a simple Spark job. -Best, R. On Mon, Jul 13, 2020 at 8:36 AM Junjie Chen wrote: > I ran the following steps: >- downloaded and verified signature and checksum. >- ran ./gradlew build, it took

Re: New committer: Shardul Mahadik

2020-07-22 Thread RD
Congratulations Shardul! Well deserved! -Best, R. On Wed, Jul 22, 2020 at 2:24 PM Ryan Blue wrote: > Hi everyone, > > I'd like to congratulate Shardul Mahadik, who was just invited to join the > Iceberg committers! > > Thanks for all your contributions, Shardul! > > rb > > > -- > Ryan Blue >

Re: [VOTE] Release Apache Iceberg 0.9.1 RC0

2020-08-13 Thread RD
+1 (Binding) Validated signature, checksum and RAT. Ran all tests. -Best, R. On Thu, Aug 13, 2020 at 11:48 AM Dongjoon Hyun wrote: > +1 (non-binding). > > Thank you! > > Bests, > Dongjoon. > > On Thu, Aug 13, 2020 at 7:46 AM John Zhuge wrote: > >> +1 (non-binding) >> >> Verified signature, che

Re: [DISCUSS] Rename iceberg-hive module?

2020-08-19 Thread RD
I'm +1 for this rename. I think we should keep the iceberg-mr module as is and maybe add a new module iceberg-hive-exec [not sure if it is a good idea to salvage iceberg-hive for this purpose] which contains Hive-specific StorageHandler, Serde and IcebergHivInputFormat classes. -R On Wed, Aug 19

Re: Hive Iceberg writes

2020-08-27 Thread RD
Our stance has been similar at LinkedIn. Hive writes are not a priority for us, as we plan to move more and more of our workloads on Hive to Spark SQL. -R On Thu, Aug 27, 2020 at 10:18 AM Edgar Rodriguez wrote: > Hi folks, > > We have not started to work on this either, but we've discussed this >

Re: [VOTE] Release Apache Iceberg 0.10.0 RC5

2020-11-09 Thread RD
+1 Validated signature, checksum and RAT checks. Build with all tests passed. On Mon, Nov 9, 2020 at 4:13 AM Ryan Murray wrote: > +1 (non-binding) > > all normal tests pass and tests against nessie also look good. > > Best, > > Ryan > > > On Mon, Nov 9, 2020 at 12:03 PM Mass Dosage wrote: > >>

Re: Welcoming OpenInx as a new PMC member!

2021-06-30 Thread RD
Congrats OpenInx (Zheng Hu)! On Tue, Jun 29, 2021 at 11:41 PM Gautam wrote: > Congratulations Zheng Hu! > > On Tue, Jun 29, 2021 at 8:17 PM OpenInx wrote: > >> Thanks all ! >> >> I really appreciate the trust from the Apache iceberg community. For me, >> this is not only an honor, but also a re