Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-18 Thread Edgar Rodriguez
I'm generally +1 on dropping Spark 2.4 - almost everyone is moving to Spark 3.x, if they haven't already. As for the Hadoop upgrade, I think that could be problematic for us if any non-backwards-compatible API change is required at compile time, since we're still running a 2.8.x version. Cheers,

Re: [DISCUSS] Spark 3.1 support?

2023-04-21 Thread Edgar Rodriguez
Airbnb is also still on Spark 3.1 and I echo some of Walaa's comments. Cheers, On Thu, Apr 20, 2023 at 8:14 PM Walaa Eldin Moustafa wrote: > LinkedIn is still on Spark 3.1. I am guessing a number of other companies > could be in the same boat. I feel the argument for Spark 2.4 is different > fr

Re: [DISCUSS] Spark 3.1 support?

2023-04-24 Thread Edgar Rodriguez
Hi all, Thanks for the discussion. Similarly to Manu, we're on Spark 3.1.1 and Iceberg 1.1.0 - we backport Spark 3.1.1 fixes internally as well. It's a bit more complicated to move fast on Spark versions internally, mainly due to the number of Scala customers that we have. I understand maintainin

Re: [DISCUSS] Write-audit-publish support

2019-07-22 Thread Edgar Rodriguez
hould not update the table’s >>>>> current >>>>> stage. That happens when there is a Spark property, spark.wap.id, >>>>> that indicates the job is a WAP job. Then any table that has WAP enabled >>>>> by >>>>> the table property write.wap.enabled=true will stage the new snapshot >>>>> instead of fully committing, with the WAP ID in the snapshot’s metadata. >>>>> >>>>> Is this something we should open a PR to add to Iceberg? It seems a >>>>> little strange to make it appear that a commit has succeeded, but not >>>>> actually change a table, which is why we didn’t submit it before now. >>>>> >>>>> Thanks, >>>>> >>>>> rb >>>>> -- >>>>> Ryan Blue >>>>> Software Engineer >>>>> Netflix >>>>> >>>> >>>> >>> >>> -- >>> Ryan Blue >>> Software Engineer >>> Netflix >>> >> >> >> -- >> Filip Bocse >> > -- Edgar Rodriguez
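
The staging behavior described in the quoted message above can be modeled with a toy sketch. This is not Iceberg or Spark code: the `Table` class, the `commit` function, and the `wap.id` summary key are hypothetical names chosen for illustration. Only the `spark.wap.id` job property and the `write.wap.enabled=true` table property come from the thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    """Toy stand-in for a table; hypothetical, not an Iceberg class."""
    properties: dict = field(default_factory=dict)
    snapshots: list = field(default_factory=list)
    current_snapshot_id: Optional[int] = None


def commit(table: Table, job_conf: dict, snapshot_id: int) -> str:
    """Stage the snapshot for a WAP job on a WAP-enabled table, else publish it."""
    wap_id = job_conf.get("spark.wap.id")
    wap_enabled = table.properties.get("write.wap.enabled") == "true"
    if wap_id is not None and wap_enabled:
        # Staged: the snapshot is recorded with the WAP ID in its metadata,
        # but the table's current snapshot is left untouched.
        # "wap.id" is an assumed key name for this sketch.
        table.snapshots.append({"id": snapshot_id, "summary": {"wap.id": wap_id}})
        return "staged"
    # Normal commit: record the snapshot and make it current.
    table.snapshots.append({"id": snapshot_id, "summary": {}})
    table.current_snapshot_id = snapshot_id
    return "published"
```

In this model a WAP job's write appears to succeed while readers still see the previous current snapshot, which is the "a little strange" behavior the message mentions.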

Iceberg in Spark 3.0.0

2019-08-07 Thread Edgar Rodriguez
seems interesting to start tracking those changes to start evaluating some of the support as it evolves. Thanks. Cheers, -- Edgar Rodriguez

Re: Iceberg in Spark 3.0.0

2019-08-08 Thread Edgar Rodriguez
test with? > Seems like there're nightly snapshots built in https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ - I've started setting something up with these snapshots so I can probably start working on this. Thanks! Cheers, -- Edgar Rodriguez

Re: Are we going to use Apache JIRA instead of Github issues

2019-08-19 Thread Edgar Rodriguez
A? >>> >>> On Fri, Aug 16, 2019 at 3:09 AM Saisai Shao >>> wrote: >>> >>>> Hi Team, >>>> >>>> Seems Iceberg project uses Github issues instead of JIRA. IMHO JIRA is >>>> more powerful and easy to manage, most of the Apache projects use JIRA to >>>> track everything, any plan to move to JIRA or we stick on using Github >>>> issues? >>>> >>>> Thanks >>>> Saisai >>>> >>> >>> >>> -- >>> Ryan Blue >>> Software Engineer >>> Netflix >>> >> -- Edgar Rodriguez

Re: Spark version

2019-08-22 Thread Edgar Rodriguez
do the work once and migrate to Spark 2.4.3 >> instead of Spark 2.4.0. >> >> -Best, >> >> >> > > -- > Ryan Blue > Software Engineer > Netflix > -- Edgar Rodriguez

Re: New committer and PPMC member, Anton Okolnychyi

2019-08-30 Thread Edgar Rodriguez
t invited to join >> the Iceberg committers and PPMC! >> >> Thanks for all your contributions, Anton! >> >> rb >> >> -- >> Ryan Blue >> > -- Edgar Rodriguez

Re: Shall we start a regular community sync up?

2020-06-15 Thread Edgar Rodriguez
Hi Ryan, I'd like to attend the regular community syncs. Could you send me an invite? Thanks! - Edgar On Wed, Mar 25, 2020 at 6:38 PM Ryan Blue wrote: > Will do. > > On Wed, Mar 25, 2020 at 6:36 PM Jun Ma wrote: > >> Hi Ryan, >> >> Thanks for driving the sync up meeting. Could you please add

Re: S3 example in Java

2020-06-24 Thread Edgar Rodriguez
Hi Chen, Since S3 does not have an atomic rename operation, currently the only way to create, write, and read tables in S3 is via the HiveCatalog implementation, which requires a Hive metastore with Lock support to provide the atomic commit required by Iceberg. You can alternatively write your
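
Why a lock is needed for the atomic commit mentioned above can be illustrated with a toy model. This is a sketch under stated assumptions, not HiveCatalog's implementation: `MetastorePointer` is a hypothetical class, an in-process `threading.Lock` stands in for the Hive metastore lock, and the metadata file names are made up.

```python
import threading


class MetastorePointer:
    """Toy stand-in for the metastore entry that points at table metadata."""

    def __init__(self, location: str):
        self._lock = threading.Lock()  # stand-in for the metastore lock
        self._location = location

    def current(self) -> str:
        return self._location

    def swap(self, expected: str, new: str) -> bool:
        """Move the pointer to `new` only if it still equals `expected`."""
        with self._lock:
            if self._location != expected:
                return False  # another writer committed first; caller must retry
            self._location = new
            return True
```

Two writers that both start from the same base metadata cannot both succeed: the second `swap` sees a changed pointer and fails, which is the guarantee an object store without atomic rename cannot provide by itself in this model.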

Re: failing tests on master

2020-06-26 Thread Edgar Rodriguez
There's already a fix for this in https://github.com/apache/iceberg/pull/1127 Cheers, On Fri, Jun 26, 2020 at 5:26 AM Mass Dosage wrote: > Hello all, > > For the past week or so I've noticed failing builds on a local checkout of > master. > > I have raised an issue here: > > https://github.com/

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-13 Thread Edgar Rodriguez
+1 (non-binding) - Verified signatures - Verified checksum - Built from src tarball and ran tests - Ran internal test suite; all tests pass On Mon, Jul 13, 2020 at 11:46 AM Pavan Lanka wrote: > +1 (non-binding) > > >- Environment > - OSX > - openjdk 1.8.0_252 >- Build from source

Re: Hive Iceberg writes

2020-08-27 Thread Edgar Rodriguez
Hi folks, We have not started to work on this either, but we've discussed internally whether or not to support Hive writes. Our first priority right now is getting Hive reads in production to have read compatibility with our existing Hive clients. We'd be interested in this, however, at Ai

Re: [VOTE] Release Apache Iceberg 0.10.0 RC4

2020-11-05 Thread Edgar Rodriguez
+1 non-binding for RC4. Tested with internal tests in cluster, validated Spark writes and Hive reads. On Thu, Nov 5, 2020 at 5:56 AM Mass Dosage wrote: > +1 non-binding on RC4. I tested out the Hive read path on a distributed > cluster using HadoopTables. > > On Thu, 5 Nov 2020 at 04:46, Dongjoon

Re: About importing Hive tables and name mapping

2020-11-05 Thread Edgar Rodriguez
Hi Xiang, On Thu, Nov 5, 2020 at 11:07 AM 李响 wrote: > Dear community: > > I am using SparkTableUtil to import an existing Hive table to an Iceberg > table. > The ORC files of the Hive table are in an old version of ORC, so I set a name > mapping (like: id 1 mapped to _col0 and id 2 mapped to _col1...) t
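
The mapping described in the quoted question (id 1 to `_col0`, id 2 to `_col1`, and so on) can be sketched as JSON. This is a minimal illustration, assuming the Iceberg name-mapping JSON shape (a list of objects with `field-id` and `names`) and flat, non-nested columns; `positional_name_mapping` is a hypothetical helper, not part of Iceberg or SparkTableUtil.

```python
import json


def positional_name_mapping(num_columns: int, first_field_id: int = 1) -> list:
    """Map consecutive field ids to positional ORC names _col0, _col1, ..."""
    return [
        {"field-id": first_field_id + i, "names": [f"_col{i}"]}
        for i in range(num_columns)
    ]


# Serialized form for a two-column table, as in the question.
mapping_json = json.dumps(positional_name_mapping(2))
```

In Iceberg, a default name mapping of this form is stored as a string in the table property `schema.name-mapping.default`; how the original poster applied the mapping is not shown in the truncated message.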

Re: Default TimeZone for unit tests

2021-03-01 Thread Edgar Rodriguez
Hi folks, Thanks Peter for the quick fix! I do think it'd be a good idea to have this kind of coverage to some extent. A common workflow is to run locally only the modules one modifies and rely on the CI to run the full check, which takes longer; that makes room for these

Hive query with join of Iceberg table and Hive table

2021-03-02 Thread Edgar Rodriguez
Hi, I'm trying to run a simple query in Hive 2.3.4 with a join of a Hive table and an Iceberg table, each configured accordingly - the Iceberg table has the `storage_handler` defined, and I'm running with the MR engine. I'm using the `iceberg.mr.catalog.loader.class` class to load our internal catalog. In the

Re: Hive query with join of Iceberg table and Hive table

2021-03-02 Thread Edgar Rodriguez
ead of > HiveCatalog > > On Mar 2, 2021, at 18:49, Edgar Rodriguez < > edgar.rodrig...@airbnb.com.INVALID> wrote: > > Hi, > > I'm trying to run a simple query in Hive 2.3.4 with a join of a Hive table > and an Iceberg table, each configured accordingly - Iceberg

Re: Hive query with join of Iceberg table and Hive table

2021-03-03 Thread Edgar Rodriguez
On Wed, Mar 3, 2021 at 1:48 AM Peter Vary wrote: > Quick question @Edgar: Am I right that the table is created by Spark? I > think if it is created from Hive and we inserted the data from Hive, then > we should have the basic stats already collected and we should not need the > estimation (we mig

Migrating legacy snapshot daily Hive table concept to Iceberg

2021-03-08 Thread Edgar Rodriguez
Hi folks, I’d like to request some feedback on how to use Iceberg to approach a use case we have, that I believe some other folks could be facing, since this was a pattern usually followed with Hive tables. Use case: 1. We used to have database table snapshots exported daily at 0 UTC. Each day a

Re: Hive query with join of Iceberg table and Hive table

2021-03-12 Thread Edgar Rodriguez
Hive server endpoint then we probably have more of a concern about >> memory consumption. >> >> Vivekanand, can you share more detail about how/where this is happening >> in your case? >> >> On Wed, Mar 3, 2021 at 7:53 AM Edgar Rodriguez < >> edgar.rodri

Re: Migrating legacy snapshot daily Hive table concept to Iceberg

2021-03-15 Thread Edgar Rodriguez
Hi Ryan, On Tue, Mar 9, 2021 at 5:54 AM Ryan Murray wrote: > Hey Edgar, Cheng Pan, > > I am not sure if you are aware of project nessie > ? It _may_ suit your needs. Nessie applies > git-like functionality to iceberg tables (in this case most useful are > branches and

Re: Nessie PRs

2021-03-15 Thread Edgar Rodriguez
FYI I think https://github.com/apache/iceberg/pull/2307 broke the CI; I'm seeing the following errors: FAILURE: Build failed with an exception.

Re: Nessie PRs

2021-03-15 Thread Edgar Rodriguez
ber of artifacts that looked unrelated to the change that >> couldn't be resolved and assumed it was a temporary issue. >> >> I'll see if I can kick it off again (or Ryan M. if you have a chance to >> look at the error, that would help). >> >> On Mon, Mar

Re: Welcoming Russell Spitzer as a new committer

2021-03-29 Thread Edgar Rodriguez
Congrats, Russell! Cheers, On Mon, Mar 29, 2021 at 11:01 PM Robin Stephen wrote: > Congratulations, Russell! > > Jack Ye 于2021年3月30日周二 上午10:39写道: > >> Congratulations Russell! >> >> On Mon, Mar 29, 2021 at 7:25 PM OpenInx wrote: >> >>> Congrats, Russell ! Well-deserved ! >>> >>> On Tue, Mar

Re: Welcoming Ryan Murray as a new committer!

2021-03-29 Thread Edgar Rodriguez
Congratulations, Ryan! Best, On Mon, Mar 29, 2021 at 10:39 PM Jack Ye wrote: > Congratulations Ryan! > > On Mon, Mar 29, 2021 at 7:25 PM OpenInx wrote: > >> Congrats, Ryan ! Well-deserved ! >> >> On Tue, Mar 30, 2021 at 9:32 AM Junjie Chen >> wrote: >> >>> Congratulations. Ryan! >>> >>> On T

Re: [VOTE] Release Apache Iceberg 0.11.1 RC0

2021-03-30 Thread Edgar Rodriguez
+1 (non-binding) - Verified build, signature and checksum. - Ran internal integration tests. Cheers, On Tue, Mar 30, 2021 at 7:50 AM Ryan Murray wrote: > +1 (non-binding) > > verified build, tests, signature, checksum. > > Best, > Ryan > > On Tue, Mar 30, 2021 at 4:40 AM Jack Ye wrote: > >> +1

Optimize Equality Deletes with Sorting

2025-04-01 Thread Edgar Rodriguez
Hi all, I know there have been some conversations regarding optimization of equality deletes and even their possible deprecation. We have been thinking internally about a way to optimize merge-on-read with equality deletes to better balance the read performance while having the benefits of performant