Re: Apache Impala official Iceberg support

2022-06-21 Thread Peter Vary
Well done Impala Team! > On 2022. Jun 17., at 15:39, Ajantha Bhat wrote: > > Nice. Happy to see more engines adopting the Iceberg 👍🏻 > > On Fri, Jun 17, 2022 at 5:25 PM Tamás Máté > wrote: > Dear Apache Iceberg team, > > The Apache Impala team is pleased to announce

Re: [VOTE] Release Apache Iceberg 0.13.2 RC1

2022-06-07 Thread Peter Vary
+1 (I am a committer but not a PMC, so I am not sure if this is binding, or not :D) Verified the code, sums and build. I was able to run the hive3 tests without issues. Thanks, Peter > On 2022. Jun 7., at 4:55, John Zhuge wrote: > > +1 (non-binding) > > Verified sigs, sums, license, build an

Positional delete with vs without the delete row values

2022-05-05 Thread Peter Vary
Hi Team, We are working on integrating Iceberg V2 tables with Hive, and enabling delete and update operations. The delete is implemented by Marton and the first version is already merged: https://issues.apache.org/jira/browse/HIVE-26102 The upd

Re: Hive 4.0.0-alpha-1 release is available with Iceberg integration

2022-04-07 Thread Peter Vary
The congratulations are due to the Iceberg and the Hive communities! Thanks everyone and keep up the good work! > On 2022. Apr 7., at 19:22, Wing Yew Poon wrote: > > Congratulations on the release, making available this functionality for Hive > users! > > > On Thu, Apr 7,

Hive 4.0.0-alpha-1 release is available with Iceberg integration

2022-04-07 Thread Peter Vary
Hi Team, I would like to let you know that the Hive team released Hive 4.0.0-alpha-1. Using this release it is possible to create, read, write Iceberg V1 tables with Hive. There are some rough edges there but most of the queries, functions should be working. Just some examples: CREATE EXTERNA

Re: Welcome Szehon Ho as a committer!

2022-03-16 Thread Peter Vary
Congratulations! > On 2022. Mar 14., at 9:16, Omar Al-Safi wrote: > > Congratulations Szehon! > > On Mon, Mar 14, 2022 at 6:42 AM Eduard Tudenhoefner > wrote: > Congratulations! > > On Sun, Mar 13, 2022, 20:00 Daniel Weeks > wrote: > Congra

Review request

2022-03-02 Thread Peter Vary
Hi Team, I have a PR (https://github.com/apache/iceberg/pull/4218) waiting for review where with basically a one-line change we can improve the performance of the GenericReader classes by 10–20%. This one-liner is needed in 3 places. The other part of

Re: New Versioned Iceberg Documentation Site

2022-02-07 Thread Peter Vary
Nice site Sam! Thanks for the good work! Shall we include Hive in the main page? Thanks, Peter On Mon, 7 Feb 2022 at 08:58, Eduard Tudenhoefner wrote: > I agree that design docs and such are currently difficult to find. I just > wanted to point out that some are linked in > https://github.com/

Re: Welcome new PMC members!

2021-11-17 Thread Peter Vary
Congratulations Jack and Russell! On Thu, 18 Nov 2021, 05:59 Gidon Gershinsky, wrote: > Congratulations guys!! > > Cheers, Gidon > > > On Thu, Nov 18, 2021 at 2:12 AM Ryan Blue wrote: > >> Hi everyone, I want to welcome Jack Ye and Russell Spitzer to the Iceberg >> PMC. They've both been amazin

Re: Iceberg 0.12.1 Patch Release - Call for Bug Fixes and Patches

2021-10-21 Thread Peter Vary
sistent with Catalogs.hiveCatalog, and fixing create table issues when no catalog is set in the config > On 2021. Oct 21., at 16:59, Peter Vary wrote: > > I would like to have this in 0.12.1: > https://github.com/apache/iceberg/pull/3338

Re: Iceberg 0.12.1 Patch Release - Call for Bug Fixes and Patches

2021-10-21 Thread Peter Vary
I would like to have this in 0.12.1: https://github.com/apache/iceberg/pull/3338 This breaks Hive queries, if no catalog is set, but this still needs to be reviewed before merge. Thanks, Peter On Thu, 21 Oct 2021, 07:12 Rajarshi Sarkar, wrote: > Hope this can get in: https://github.com/apache

Re: [DISCUSS] Spark version support strategy

2021-09-16 Thread Peter Vary
Since you mentioned Hive, I'll chime in with what we do there. You might find it useful: - metastore module - only small differences - DynConstructor solves it for us - mr module - some bigger differences, but still manageable for Hive 2-3. Need some new classes, but most of the code is reused - extra mo

Re: Iceberg disaster recovery and relative path sync-up

2021-08-22 Thread Peter Vary
relative paths, but I wanted >>>> to ask an orthogonal but related question. >>>> Do we see disaster recovery as the only (or main) use case for >>>> multi-region? >>>> Is data residency requirement a use case for anybody? Is it possible to

Re: [ANNOUNCE] Apache Iceberg release 0.12.0

2021-08-20 Thread Peter Vary
Well done everyone! Thanks Carl for shepherding the process! On Fri, 20 Aug 2021, 01:08 Jack Ye, wrote: > Congrats! Thanks for managing the release Carl! > -Jack Ye > > On Thu, Aug 19, 2021 at 3:20 PM Carl Steinbach wrote: > >> I'm pleased to announce the release of Apache Iceberg 0.12.0! >> >

Re: Iceberg disaster recovery and relative path sync-up

2021-08-20 Thread Peter Vary
Sadly, I have missed the meeting :( Quick question: Was table rename / location change discussed for tables with relative paths? AFAIK when a table rename happens then we do not move old data / metadata files, we just change the root location of the new data / metadata files. If I am correct abou

Re: Orc deletes

2021-08-04 Thread Peter Vary
May I get a review for this first step: https://github.com/apache/iceberg/pull/2935 Thanks, Peter > On Aug 2, 2021, at 12:42, Peter Vary wrote: > > Created https://github.com/apache/iceberg/issues/2914

Documentation for old versions of Iceberg

2021-08-04 Thread Peter Vary
Hi Team, Some users have problems with the Hive integration. I have noticed that the official site now contains the new documentation for catalog handling, but that is not yet available for the users as we are in the process of the release. Is there a way to access the old version of the document

Re: Orc deletes

2021-08-02 Thread Peter Vary
at we know is missing or needs to be done? >> >> I’d be happy to help with any tasks I can as it seems important to keep up >> compatibility. >> >> - Kyle Bendickson, @kbendick >> >>> On Jul 25, 2021, at 9:59 PM, Peter Vary wrote: >>> >

Re: [DISCUSS] UUID type

2021-07-27 Thread Peter Vary
Hi Joshua, I do not have a strong preference about the UUID type, but I would like to highlight that the type is handled inconsistently in Iceberg with different file formats. (See: https://github.com/apache/iceberg/issues/1881) If we keep the type, it would be good to standardize the handling

Orc deletes

2021-07-25 Thread Peter Vary
Hi Team, I see plenty of ongoing work for Parquet and Avro deletes. Is there a particular reason why the ORC deletes are left out? Do we just lack the resources to work on it as well, or are there some known roadblocks that we have already identified and have to remove on the ORC side first? Thank

Re: Reading metadata tables

2021-07-19 Thread Peter Vary
rds using Iceberg's Record class and the > internal representations. I believe that's what the existing Iceberg object > inspectors use. Couldn't you just wrap this with an IcebergWritable and use > the regular object inspectors? > > On Thu, Jul 15, 2021 at 8:53 A

Re: Reading metadata tables

2021-07-15 Thread Peter Vary
ly for int/long/string etc types and it has problems with Long->OffsetDateTime conversion and friends. I am almost sure that this should have an existing and better solution already somewhere :) > On Jul 15, 2021, at 15:57, Peter Vary wrote: > > Hi Team, > > I am working t

Reading metadata tables

2021-07-15 Thread Peter Vary
Hi Team, I am working to enable running queries on metadata tables through Hive. I was able to load the correct metadata table through the Catalogs, and I created the TableScan, but I am stuck there ATM. What is the recommended way to get the Record-s for the Schema defined by the MetadataTab

Re: Welcoming OpenInx as a new PMC member!

2021-07-12 Thread Peter Vary
Just reading now. Congratulations Zheng! > On Jun 30, 2021, at 04:06, Robin Stephen wrote: > > Congratulations! > > Forward Xu <forwardxu...@gmail.com> wrote on Wed, Jun 30, 2021 at 9:35 AM: > Congratulations! > > best > Forward > > Miao Wang wrote on Wed, Jun 30, 2021 at 8:25 AM: > Congratulations! > > Sent

Re: Welcoming Jack Ye as a new committer!

2021-07-12 Thread Peter Vary
Little late to the game, but Congratulations Jack! > On Jul 7, 2021, at 03:54, OpenInx wrote: > > Congrats, Jack ! > > On Wed, Jul 7, 2021 at 7:40 AM Miao Wang wrote: > Congratulations! > > Miao > > Sent from my iPhone > >> On Jul 5, 2021, at 4:14 PM, Daniel Weeks >

Re: Iceberg build and Nessie

2021-06-21 Thread Peter Vary
+1 for option 3. We do similar things for Hive3 tests where only jdk8 is supported. > On Jun 14, 2021, at 23:02, Ryan Murray wrote: > > Thanks Jack/Ryan for the feedback! > > I am happy to go with option 3 if that is what people agree on. It's the > simplest change. > > To be clear: We will jus

Hive Vectorization

2021-05-27 Thread Peter Vary
Hi Team, Currently we are working on enabling vectorization for reading Iceberg tables through Hive. This will have serious performance benefit in itself and we would like to contribute the code to the Iceberg codebase as well. Adam Szita created a pull request for it: "Hive: Vectorized ORC rea

Re: question about the iceberg manifest/manifest list/metadata api

2021-05-26 Thread Peter Vary
Hi Yong Yang, Your message ended up in my spam folder, with a claim that many messages from @163.com are spam messages, but your question seems legitimate. With the Java API you can add Parquet files to the Iceberg tables where the files conform to the specification. For Parquet, take a look here:

Re: Table and Snapshot Level Configs and Metadata

2021-05-17 Thread Peter Vary
Hi Qinhua, Jack, We are also trying to explore the possibilities for users to share a specific version of a table easily. The use case is that we have a frequently updated working table, and along the way we identify specific snapshots which are good working copies to share. Other users do

Re: Iceberg tables not using hive catalog's hive.metastore.warehouse.dir

2021-04-28 Thread Peter Vary
Hi Huadong, If the table location is not provided then the table will automatically be placed under the database (namespace) location, but if the location is provided then it could point to anywhere and the table should work. The default directory structure could help with organizing your data, b

CI problems

2021-04-12 Thread Peter Vary
Hi Team, This morning I pushed 2 commits which both had green runs (#2129, #2449). It turns out that together they caused a compilation issue which was found by several people (#2448, #2449, #2461). There were runs today on the CI, but even after 2 hours and repeated tries the Java 8 test

Re: [VOTE] Release Apache Iceberg 0.11.1 RC0

2021-03-31 Thread Peter Vary
+1 Checked the signatures Ran build and tests Sadly did not have time to run manual Hive tests this time and I will be out till mid next week. :( Thanks, Peter > On Mar 30, 2021, at 23:41, Russell Spitzer wrote: > > +1 - > Ran the tests > Checked the Checksum > Made sure there were no binary

Re: Welcoming Ryan Murray as a new committer!

2021-03-29 Thread Peter Vary
Congratulations Ryan! Good to have you here! Yan Yan wrote (on Tue, Mar 30, 2021, 7:15): > Congratulations, Ryan! > > On Mon, Mar 29, 2021 at 9:10 PM Edgar Rodriguez > wrote: > >> Congratulations, Ryan! >> >> Best, >> >> On Mon, Mar 29, 2021 at 10:39 PM Jack Ye wrote: >> >>> Congratu

Re: Welcoming Russell Spitzer as a new committer

2021-03-29 Thread Peter Vary
Congratulations Russell! Well deserved! Yan Yan wrote (on Tue, Mar 30, 2021, 7:15): > Congratulations, Russell! > > On Mon, Mar 29, 2021 at 9:10 PM Edgar Rodriguez > wrote: > >> Congrats, Russell! >> >> Cheers, >> >> On Mon, Mar 29, 2021 at 11:01 PM Robin Stephen >> wrote: >> >>> Con

Re: Welcoming Yan Yan as a new committer!

2021-03-24 Thread Peter Vary
Congratulations Yan! > On Mar 24, 2021, at 05:43, Yufei Gu wrote: > > Congratulations, Yan! > > Best, > > Yufei > > `This is not a contribution` > > > On Tue, Mar 23, 2021 at 8:44 PM Russell Spitzer > wrote: > Congratulations! > >> On Mar 23, 2021, at 9:3

Hive multitable inserts

2021-03-18 Thread Peter Vary
Hi Team, Is anyone interested in reviewing the PR which enables multitable inserts for Hive queries? Here is the PR: https://github.com/apache/iceberg/pull/2228 Sorry for the wide audience, but the PR has been open for more than a month now. Thanks, Peter

Re: Hive query with join of Iceberg table and Hive table

2021-03-12 Thread Peter Vary
is is happening in > your case? > > On Wed, Mar 3, 2021 at 7:53 AM Edgar Rodriguez > wrote: > On Wed, Mar 3, 2021 at 1:48 AM Peter Vary wrote: > Quick question @Edgar: Am I right that the table is created by Spark? I think > if it is created from Hive and we inserted the

Hive meetup with some Iceberg spicing on top (March 17th 17:00 UTC)

2021-03-11 Thread Peter Vary
Hey Team, There will be a Hive meetup on March 17th 17:00 UTC where we will speak about the Iceberg-Hive integration work we have done recently and about our future plans. The planned topics are accessible here: https://docs.google.com/document/d/12jaWa7e6jvVjUaxoMWNJcjvTjnNoqwdCAMyswY1OiUg/edi

Hive Iceberg integration

2021-03-03 Thread Peter Vary
Hi Iceberg and Hive Teams, As some of you already know, we are working on making Iceberg available as a first-class storage layer for Hive. Folks on the Iceberg side did a good job utilizing the existing Hive SerDe API for the released Hive 2.3.8 and 3.1.2 versions. Thanks to their efforts w

Re: Hive query with join of Iceberg table and Hive table

2021-03-02 Thread Peter Vary
e/hadoop/hive/ql/metadata/InputEstimator.Estimation.html#Estimation-int-long-> >>> the query works successfully in the expected amount of time - maybe a >>> better implementation can be done with the actual estimation. I assume this >>> is only an issue you hit when

Re: Hive query with join of Iceberg table and Hive table

2021-03-02 Thread Peter Vary
I have seen this kind of problem when the catalog was not configured for the table/session and we ended up using the default catalog instead of HiveCatalog > On Mar 2, 2021, at 18:49, Edgar Rodriguez > wrote: > > Hi, > > I'm trying to run a simple query in Hive 2.3.4 with a join of a Hive tab

Re: Reading data from Iceberg table into Apache Arrow in Java

2021-03-02 Thread Peter Vary
c8fd0ccb95b5e/spark/src/main/java/org/apache/iceberg/spark/source/BaseDataReader.java#L68 > ). > > > > > > We’ll probably need some help from someone who understands the Spark > vectorized read path. > > > > But, I’ll read the code to understand the deletes. > > > >

Re: Reading data from Iceberg table into Apache Arrow in Java

2021-03-02 Thread Peter Vary
Hi Mayur, Playing around with the idea of implementing vectorized reads for Hive, so your message came just in time :) Took a quick look at the code but I do not really understand how vectorized reads handle deletes. In the non-vectorized code path I have found this which filters the rows one-by-on

Re: Default TimeZone for unit tests

2021-03-02 Thread Peter Vary
> What if we had the CI time zone be random? > > Sent from my iPhone > >> On Mar 2, 2021, at 5:54 AM, Peter Vary wrote: >> >> Maybe separating the unit tests for this would not be worth the effort. >> >> I was thinking we should run the tests like this:

Re: Default TimeZone for unit tests

2021-03-02 Thread Peter Vary
off with result in a timestamp > that would end up actually being several days off so it was clear. > > So maybe it makes sense to break out some timestamp specific tests and have > them run with different local timezones? Then you have a UTC, PST, CEU or > whatever test suite

Re: Airflow Integration

2021-03-02 Thread Peter Vary
Hi Gustavo, Not too familiar with the Airflow user base/use cases, but we had to consider similar things when we decided what to do with `CREATE EXTERNAL TABLE ice_table PARTITIONED BY ...` Hive queries. See: https://github.com/apache/iceberg/pull/1917

Default TimeZone for unit tests

2021-03-01 Thread Peter Vary
Hi Team, Last weekend I caused a bit of a stir by pushing changes which had a green run on CI, but were failing locally if the default TZ was different from UTC. Do we want to set the TZ of the CI tests to some random non-UTC TZ to catch these errors? Pros: We can catch tests which are onl

Re: Reverting commit

2021-02-28 Thread Peter Vary
or it if you have > one and then make sure it's reviewed by other committers as soon as you can. > When in doubt, I'd probably go for the option to revert but it depends on the > situation. > > On Fri, Feb 26, 2021 at 11:45 PM Peter Vary > wrote: > Hi Team,

Reverting commit

2021-02-26 Thread Peter Vary
Hi Team, Edgar Rodriguez reported that my last change (https://github.com/apache/iceberg/commit/23735d1d99abf0207543ff5d9dcb63ae4fe4ec02) is breaking the test in non-UTC timezones. Probably fixed by another pending change (https://github.com/apache/iceberg/pull/2278

Re: CachingCatalog question

2021-02-25 Thread Peter Vary
nential backoff) occurred. > > On Thu, Feb 25, 2021 at 4:45 AM Peter Vary > wrote: > >> Hi Team, >> >> Recently we have been playing around with 100GB TPC-DS queries above Iceberg >> backed Hive tables. >> We have found that queries accessing big partitioned

CachingCatalog question

2021-02-25 Thread Peter Vary
Hi Team, Recently we have been playing around with 100GB TPC-DS queries above Iceberg backed Hive tables. We have found that queries accessing big partitioned tables had very slow compilation times. One example is query77, where the compilation took ~20min: INFO : Completed compiling command(quer

Re: Timestamp / Date filtering

2021-02-19 Thread Peter Vary
review. Thanks, Peter > On Feb 19, 2021, at 15:08, Peter Vary wrote: > > Hi Team, > > Running tests against Iceberg backed Hive tables I was trying to run this > query: > > SELECT * from date_test WHERE d_date='1998-02-19' > > The query fails with the

Timestamp / Date filtering

2021-02-19 Thread Peter Vary
Hi Team, Running tests against Iceberg backed Hive tables I was trying to run this query: SELECT * from date_test WHERE d_date='1998-02-19' The query fails with the Exception below. Basically complains that when filtering the Date field it expects an Integer in the Record, but finds a LocalDat

Re: [VOTE] Release Apache Iceberg 0.11.0 RC0

2021-01-25 Thread Peter Vary
+1 (binding) from my side Here are the checks I have done: Downloaded the source and built the code Checked the size of the iceberg-hive-runtime-0.11.0.jar Tried out the jar on a CDP cluster using Hue Created a table Inserted values into the table Selected the value from the table Selected the

Re: Iceberg/Hive properties handling

2020-12-08 Thread Peter Vary
Blue would > prefer #4 (correct?). Any other strong opinions? > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > > On Thu, Dec 3, 2020 at 9:27 AM Peter Vary wrote: > As Jacques suggested (with the help of Zoltan) I have collecte

Re: Sync notes for 2 December 2020

2020-12-07 Thread Peter Vary
Hi Team, I will be OOO starting from the end of this week. If someone wants to pick up Hive config changes, feel free to grab it. I am afraid I will not have enough time to do it this year. Thanks, Peter > On Dec 5, 2020, at 02:13, Ryan Blue wrote: > > Hi everyone, > > I just wrote up my no

Re: Iceberg/Hive properties handling

2020-12-03 Thread Peter Vary
t/d/1KumHM9IKbQyleBEUHZDbeoMjd7n6feUPJ5zK8NQb-Qw/edit?usp=sharing> My feeling is that we do not have a final decision, so tried to list all the possible solutions. Please comment! Thanks, Peter > On Dec 2, 2020, at 18:10, Peter Vary wrote: > > When I was working on the CREATE TABLE patch I found the following >

Re: Iceberg/Hive properties handling

2020-12-02 Thread Peter Vary
Hive config should also be included right? > > > On Tue, Dec 1, 2020 at 10:36 AM Peter Vary <pv...@cloudera.com> wrote: > I will ask Laszlo if he wants to update his doc. > > I see both pros and cons of catalog definition in config files. If there is >

Re: Iceberg/Hive properties handling

2020-12-01 Thread Peter Vary
operties from being passed >>>to Iceberg than to include them. Otherwise, users don’t know what to do >>> to >>>pass table properties from Hive or Impala. If we exclude a prefix or >>>specific properties, then everything but the properties

Re: Iceberg/Hive properties handling

2020-11-30 Thread Peter Vary
on. I have seen it used both ways for HBaseSerDe (even the wiki page uses both :) ). Since Impala prefers TBLPROPERTIES, if we start using a prefix to separate real Iceberg table properties from other properties, then we can keep it in TBLPROPERTIES. > Thanks, > Zoltan >

Re: Iceberg/Hive properties handling

2020-11-30 Thread Peter Vary
Hi, Based on the discussion below I understand we have the following kinds of properties: Iceberg table properties - Engine independent, storage related parameters "how to get to" - I think these are mostly Hive table specific properties, since for Spark, the Spark catalog configuration serves f

Re: Iceberg - Hive schema synchronization

2020-11-26 Thread Peter Vary
fine an attribute for the max > length. > > Vivekanand, what kind of conversions are you needing. Hive has a *lot* of > conversions. Many of those conversions are more error-prone than useful. (For > example, I seriously doubt anyone found Hive's conversion of timestamps to > bool

Iceberg - Hive schema synchronization

2020-11-24 Thread Peter Vary
Hi Team, With Shardul we had a longer discussion yesterday about the schema synchronization between Iceberg and Hive, and we thought that it would be good to ask the opinion of the greater community too. We can have 2 sources for the schemas. Hive table definition / schema Iceberg schema. If

Re: Integrating Existing Iceberg Tables with a Metastore

2020-11-23 Thread Peter Vary
nt to s3://old_table/. > > I could alternatively `aws s3 sync` data files from the old table to the new > one, rewrite all the old metadata + snapshot manifest lists + manifest files > to point to the new data directory, and leave s3://old_table/ untouched, but > I guess that

Re: Integrating Existing Iceberg Tables with a Metastore

2020-11-20 Thread Peter Vary
Hi Marko, The command you mention below, `CREATE EXTERNAL TABLE` on top of an existing Iceberg table, will not transfer the "responsibility" of tracking the snapshot to HMS. It only creates an HMS external table which will allow Hive queries to read the given table. If you want to track the snapshot

Re: CI logging question

2020-11-19 Thread Peter Vary
e you're debugging this > issue it would be turned on? > > On Wed, 18 Nov 2020 at 09:03, Peter Vary wrote: > Hi Team, > > Recently I have been working on trying to reproduce the following CI failure > without success: > > org.apache.iceberg.mr.hive.TestHiveIcebergS

CI logging question

2020-11-18 Thread Peter Vary
Hi Team, Recently I have been working on trying to reproduce the following CI failure without success: org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithCustomCatalog > testScanTable[fileFormat=PARQUET, engine=tez] FAILED java.lang.IllegalArgumentException: Failed to execute Hive

Re: Flaky tests

2020-10-16 Thread Peter Vary
Caused by: > java.sql.SQLTimeoutException: Login timeout exceeded. > > Caused by: > ERROR XBDA0: Login timeout exceeded. > > > > On Wed, Oct 14, 2020 at 3:36 PM Peter Vary wrote: > Hi Team, > > La

Flaky tests

2020-10-14 Thread Peter Vary
Hi Team, Lately I have seen these flaky tests (TestHiveIcebergStorageHandler*.testJoinTables* / TestHiveIcebergStorageHandler*.testScanTable*): > org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithHiveCatalog > > testJoinTablesParquet FAILED >java.lang.IllegalArgumentException: Fa

Re: Welcoming Zheng Hu as a new committer

2020-10-12 Thread Peter Vary
Congratulations! Kurt Young wrote (on Mon, Oct 12, 2020, 11:13): > Congratulations! > > Best, > Kurt > > > On Sun, Oct 11, 2020 at 12:51 AM Russell Spitzer < > russell.spit...@gmail.com> wrote: > >> Congratulations! >> >> On Sat, Oct 10, 2020 at 8:24 AM Jungtaek Lim < >> kabhwan.openso

Re: Welcoming Jingsong Lee as a new committer

2020-10-12 Thread Peter Vary
Congratulations! Kurt Young wrote (on Mon, Oct 12, 2020, 11:13): > Congratulations! > > Best, > Kurt > > > On Sun, Oct 11, 2020 at 12:51 AM Russell Spitzer < > russell.spit...@gmail.com> wrote: > >> Congratulations! >> >> On Sat, Oct 10, 2020 at 8:24 AM Jungtaek Lim < >> kabhwan.openso

Travis build question

2020-09-16 Thread Peter Vary
Hi Team, Struggling with a PR (https://github.com/apache/iceberg/pull/1465) where a test is green on my runs in IntelliJ, and also green if I run the test from the command line, and I even ran it successfully on Linux with the command: ./gradlew :iceberg-core:test The problem is that the test is

Metadata error handling

2020-09-10 Thread Peter Vary
Hi Team, I have been playing around with Iceberg tables, and I had some failing writes (not really sure how it actually happened), and ended up with an empty metadata/version-hint.text file. When I tried to read this table with Catalogs.loadTable(configuration, properties) I got a NullPoin

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-09 Thread Peter Vary
Quick question below about the proposed usage of the timestamp: > On Sep 9, 2020, at 7:24 AM, Miao Wang wrote: > > +1 OpenInx’s comment on implementation. > > Only if we have an external timing synchronization service and enforce all > clients using the service, timestamps of different client

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Peter Vary
Newbie here, but if I understand correctly, the client knows the previous snapshot and the corresponding timestamp. It could be the responsibility of the client to generate a new timestamp which is greater than or equal to the previous one. There might be checks implemented on commit to prevent smaller

Re: Hive Iceberg writes

2020-09-01 Thread Peter Vary
Uploaded a working implementation for unpartitioned tables. Those who are interested can take a look here: https://github.com/apache/iceberg/pull/1407 > On Aug 31, 2020, at 16:34, Peter Vary wrote: > > Thanks everyone for the qui

Re: Hive Iceberg writes

2020-08-31 Thread Peter Vary
erested in so if nobody else responds a good starting point would > probably be an early WIP PR that everyone can follow and contribute to. > > Thanks, > > Adrian > > On Wed, 26 Aug 2020 at 17:35, Ryan Blue wrote: > I think Edgar and Adrien who have been contributing support f

Re: Question about logging

2020-08-27 Thread Peter Vary
already get > snapshot ID and filters. I’m not sure that we keep the time. We also have a > system for emitting events with more detail. Those can be used for additional > logging if you choose. We send them through a Kafka pipeline so we can > analyze all the scans taking place in ou

Hive Iceberg writes

2020-08-26 Thread Peter Vary
Hi Team, We are thinking about implementing HiveOutputFormat, so writes through Hive can work as well. Is anybody working on this? Do you know of any ongoing effort related to Hive writes? Asking because we would like to prevent duplicate effort. Also if anyone has some good pointers to start for

Question about logging

2020-08-26 Thread Peter Vary
Hi Team, I was wondering if there is a general best practice for using log levels in Iceberg, what is the usual way we do it. I have been playing around with Iceberg/Hive integration and was wondering how I would be able to debug a customer case based on the logs. To be the honest answer based

How to request reviews?

2020-07-24 Thread Peter Vary
Hi Team, Created my first ever Iceberg pull request: https://github.com/apache/iceberg/pull/1240 I was not able to assign it to myself, and I was not able to request a review for it. Do I need to do something, or are reviews usually not requested and