Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-11-22 Thread Zoltán Borók-Nagy
Awesome! In Impala we have created our own implementations so far, but it will be nice to join forces and have a common library. Looking forward to the Slack channel. Cheers, Zoltan On Fri, Nov 22, 2024 at 5:01 PM Gang Wu wrote: > > I have created an issue [1] to collect initial ideas for the i

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2024-11-22 Thread Zoltán Borók-Nagy
store the ETag, for other catalogs some other information for the >> same purpose. >> 2) I checked this description of ETags, and even though we discussed earlier >> that this is some server-generated information, for me it seems that it can >> be basically anything:

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2024-11-21 Thread Zoltán Borók-Nagy
just add their clever tricks to make it more efficient. Cheers, Zoltan On Thu, Nov 21, 2024 at 9:53 AM Zoltán Borók-Nagy wrote: > > Hi, > > I agree with Gabor that the support of efficiently reloading Iceberg > tables is a generic problem that applies to all catalog > impleme

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2024-11-21 Thread Zoltán Borók-Nagy
es and that the cross-language compatible REST catalog becomes the > primary catalog for Iceberg. > > - API Perspective: Given the above, I may not be in the best position to > comment on Java APIs. However, regarding Gabor’s proposed API (Table > loadTable(Table existingTable)), I

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2024-11-18 Thread Zoltán Borók-Nagy
Hey Everyone, Thanks Gábor, I think the proposed interface would be very useful to any engine that employs caching, e.g. Impala. And it is pretty neat that it is catalog-agnostic, i.e. we just give all the information we have about the table and let the catalog implementation efficiently reload it
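
To illustrate the kind of catalog-agnostic reload discussed in this thread, here is a minimal sketch in Python. It is not the Iceberg API: CachedTable, ReloadingCatalog, and the two callbacks are hypothetical names, and the "version token" stands in for whatever cheap check a given catalog can offer (an ETag for REST, a metadata location for others). The engine hands back the table it already has, and the catalog does a full load only when the token has changed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CachedTable:
    """Hypothetical stand-in for a loaded table held in an engine's cache."""
    name: str
    version_token: str   # e.g. an ETag or a metadata location (assumption)
    metadata: str        # placeholder for the expensive-to-load metadata


class ReloadingCatalog:
    """Hypothetical catalog wrapper that skips full loads for unchanged tables."""

    def __init__(self,
                 current_token: Callable[[str], str],
                 full_load: Callable[[str], str]) -> None:
        self._current_token = current_token  # cheap "what is the latest version?" call
        self._full_load = full_load          # expensive metadata load
        self.full_loads = 0

    def load_table(self, name: str,
                   existing: Optional[CachedTable] = None) -> CachedTable:
        token = self._current_token(name)
        # Reuse the caller's copy when the catalog reports the same version.
        if existing is not None and existing.version_token == token:
            return existing
        self.full_loads += 1
        return CachedTable(name, token, self._full_load(name))
```

The point of the design is that the engine does not need to know which token a catalog uses; it only passes back what it has.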

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread Zoltán Borók-Nagy
pala > > also produced correct Parquet files, but that's beyond our control and > > there's, no doubt, a ton of data already in that format. > > > > This could also be part of our v3 work, where I think we intend to add > > binary to string type promotion to t

Re: Spark cannot read iceberg tables which were originally written by Impala

2023-12-26 Thread Zoltán Borók-Nagy
Hey Everyone, Thank you for raising this issue and reaching out to the Impala community. Let me clarify that the problem only happens when there is a legacy Hive table written by Impala, which is then converted to Iceberg. When Impala writes into an Iceberg table there is no problem with interope

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-27 Thread Zoltán Borók-Nagy
Although Javac is not an >>> optimizing compiler and there should not be much difference in performance >>> of the jars produced by different compilers, these changes might be worth >>> for the project to declare a newer compile-time JDK across all modules, and >

Re: Support create table like for Iceberg table?

2023-04-26 Thread Zoltán Borók-Nagy
As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for Iceberg tables. You can see various examples at https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test - Zoltan On Wed, Apr 26, 2023 at 4:10 AM

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-24 Thread Zoltán Borók-Nagy
Besides Hive, Impala is not compatible with Java 11 right now either. This work is in progress: https://issues.apache.org/jira/browse/IMPALA-11360 - Zoltan On Mon, Apr 24, 2023 at 11:07 AM Mass Dosage wrote: > I agree with Ryan, unless you can change the source version there's not > that much point

Re: C++/Rust SDK sync

2023-04-12 Thread Zoltán Borók-Nagy
Hi, I am also interested in the discussion, all those times work for me. Cheers, Zoltan On Wed, Apr 12, 2023 at 4:17 AM Chao Sun wrote: > We are also interested in this discussion. Internally, we have been > working on something similar in Rust, so it'd be great if we can > combine the ef

Re: Temporal Iceberg Service

2022-09-01 Thread Zoltán Borók-Nagy
Hi Taher, I think most of your questions are answered in the Scan Planning section at the Iceberg spec page: https://iceberg.apache.org/spec/#scan-planning To give you some specific answers as well: Equality Deletes: data and delete files have sequence numbers from which readers can infer the rel
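
As a rough illustration of how sequence numbers relate data files and delete files, the sketch below follows the rules in the Scan Planning section of the Iceberg spec: an equality delete applies to data files with a strictly lower data sequence number, while a position delete applies to data files with a lower or equal one. The class and function names are hypothetical, and the partition matching that the spec also requires is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class DeleteKind(Enum):
    POSITION = "position"
    EQUALITY = "equality"


@dataclass(frozen=True)
class DataFile:
    path: str
    sequence_number: int  # data sequence number of the file


@dataclass(frozen=True)
class DeleteFile:
    path: str
    sequence_number: int
    kind: DeleteKind


def delete_applies(delete: DeleteFile, data: DataFile) -> bool:
    """Return True if the delete file must be applied to the data file.

    Only the sequence-number part of the rule is modeled here; a real
    reader must also compare partition spec and partition values.
    """
    if delete.kind is DeleteKind.EQUALITY:
        # Equality deletes affect only data written strictly earlier.
        return data.sequence_number < delete.sequence_number
    # Position deletes may target files written in the same commit.
    return data.sequence_number <= delete.sequence_number
```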

Impala reading V2 tables design doc

2022-07-08 Thread Zoltán Borók-Nagy
Hi Iceberg/Impala Team, We've been working on adding read support for Iceberg V2 tables in Impala. In the first round we're focusing on position deletes. We are thinking about different approaches so I've written a design doc about it: https://docs.google.com/document/d/1WF_UOanQ61RUuQlM4LaiRWI0Y

Re: Matching iceberg data types to Parquet data types

2021-08-27 Thread Zoltán Borók-Nagy
Hi, You can find information on type mappings here: https://iceberg.apache.org/spec/#parquet 1. Iceberg timestamps have microsecond precision. In Parquet they are stored as INT64s with TIMESTAMP_MICROS annotation. 2. Iceberg limits decimal precision to 38: https://iceberg.apache.org/spec/#primit
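
To make point 1 concrete, here is a small Python sketch of the value that ends up in the INT64 column: the number of microseconds since the Unix epoch. The function names are made up for illustration, and treating naive datetimes as UTC is an assumption of this sketch, not something the spec prescribes for writers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_micros(ts: datetime) -> int:
    """Convert a datetime to microseconds since 1970-01-01T00:00:00 UTC."""
    if ts.tzinfo is None:
        # Assumption for this sketch: naive values are interpreted as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    # Dividing two timedeltas with // yields an exact integer, avoiding float rounding.
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_timestamp_micros(micros: int) -> datetime:
    """Inverse of to_timestamp_micros; returns a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

Because Python datetimes also have microsecond resolution, the round trip through these two functions loses no information.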

Re: question about the iceberg manifest/manifest list/metadata api

2021-06-08 Thread Zoltán Borók-Nagy
> On 05/27/2021 16:54, Zoltán Borók-Nagy wrote: > Hi Yong Y

Re: question about the iceberg manifest/manifest list/metadata api

2021-05-27 Thread Zoltán Borók-Nagy
Hi Yong Yang, It is supported by Iceberg, and this is exactly how Impala is working. I.e. Impala's Parquet writer writes the data files, then we use Iceberg's API to append them to the table. You can find the relevant code here: https://github.com/apache/impala/blob/822e8373d1f1737865899b80862c2be

Re: Dynamic INSERT OVERWRITE

2021-01-30 Thread Zoltán Borók-Nagy
> explicit. If you want to overwrite a day, you pass a filter for that day. > Another way around this problem is to support MERGE INTO, which will detect > the files that need to be changed and correctly rewrite them, wherever they > are in the table. > > rb > > On Fri, Jan 2

Dynamic INSERT OVERWRITE

2021-01-29 Thread Zoltán Borók-Nagy
Hey everyone, I'm currently working on the INSERT OVERWRITE statement for Iceberg tables in Impala. Seems like ReplacePartitions is the perfect interface for this job: https://github.infra.cloudera.com/CDH/iceberg/blob/cdpd-master/api/src/main/java/org/apache/iceberg/ReplacePartitions.java IIUC

Re: Welcoming Peter Vary as a new committer!

2021-01-26 Thread Zoltán Borók-Nagy
Congrats, Peter! On Tue, Jan 26, 2021 at 5:47 AM ForwardXu wrote: > Congratulations Peter! > > > -- Original Message -- > *From:* "dev" ; > *Sent:* Tuesday, January 26, 2021, 4:25 AM > *To:* "dev"; > *Subject:* Re: Welcoming Peter Vary as a new committer! > > Congratulations! > > On Mon, 25 Jan

Re: Iceberg/Hive properties handling

2020-12-01 Thread Zoltán Borók-Nagy
se, users don’t know what to do to >pass table properties from Hive or Impala. If we exclude a prefix or >specific properties, then everything but the properties reserved for >locating the table are passed as the user would expect. > > I don't have a strong opinion about

Re: Iceberg/Hive properties handling

2020-11-30 Thread Zoltán Borók-Nagy
Thanks, Peter. I answered inline. On Mon, Nov 30, 2020 at 3:13 PM Peter Vary wrote: > Hi Zoltan, > > Answers below: > > On Nov 30, 2020, at 14:19, Zoltán Borók-Nagy < > borokna...@cloudera.com.INVALID> wrote: > > Hi, > > Thanks for the replies. My take fo

Re: Iceberg/Hive properties handling

2020-11-30 Thread Zoltán Borók-Nagy
-case basis. > > > Based on this: > >- Shall we move the "how to get to" properties to SERDEPROPERTIES? >- Shall we define a prefix for setting Iceberg table properties from >Hive queries and omitting other engine specific properties? > > > Tha

Re: Iceberg/Hive properties handling

2020-11-26 Thread Zoltán Borók-Nagy
Hi, The above aligns with what we did in Impala, i.e. we store information about table loading in HMS table properties. We are just a bit more explicit about which catalog to use. We have table property 'iceberg.catalog' to determine the catalog type, right now the supported values are 'hadoop.tab

Re: Iceberg - Hive schema synchronization

2020-11-25 Thread Zoltán Borók-Nagy
Hi Everyone, In Impala we face the same challenges. I think a strict 1-to-1 type mapping would be beneficial because that way we could derive the Iceberg schema from the Hive schema, not just the other way around. So we could just naturally create Iceberg tables via DDL. We should use the same ty

INSERT to Iceberg tables from Impala

2020-09-11 Thread Zoltán Borók-Nagy
Hi, I'd like to add INSERT support for Iceberg tables in Impala. To start, I created the following design doc: https://docs.google.com/document/d/1_KL0YptDKwhiXvJyx4Vb-yZjggrPQAW2yjeGV4C0vMU/edit?usp=sharing All comments are welcome. Thanks, Zoltan