Re: [DISCUSS] Spark 3.1 support?

2023-04-25 Thread Walaa Eldin Moustafa
To elaborate on LinkedIn's use case: * LinkedIn maintains its own fork, but we would like to keep it as close to upstream as possible. * +1 to Manu that migrations in large companies could take well beyond 18 months, and such companies are unlikely to migrate/upgrade more frequently. * One important use case for

Re: Support create table like for Iceberg table?

2023-04-25 Thread Ryan Blue
You should be able to see how other DSv2 commands are written and copy them. Look at Drop Table, maybe, and see if you can copy the structure; instead of dropping, load the table and call createTable with its metadata. On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang wrote: > Thanks Steve and Rya

Re: Support create table like for Iceberg table?

2023-04-25 Thread Pucheng Yang
Thanks Steve and Ryan for the reply. Steve, I am not looking for CTAS, my goal is to create an Iceberg table and reuse the existing data (same as the create table like statement above). Also my question is not about specifying location in create statement. Ryan, the engine we are interested in is

Re: Iceberg shell and binaries

2023-04-25 Thread Ryan Blue
Hey Arvind, Unfortunately, the image didn't come through. It might be easier to ask for help directly on Slack if images are needed to understand the issue you're having. I'm not sure what web UI you're trying to use, but if it is Tabular, you should be able to get Spark set up using the Spark wo

Re: SQL Syntax for Time Travel on a Branch?

2023-04-25 Thread Ryan Blue
Everything Amogh said is correct, but I can give a bit more context. There are two options for the behavior of time travel by timestamp. First, you can read the state of the table that you _would have read_ if you ran the query at that time. Second, you could read the ancestor of the current state

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-25 Thread Ryan Blue
Yeah, I do like the idea of using --release. We'll need to test it with those platforms though. On Tue, Apr 25, 2023 at 12:04 PM Anton Okolnychyi wrote: > It would be interesting to hear more from Hive/Impala folks on this topic. > > I’d avoid supporting both JDK8 and JDK11 given the complexity

Re: Support create table like for Iceberg table?

2023-04-25 Thread Ryan Blue
Pucheng, what engine are you interested in? This works fine in Trino: CREATE TABLE table_copy (LIKE source_table INCLUDING PROPERTIES) I don’t know if it works in Hive, and last time I checked it was not implemented for DSv2 in Spark. The Spark problem should be an easy fix. Ryan On Tue, Apr 25

Re: SQL Syntax for Time Travel on a Branch?

2023-04-25 Thread Jahagirdar, Amogh
Hi Micah, Your understanding is right: as of today there is no mechanism for performing time travel on a branch. Time travel relies on history, which captures all the changes to the main table state. At present there is no history metadata for branches (we can’t use snapshot lineages), for more de

SQL Syntax for Time Travel on a Branch?

2023-04-25 Thread Micah Kornfield
Looking through the documents for Spark SQL syntax [1], it appears that Iceberg supports reading a branch at the latest version or time-travel on the main table, but I didn't see any queries that compose the two. Is my understanding correct that there isn't existing SQL for time travel on a specif

Re: Support create table like for Iceberg table?

2023-04-25 Thread Steve Zhang
Hey Pengcheng, Are you looking for CTAS as in https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select? I think you can also specify explicit location as part of create statement in https://iceber

Support create table like for Iceberg table?

2023-04-25 Thread Pucheng Yang
Hi all, I wonder how folks in the community deal with cases where you want to create a test table from an existing Iceberg table. In Hive, what we normally do is run a query "create table x like y location z". But we can't do this for an Iceberg table. If this is a feature that is missing

Re: [DISCUSS] Spark 3.1 support?

2023-04-25 Thread Anton Okolnychyi
Ok, seems like we are in agreement to deprecate 3.1. I’ll fire a PR shortly. Does anyone want to go through changes in 3.3 and 3.2 and find what we missed to cherry-pick so that we have that list in one place (e.g. create an issue)? Any thoughts on how to mark changes as candidates for cherry-pi

Re: Why is sort required for Spark writing to partitioned table

2023-04-25 Thread Russell Spitzer
https://github.com/apache/iceberg/issues/7037 On Tue, Apr 25, 2023 at 1:52 PM Pucheng Yang wrote: > Great thanks, it will be great if we can update the doc to avoid confusion. > > On Tue, Apr 25, 2023 at 11:47 AM Anton Okolnychyi > wrote: > >> We have implemented this natively in Spark and expl

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-25 Thread Anton Okolnychyi
It would be interesting to hear more from Hive/Impala folks on this topic. I’d avoid supporting both JDK8 and JDK11 given the complexity we need to add. Our build is already complicated. Jack’s idea of using JDK11 with --release flag may be worth exploring. - Anton > On Apr 24, 2023, at 10:11

Re: Why is sort required for Spark writing to partitioned table

2023-04-25 Thread Pucheng Yang
Great thanks, it will be great if we can update the doc to avoid confusion. On Tue, Apr 25, 2023 at 11:47 AM Anton Okolnychyi wrote: > We have implemented this natively in Spark and explicit sorts are no > longer required. Iceberg takes into account both the partition and sort key > in the table

Re: Why is sort required for Spark writing to partitioned table

2023-04-25 Thread Anton Okolnychyi
We have implemented this natively in Spark and explicit sorts are no longer required. Iceberg takes into account both the partition and sort key in the table to request a distribution and ordering from Spark. Should be supported both for batch and micro-batch writes. - Anton > On Apr 25, 2023,

Re: Why is sort required for Spark writing to partitioned table

2023-04-25 Thread Pucheng Yang
Hi, to confirm: the doc, https://iceberg.apache.org/docs/1.0.0/spark-writes/#writing-to-partitioned-tables, says "Explicit sort is necessary because Spark doesn’t allow Iceberg to request a sort before writing as of Spark 3.0. SPARK-23889 is

Iceberg binaries and configuration files

2023-04-25 Thread Arvind Dige
Hi Team, I am using the open source iceberg-1.2.1 for development purposes, and I have successfully built Iceberg using the steps mentioned in the GitHub repo below: https://github.com/apache/iceberg I am able to configure and use it with Spark using the steps below: https://iceberg.apache.org/docs/l

Iceberg shell and binaries

2023-04-25 Thread Arvind Dige
Hi Team, I am using the open source iceberg-1.2.1 for development purposes, and I have successfully built Iceberg using the steps mentioned in the GitHub repo below: https://github.com/apache/iceberg I am able to configure and use it with Spark using the steps below: https://iceberg.apache.org/docs/l