Hi Pucheng Yang,

The latest master branch of Hive also supports "Create Table Like" for Iceberg tables.
Related commits:
HIVE-26519: Iceberg: Add support for CTLT queries. [1]
HIVE-26950: Iceberg: (CTLT) Create external table like V2 table is not preserving table properties [2]

[1] https://github.com/apache/hive/commit/d96c31b2a87367279ef7e61ce8cda60d04db303c
[2] https://github.com/apache/hive/commit/9f4a9c6aedf7dd097a2961d0507ef2ef089853dc

Thanks!

On Tue, May 9, 2023 at 10:43 PM Pucheng Yang <py...@pinterest.com.invalid> wrote:

> Russell, to me, the "snapshot" procedure is a perfect place to adopt this
> feature. After the implementation, we can use the "snapshot" procedure to
> snapshot a Hive table or an Iceberg table (maybe we can also make it
> generic to snapshot any other table, e.g. Delta).
>
> On Tue, May 9, 2023 at 10:00 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> How would Create Table Like be different than our "Snapshot" procedure,
>> just enabled for Iceberg Tables? Wondering if we should just expand that
>> functionality.
>>
>> On Tue, May 9, 2023 at 11:54 AM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Ryan, when I mentioned "copy of the data", I didn't mean to
>>> physically copy the data. I meant a copy of the metadata and configuration
>>> such that the created table can also read the data that belongs to the
>>> table it was created from. However, I do share the concern that CREATE TABLE
>>> LIKE, if we plan to follow what most systems do, will copy some important
>>> configuration (such as gc.enabled) that I think we definitely don't want
>>> since it will create a surface for people to mess up the original table. In
>>> this regard, I agree we should adopt the approach of having a procedure
>>> instead. So I am dropping this CREATE TABLE LIKE feature request.
>>>
>>> Anton, branching will work, but I would still prefer creating a separate
>>> table for these reasons: (1) I consider "branching" a very advanced/new
>>> feature for my customers, and it is generally easy and safe to just let
>>> them use a separate test table. (2) The newly generated data will be placed
>>> under a separate location, making auditing and cleanup easier. (3) If we
>>> use branching, there is coordination between the user who is doing testing
>>> via branching and the platform that is constantly performing table
>>> maintenance, thus introducing friction.
>>>
>>> On Thu, Apr 27, 2023 at 2:15 PM Anton Okolnychyi
>>> <aokolnyc...@apple.com.invalid> wrote:
>>>
>>>> Iceberg supports branching so that you can safely perform such tests
>>>> without any risk of corrupting the table. No need to create a separate
>>>> table and clone the config. Overall, I don’t think it is a good idea to
>>>> break the contract of CREATE TABLE LIKE.
>>>>
>>>> - Anton
>>>>
>>>> On Apr 27, 2023, at 11:59 AM, Pucheng Yang <py...@pinterest.com.INVALID>
>>>> wrote:
>>>>
>>>> Hi Anton,
>>>>
>>>> Yes, I want to branch the table state and reuse the data files, but for
>>>> test purposes only. Imagine we want to test something related to reading
>>>> the Iceberg table or performing row-level updates.
>>>>
>>>> And I acknowledge the potential risk of the table state being
>>>> corrupted. So I am thinking we can consider adding these limitations when
>>>> running the "create table like":
>>>> (1) the created table should have "snapshot=true"
>>>> (2) the created table should have "gc.enabled=false" to make sure
>>>> existing files don't get messed up
>>>> (3) the created table should have a table location different from the
>>>> existing Iceberg table location it is created from
>>>> We can consider "create table like" as a snapshot action for an
>>>> existing Iceberg table, similar to the existing snapshot procedure we have
>>>> for an existing Hive table.
>>>>
>>>> I know CREATE TABLE LIKE is supposed to reuse the existing table
>>>> definition only. If we have concerns around messing up table state, I hope
>>>> we can break the implementation down and at least first implement
>>>> the part where we create tables without reusing the existing data files.
>>>>
>>>> On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi
>>>> <aokolnyc...@apple.com.invalid> wrote:
>>>>
>>>>> Pucheng, you mentioned you want to reuse existing data in the new
>>>>> table? Branching Iceberg table state can lead to unexpected situations as
>>>>> there will be multiple pointers in the catalog to the same state, which
>>>>> can eventually corrupt the table. Isn’t CREATE TABLE LIKE supposed to just
>>>>> reuse the existing table definition without copying the data?
>>>>>
>>>>> - Anton
>>>>>
>>>>> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <borokna...@apache.org>
>>>>> wrote:
>>>>>
>>>>> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y
>>>>> for Iceberg tables.
>>>>> You can see various examples at
>>>>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>>>>>
>>>>> - Zoltan
>>>>>
>>>>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> You should be able to see how other DSv2 commands are written and
>>>>>> copy them. Look at Drop Table, maybe, and see if you can copy the
>>>>>> structure, but instead of dropping, load the table and call
>>>>>> createTable with its metadata.
>>>>>>
>>>>>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang
>>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>>
>>>>>>> Thanks Steve and Ryan for the reply.
>>>>>>>
>>>>>>> Steve, I am not looking for CTAS; my goal is to create an Iceberg
>>>>>>> table and reuse the existing data (same as the create table like
>>>>>>> statement above). Also, my question is not about specifying the
>>>>>>> location in the create statement.
>>>>>>>
>>>>>>> Ryan, the engine we are interested in is SparkSQL. Since you
>>>>>>> mentioned it is an easy fix, would you please share how that should be
>>>>>>> implemented so that anyone (maybe myself) interested in this can
>>>>>>> explore the solution?
>>>>>>>
>>>>>>> Thanks again to both of you.
>>>>>>>
>>>>>>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>
>>>>>>>> Pucheng, what engine are you interested in?
>>>>>>>>
>>>>>>>> This works fine in Trino: CREATE TABLE table_copy (LIKE
>>>>>>>> source_table INCLUDING PROPERTIES)
>>>>>>>>
>>>>>>>> I don’t know if it works in Hive, and last time I checked it was
>>>>>>>> not implemented for DSv2 in Spark. The Spark problem should be an easy
>>>>>>>> fix.
>>>>>>>>
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang
>>>>>>>> <hongyue_zh...@apple.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hey Pengcheng,
>>>>>>>>>
>>>>>>>>> Are you looking for CTAS as in
>>>>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select?
>>>>>>>>> I think you can also specify an explicit location as part of the
>>>>>>>>> create statement in
>>>>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steve Zhang
>>>>>>>>>
>>>>>>>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang
>>>>>>>>> <py...@pinterest.com.INVALID> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I wonder how folks in the community deal with the cases where you
>>>>>>>>> want to create a test table from an existing Iceberg table. In Hive,
>>>>>>>>> what we normally do is run a query "create table x like y location z".
>>>>>>>>> But we can't do this for an Iceberg table.
>>>>>>>>>
>>>>>>>>> If this is a feature that is missing, should we collaborate to
>>>>>>>>> build a similar feature?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular