Ryan, when I mentioned "copy of the data", I didn't mean physically copying the data. I meant copying the metadata and configuration so that the new table can also read the data that belongs to the table it was created from. However, I do share the concern that CREATE TABLE LIKE, if we follow what most systems do, will copy some important configuration (such as gc.enabled) that we definitely don't want copied, since it creates a surface for people to mess up the original table. In this regard, I agree we should adopt a procedure instead, so I am dropping this CREATE TABLE LIKE feature request.
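A minimal sketch of the property handling such a procedure might need, based on the three limitations listed in the quoted message further down (snapshot=true, gc.enabled=false, a different table location). This is plain Python for illustration only: the function name and its arguments are hypothetical and are not part of any Iceberg or Spark API.

```python
def derive_test_table_properties(source_props, source_location, target_location):
    """Return properties for a test table that reuses the source table's data files.

    Hypothetical helper for illustration; not an Iceberg API.
    """
    # New files written by the test table must land somewhere else,
    # so that auditing and cleanup stay separate from the source table.
    if target_location.rstrip("/") == source_location.rstrip("/"):
        raise ValueError("target location must differ from the source table location")

    # Work on a copy so the source table's configuration is never mutated.
    props = dict(source_props)

    # Mark the new table as a snapshot of the source table.
    props["snapshot"] = "true"

    # Override whatever the source had: maintenance on the test table
    # must never delete files that the source table still references.
    props["gc.enabled"] = "false"
    return props


if __name__ == "__main__":
    props = derive_test_table_properties(
        {"gc.enabled": "true", "write.format.default": "parquet"},
        "s3://bucket/db/source_table",
        "s3://bucket/db/source_table_test",
    )
    print(props)
```

The point of the sketch is that gc.enabled is overridden rather than copied, which is exactly where a plain CREATE TABLE LIKE that copies all properties would differ.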
Anton, branching will work, but I would still prefer creating a separate table, for these reasons: (1) I consider "branching" a very advanced and new feature for my customers, and it is generally easier and safer to just let them use a separate test table. (2) The newly generated data will be placed under a separate location, making auditing and cleanup easier. (3) If we use branching, the user who is testing via a branch has to coordinate with the platform, which is constantly performing table maintenance, and that introduces friction.

On Thu, Apr 27, 2023 at 2:15 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:

> Iceberg supports branching so that you can safely perform such tests
> without any risk of corrupting the table. No need to create a separate
> table and clone the config. Overall, I don’t think it is a good idea to
> break the contract of CREATE TABLE LIKE.
>
> - Anton
>
> On Apr 27, 2023, at 11:59 AM, Pucheng Yang <py...@pinterest.com.INVALID>
> wrote:
>
> Hi Anton,
>
> Yes, I want to branch the table state and reuse the data files, but for
> test purposes only. Imagine if we want to test something related to reading
> the Iceberg table or perform row level update.
>
> And I acknowledge the potential risk of the table state being corrupted.
> So I am thinking we can consider adding these limitations when running the
> "create table like":
> (1) the created table should have "snapshot=true"
> (2) the created table should have "gc.enabled=false" to make sure existing
> files don't get messed up
> (3) the created table should have a table location different then the
> existing Iceberg table location it creates from
> We can consider "create table like" as a snapshot action for an existing
> Iceberg table, similar to the existing snapshot procedure we have for an
> existing Hive table.
>
> I know CREATE TABLE LIKE is supposed to be copy reuse existing table
> definition only.
> If we have concerns around messing up table state, I wish
> we can break it down into the implementation and at least first implement
> the part where we create tables without reusing the existing data files.
>
> On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi <
> aokolnyc...@apple.com.invalid> wrote:
>
>> Pucheng, you mentioned you want to reuse existing data in the new table?
>> Branching Iceberg table state can lead to unexpected situations as there
>> will be multiple pointers in the catalog to the same state, which can
>> eventually corrupt the table. Isn’t CREATE TABLE LIKE supposed to just
>> reuse the existing table definition without copying the data?
>>
>> - Anton
>>
>> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <borokna...@apache.org>
>> wrote:
>>
>> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for
>> Iceberg tables.
>> You can see various examples at
>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>>
>> - Zoltan
>>
>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> You should be able to see how other DSv2 commands are written and copy
>>> them. Look at Drop Table, maybe and see if you can copy the structure, but
>>> instead of dropping, load the table and call createTable with its metadata.
>>>
>>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang <
>>> py...@pinterest.com.invalid> wrote:
>>>
>>>> Thanks Steve and Ryan for the reply.
>>>>
>>>> Steve, I am not looking for CTAS, my goal is to create an Iceberg table
>>>> and reuse the existing data (same as the create table like statement
>>>> above). Also my question is not about specifying location in
>>>> create statement.
>>>>
>>>> Ryan, the engine we are interested in is SparkSQL. Since you mentioned
>>>> it is an easy fix, would you please share how that should be implemented
>>>> such that anyone (maybe myself) interested in this can explore the
>>>> solution?
>>>>
>>>> Thanks both again.
>>>>
>>>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> Pucheng, what engine are you interested in?
>>>>>
>>>>> This works fine in Trino: CREATE TABLE table_copy (LIKE source_table
>>>>> INCLUDING PROPERTIES)
>>>>>
>>>>> I don’t know if it works in Hive, and last time I checked it was not
>>>>> implemented for DSv2 in Spark. The Spark problem should be an easy fix.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang <
>>>>> hongyue_zh...@apple.com.invalid> wrote:
>>>>>
>>>>>> Hey Pengcheng,
>>>>>>
>>>>>> Are you looking for CTAS as in
>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select?
>>>>>> I think you can also specify explicit location as part of create
>>>>>> statement in
>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>>>>>>
>>>>>> Thanks,
>>>>>> Steve Zhang
>>>>>>
>>>>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang <
>>>>>> py...@pinterest.com.INVALID> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I wonder how folks in the community deal with the cases where you
>>>>>> want to create a test table from an existing iceberg table? In Hive, what
>>>>>> we normally do is to run a query "create table x like y location z". But
>>>>>> we can't do this for the Iceberg table.
>>>>>>
>>>>>> If this is a feature that is missing, should we collaborate to build
>>>>>> a similar feature?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>