Re: Support create table like for Iceberg table?

Pucheng Yang Tue, 09 May 2023 10:13:34 -0700

Russell, to me, the "snapshot" procedure is the perfect place to adopt this
feature. After the implementation, we can use the "snapshot" procedure to
snapshot a Hive table or an Iceberg table (maybe we can also make it
generic enough to snapshot any other table, e.g. Delta).
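
A rough sketch of the property handling such a snapshot would need, based on
the three limitations proposed earlier in this thread (quoted below): mark the
new table as a snapshot, disable gc, and require a different location. The
function name and parameters are hypothetical, and this is plain Python that
only models the rules; it is not Iceberg's actual procedure code.

```python
def snapshot_properties(source_props, source_location, new_location):
    """Derive table properties for a snapshot-style copy (hypothetical helper).

    Assumes properties are a flat dict of string keys to string values.
    """
    # Limitation (3): the new table must not share the source table's location.
    if new_location.rstrip("/") == source_location.rstrip("/"):
        raise ValueError(
            "snapshot table must use a different location than its source"
        )

    # Copy so the caller's dict (the source table's properties) is untouched.
    props = dict(source_props)

    # Limitation (1): mark the new table as a snapshot.
    props["snapshot"] = "true"

    # Limitation (2): never inherit gc.enabled=true from the source, so
    # maintenance on the new table cannot delete files the source still uses.
    props["gc.enabled"] = "false"

    return props
```

Overriding gc.enabled rather than copying it is the key point: a plain copy of
the source properties is exactly the surface for messing up the original table
that the quoted discussion warns about.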


On Tue, May 9, 2023 at 10:00 AM Russell Spitzer <[email protected]>
wrote:

> How would Create Table Like be different from our "Snapshot" procedure,
> just enabled for Iceberg tables? Wondering if we should just expand that
> functionality.
>
> On Tue, May 9, 2023 at 11:54 AM Pucheng Yang <[email protected]>
> wrote:
>
>> Ryan, when I mentioned "copy of the data", I didn't mean to
>> physically copy the data. I meant a copy of the metadata and configuration,
>> such that the created table can also read the data that belongs to the
>> table it was created from. However, I do share the concern that CREATE TABLE
>> LIKE, if we plan to follow what most systems do, will copy some important
>> configuration (such as gc.enabled) that I think we definitely don't want,
>> since it will create a surface for people to mess up the original table. In
>> this regard, I agree we should adopt the approach of having a procedure
>> instead. So I am dropping this CREATE TABLE LIKE feature request.
>>
>> Anton, branching will work, but I still prefer creating a separate
>> table for these reasons: (1) I consider "branching" a very advanced and
>> new feature for my customers, and it is generally easy and safe to just let
>> them use a separate test table. (2) The newly generated data will be placed
>> under a separate location, making auditing and cleanup easier. (3) If we
>> use branching, the user who is testing via branching has to coordinate
>> with the platform, which is constantly performing table maintenance,
>> thus introducing friction.
>>
>> On Thu, Apr 27, 2023 at 2:15 PM Anton Okolnychyi
>> <[email protected]> wrote:
>>
>>> Iceberg supports branching so that you can safely perform such tests
>>> without any risk of corrupting the table. No need to create a separate
>>> table and clone the config. Overall, I don’t think it is a good idea to
>>> break the contract of CREATE TABLE LIKE.
>>>
>>> - Anton
>>>
>>> On Apr 27, 2023, at 11:59 AM, Pucheng Yang <[email protected]>
>>> wrote:
>>>
>>> Hi Anton,
>>>
>>> Yes, I want to branch the table state and reuse the data files, but for
>>> test purposes only. Imagine we want to test something related to reading
>>> the Iceberg table or performing row-level updates.
>>>
>>> And I acknowledge the potential risk of the table state being corrupted.
>>> So I am thinking we can consider adding these limitations when running the
>>> "create table like":
>>> (1) the created table should have "snapshot=true"
>>> (2) the created table should have "gc.enabled=false" to make sure
>>> existing files don't get messed up
>>> (3) the created table should have a table location different from the
>>> location of the existing Iceberg table it is created from
>>> We can consider "create table like" as a snapshot action for an existing
>>> Iceberg table, similar to the existing snapshot procedure we have for an
>>> existing Hive table.
>>>
>>> I know CREATE TABLE LIKE is supposed to reuse the existing table
>>> definition only. If we have concerns around messing up table state, I hope
>>> we can break the implementation down and at least first implement
>>> the part where we create tables without reusing the existing data files.
>>>
>>> On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi <
>>> [email protected]> wrote:
>>>
>>>> Pucheng, you mentioned you want to reuse existing data in the new
>>>> table? Branching Iceberg table state can lead to unexpected situations as
>>>> there will be multiple pointers in the catalog to the same state, which can
>>>> eventually corrupt the table. Isn’t CREATE TABLE LIKE supposed to just
>>>> reuse the existing table definition without copying the data?
>>>>
>>>> - Anton
>>>>
>>>> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <[email protected]>
>>>> wrote:
>>>>
>>>> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for
>>>> Iceberg tables.
>>>> You can see various examples at
>>>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>>>>
>>>> - Zoltan
>>>>
>>>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <[email protected]> wrote:
>>>>
>>>>> You should be able to see how other DSv2 commands are written and copy
>>>>> them. Look at Drop Table, maybe, and see if you can copy the structure, but
>>>>> instead of dropping, load the table and call createTable with its
>>>>> metadata.
>>>>>
>>>>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Thanks Steve and Ryan for the reply.
>>>>>>
>>>>>> Steve, I am not looking for CTAS, my goal is to create an Iceberg
>>>>>> table and reuse the existing data (same as the create table like
>>>>>> statement above). Also my question is not about specifying location in
>>>>>> create statement.
>>>>>>
>>>>>> Ryan, the engine we are interested in is SparkSQL. Since you
>>>>>> mentioned it is an easy fix, would you please share how that should be
>>>>>> implemented such that anyone (maybe myself) interested in this can
>>>>>> explore the solution?
>>>>>>
>>>>>> Thanks both again.
>>>>>>
>>>>>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> Pucheng, what engine are you interested in?
>>>>>>>
>>>>>>> This works fine in Trino: CREATE TABLE table_copy (LIKE
>>>>>>> source_table INCLUDING PROPERTIES)
>>>>>>>
>>>>>>> I don’t know if it works in Hive, and last time I checked it was not
>>>>>>> implemented for DSv2 in Spark. The Spark problem should be an easy fix.
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hey Pucheng,
>>>>>>>>
>>>>>>>>    Are you looking for CTAS? See
>>>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select
>>>>>>>> I think you can also specify an explicit location as part of the create
>>>>>>>> statement; see
>>>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steve Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I wonder how folks in the community deal with the cases where you
>>>>>>>> want to create a test table from an existing Iceberg table? In Hive,
>>>>>>>> what we normally do is to run a query "create table x like y
>>>>>>>> location z". But we can't do this for the Iceberg table.
>>>>>>>>
>>>>>>>> If this is a feature that is missing, should we collaborate to
>>>>>>>> build a similar feature?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Tabular
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>>>
>>>
