Pucheng,

I think it is reasonable to set up a temporary table with gc.enabled=false and
identical configuration, but that isn't the behavior of CREATE TABLE LIKE
in most systems that I'm aware of. I think that command creates a copy with
the same metadata and configuration, but not a copy of the data. What
you're talking about would probably be something I'd build as a new stored
procedure.
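
To make the distinction concrete, here is a minimal sketch in plain Python.
It is a toy model, not the Iceberg or Spark API: TableMetadata,
create_table_like, and snapshot_table are hypothetical names used only for
illustration, and the property settings simply follow the limitations
proposed in the quoted message below.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Toy stand-in for table metadata; not the Iceberg API."""
    schema: dict
    location: str
    properties: dict = field(default_factory=dict)
    data_files: list = field(default_factory=list)


def create_table_like(source: TableMetadata, location: str) -> TableMetadata:
    """Copy the definition and configuration only; the new table starts empty."""
    return TableMetadata(
        schema=dict(source.schema),
        location=location,
        properties=dict(source.properties),
    )


def snapshot_table(source: TableMetadata, location: str) -> TableMetadata:
    """Hypothetical procedure: a new table that references the source's data files."""
    if location == source.location:
        raise ValueError("snapshot table must use a different location than the source")
    props = dict(source.properties)
    props["snapshot"] = "true"
    # Assumed safeguard: keep cleanup on the copy from deleting shared files.
    props["gc.enabled"] = "false"
    return TableMetadata(
        schema=dict(source.schema),
        location=location,
        properties=props,
        data_files=list(source.data_files),
    )
```

In this model, create_table_like returns a table with no data files, while
snapshot_table shares the source's files and is the part that would need
the extra safeguards.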

On Thu, Apr 27, 2023 at 11:59 AM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> Hi Anton,
>
> Yes, I want to branch the table state and reuse the data files, but for
> test purposes only. Imagine we want to test something related to reading
> the Iceberg table or performing row-level updates.
>
> And I acknowledge the potential risk of the table state being corrupted.
> So I am thinking we can consider adding these limitations when running the
> "create table like":
> (1) the created table should have "snapshot=true"
> (2) the created table should have "gc.enabled=false" to make sure existing
> files don't get messed up
> (3) the created table should have a table location different from the
> location of the existing Iceberg table it is created from
> We can consider "create table like" as a snapshot action for an existing
> Iceberg table, similar to the existing snapshot procedure we have for an
> existing Hive table.
>
> I know CREATE TABLE LIKE is supposed to reuse the existing table
> definition only. If we have concerns about messing up table state, I hope
> we can break the implementation into parts and at least first implement
> the part where we create tables without reusing the existing data files.
>
> On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi
> <aokolnyc...@apple.com.invalid> wrote:
>
>> Pucheng, you mentioned you want to reuse existing data in the new table?
>> Branching Iceberg table state can lead to unexpected situations as there
>> will be multiple pointers in the catalog to the same state, which can
>> eventually corrupt the table. Isn’t CREATE TABLE LIKE supposed to just
>> reuse the existing table definition without copying the data?
>>
>> - Anton
>>
>> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <borokna...@apache.org>
>> wrote:
>>
>> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for
>> Iceberg tables.
>> You can see various examples at
>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>>
>> - Zoltan
>>
>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> You should be able to see how other DSv2 commands are written and copy
>>> them. Look at Drop Table, maybe, and see if you can copy the structure, but
>>> instead of dropping, load the table and call createTable with its metadata.
>>>
>>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang <
>>> py...@pinterest.com.invalid> wrote:
>>>
>>>> Thanks Steve and Ryan for the reply.
>>>>
>>>> Steve, I am not looking for CTAS; my goal is to create an Iceberg table
>>>> and reuse the existing data (same as the create table like statement
>>>> above). Also, my question is not about specifying a location in the
>>>> create statement.
>>>>
>>>> Ryan, the engine we are interested in is SparkSQL. Since you mentioned
>>>> it is an easy fix, would you please share how that should be implemented
>>>> such that anyone (maybe myself) interested in this can explore the
>>>> solution?
>>>>
>>>> Thanks both again.
>>>>
>>>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> Pucheng, what engine are you interested in?
>>>>>
>>>>> This works fine in Trino: CREATE TABLE table_copy (LIKE source_table
>>>>> INCLUDING PROPERTIES)
>>>>>
>>>>> I don’t know if it works in Hive, and last time I checked it was not
>>>>> implemented for DSv2 in Spark. The Spark problem should be an easy fix.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang <
>>>>> hongyue_zh...@apple.com.invalid> wrote:
>>>>>
>>>>>> Hey Pengcheng,
>>>>>>
>>>>>>    Are you looking for CTAS as in
>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select?
>>>>>> I think you can also specify an explicit location as part of the create
>>>>>> statement, as in
>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>>>>>>
>>>>>> Thanks,
>>>>>> Steve Zhang
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang <
>>>>>> py...@pinterest.com.INVALID> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I wonder how folks in the community deal with the cases where you
>>>>>> want to create a test table from an existing Iceberg table? In Hive, what
>>>>>> we normally do is run a query "create table x like y location z". But we
>>>>>> can't do this for the Iceberg table.
>>>>>>
>>>>>> If this is a feature that is missing, should we collaborate to build
>>>>>> a similar feature?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>>

-- 
Ryan Blue
Tabular
