Hi Anton,

Yes, I want to branch the table state and reuse the data files, but for
test purposes only. For example, we may want to test something related to
reading the Iceberg table or performing row-level updates.

I acknowledge the risk of corrupting the table state, so I am thinking we
could add these limitations when running CREATE TABLE LIKE:
(1) the created table should have "snapshot=true"
(2) the created table should have "gc.enabled=false" so that the existing
files are never deleted
(3) the created table should have a table location different from that of
the existing Iceberg table it is created from
We can treat CREATE TABLE LIKE as a snapshot action for an existing
Iceberg table, similar to the existing snapshot procedure we have for
Hive tables.
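
As a rough illustration, the three limitations could be enforced by a small
helper that derives the new table's properties before the table is created.
The function name snapshot_like_properties is hypothetical, and rejecting a
target location nested under the source location is an extra assumption
beyond (3); only the "snapshot" and "gc.enabled" property names come from
the proposal above.

```python
from typing import Dict

# Property names from the proposal in this thread; the helper itself is hypothetical.
SNAPSHOT_PROP = "snapshot"
GC_ENABLED_PROP = "gc.enabled"


def _normalize(location: str) -> str:
    """Strip trailing slashes so 's3://b/t/' and 's3://b/t' compare equal."""
    return location.rstrip("/")


def snapshot_like_properties(
    source_props: Dict[str, str], source_location: str, target_location: str
) -> Dict[str, str]:
    """Return properties for a CREATE TABLE LIKE that reuses the source data files.

    Enforces the proposed limitations:
      (1) snapshot=true, (2) gc.enabled=false, (3) a different table location.
    """
    src = _normalize(source_location)
    dst = _normalize(target_location)
    # (3) Refuse to share the source location. Rejecting a location nested
    # under the source is an extra assumption, not part of the proposal.
    if dst == src or dst.startswith(src + "/"):
        raise ValueError(
            f"target location {target_location!r} overlaps source {source_location!r}"
        )

    props = dict(source_props)        # copy so the source table's map is untouched
    props[SNAPSHOT_PROP] = "true"     # (1) mark the new table as a snapshot
    props[GC_ENABLED_PROP] = "false"  # (2) never delete files the source still references
    return props


if __name__ == "__main__":
    print(snapshot_like_properties(
        {"format-version": "2"}, "s3://bucket/db/events", "s3://bucket/test/events"))
```

Overriding any inherited "gc.enabled=true" matters here: the new table would
otherwise be allowed to clean up files that the source table still points to.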

I know CREATE TABLE LIKE is supposed to reuse only the existing table
definition. If there are concerns about corrupting table state, I hope we
can split the implementation and first implement the part that creates
tables without reusing the existing data files.
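
The definition-only part is essentially Ryan's suggestion: load the source
table and call createTable with its metadata. Below is a rough sketch of that
flow using a hypothetical in-memory ToyCatalog (not a Spark or Iceberg class)
whose create_table takes a schema, partitioning, and properties, in the
spirit of what Spark's DSv2 createTable accepts. The "days(ts)" string is
just an example partition spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a catalog table; not an Iceberg or Spark class."""
    schema: List[Tuple[str, str]]      # (column name, type) pairs
    partitioning: List[str]            # e.g. ["days(ts)"]
    properties: Dict[str, str]
    location: str
    data_files: List[str] = field(default_factory=list)


class ToyCatalog:
    """Hypothetical in-memory catalog used only to illustrate the idea."""

    def __init__(self) -> None:
        self._tables: Dict[str, ToyTable] = {}

    def load_table(self, name: str) -> ToyTable:
        return self._tables[name]

    def create_table(
        self,
        name: str,
        schema: List[Tuple[str, str]],
        partitioning: List[str],
        properties: Dict[str, str],
        location: str,
    ) -> ToyTable:
        if name in self._tables:
            raise ValueError(f"table {name!r} already exists")
        # Copy the inputs so the new table shares no mutable state with the source.
        table = ToyTable(list(schema), list(partitioning), dict(properties), location)
        self._tables[name] = table
        return table


def create_table_like(catalog: ToyCatalog, target: str, source: str, location: str) -> ToyTable:
    """Copy only the definition of `source`; the new table starts with no data files."""
    src = catalog.load_table(source)
    return catalog.create_table(target, src.schema, src.partitioning, src.properties, location)
```

Because no data files are carried over, there is only one catalog pointer to
the source table's state, which avoids the corruption risk Anton described.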

On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi
<aokolnyc...@apple.com.invalid> wrote:

> Pucheng, you mentioned you want to reuse existing data in the new table?
> Branching Iceberg table state can lead to unexpected situations as there
> will be multiple pointers in the catalog to the same state, which can
> eventually corrupt the table. Isn’t CREATE TABLE LIKE supposed to just
> reuse the existing table definition without copying the data?
>
> - Anton
>
> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <borokna...@apache.org>
> wrote:
>
> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for
> Iceberg tables.
> You can see various examples at
> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>
> - Zoltan
>
> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <b...@tabular.io> wrote:
>
>> You should be able to see how other DSv2 commands are written and copy
>> them. Maybe look at DROP TABLE and see if you can copy its structure, but
>> instead of dropping, load the table and call createTable with its metadata.
>>
>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Thanks Steve and Ryan for the reply.
>>>
>>> Steve, I am not looking for CTAS; my goal is to create an Iceberg table
>>> that reuses the existing data (like the CREATE TABLE LIKE statement
>>> above). Also, my question is not about specifying a location in the
>>> CREATE statement.
>>>
>>> Ryan, the engine we are interested in is Spark SQL. Since you mentioned
>>> it is an easy fix, would you please share how it should be implemented
>>> so that anyone interested (maybe myself) can explore the solution?
>>>
>>> Thanks both again.
>>>
>>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> Pucheng, what engine are you interested in?
>>>>
>>>> This works fine in Trino: CREATE TABLE table_copy (LIKE source_table
>>>> INCLUDING PROPERTIES)
>>>>
>>>> I don’t know if it works in Hive, and last time I checked it was not
>>>> implemented for DSv2 in Spark. The Spark problem should be an easy fix.
>>>>
>>>> Ryan
>>>>
>>>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang <
>>>> hongyue_zh...@apple.com.invalid> wrote:
>>>>
>>>>> Hey Pucheng,
>>>>>
>>>>> Are you looking for CTAS as in
>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select?
>>>>> I think you can also specify an explicit location as part of the create
>>>>> statement, as in
>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>>>>>
>>>>> Thanks,
>>>>> Steve Zhang
>>>>>
>>>>>
>>>>>
>>>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang <py...@pinterest.com.INVALID>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I wonder how folks in the community handle cases where they want to
>>>>> create a test table from an existing Iceberg table. In Hive, we
>>>>> normally run "create table x like y location z", but we can't do this
>>>>> for Iceberg tables.
>>>>>
>>>>> If this is a feature that is missing, should we collaborate to build a
>>>>> similar feature?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
>
