Re: Support create table like for Iceberg table?

Simhadri G Tue, 09 May 2023 11:06:40 -0700

Hi Pucheng Yang,

The latest master branch of Hive also supports "Create Table Like" for
Iceberg tables.


Related commits:
HIVE-26519: Iceberg: Add support for CTLT queries. [1]
HIVE-26950: Iceberg: (CTLT) Create external table like V2 table is not
preserving table properties [2]

[1]
https://github.com/apache/hive/commit/d96c31b2a87367279ef7e61ce8cda60d04db303c
[2]
https://github.com/apache/hive/commit/9f4a9c6aedf7dd097a2961d0507ef2ef089853dc

thanks!

On Tue, May 9, 2023 at 10:43 PM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> Russell, to me, "snapshot" procedure is a perfect place to adopt this
> feature. After the implementation, we can use the "snapshot" procedure to
> snapshot a Hive table or an Iceberg table (maybe we can also make it
> generic to snapshot any other table, e.g. Delta).
>
> On Tue, May 9, 2023 at 10:00 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> How would Create Table Like be different from our "Snapshot" procedure,
>> just enabled for Iceberg tables? I'm wondering if we should just expand that
>> functionality.
>>
>> On Tue, May 9, 2023 at 11:54 AM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Ryan, when I mentioned "copy of the data", I didn't mean to
>>> physically copy the data. I meant a copy of the metadata and configuration
>>> such that the created table can also read the data that belongs to the
>>> table it was created from. However, I do share the concern that CREATE TABLE
>>> LIKE, if we plan to follow what most systems do, will copy some important
>>> configuration (such as gc.enabled) that I think we definitely don't want,
>>> since it will create a surface for people to mess up the original table. In
>>> this regard, I agree we should adopt the approach of having a procedure
>>> instead. So I am dropping this CREATE TABLE LIKE feature request.
>>>
>>> Anton, branching will work, but I would still prefer creating a separate
>>> table for these reasons: (1) I consider "branching" a very advanced/new
>>> feature for my customers, and it is generally easier and safer to just let
>>> them use a separate test table. (2) The newly generated data will be placed
>>> under a separate location, making auditing and cleanup easier. (3) If we
>>> use branching, coordination is needed between the user who is testing
>>> via branching and the platform, which is constantly performing table
>>> maintenance, and that introduces friction.
>>>
>>> On Thu, Apr 27, 2023 at 2:15 PM Anton Okolnychyi
>>> <aokolnyc...@apple.com.invalid> wrote:
>>>
>>>> Iceberg supports branching so that you can safely perform such tests
>>>> without any risk of corrupting the table. No need to create a separate
>>>> table and clone the config. Overall, I don’t think it is a good idea to
>>>> break the contract of CREATE TABLE LIKE.
>>>>
>>>> - Anton
>>>>
>>>> On Apr 27, 2023, at 11:59 AM, Pucheng Yang <py...@pinterest.com.INVALID>
>>>> wrote:
>>>>
>>>> Hi Anton,
>>>>
>>>> Yes, I want to branch the table state and reuse the data files, but for
>>>> test purposes only. Imagine we want to test something related to reading
>>>> the Iceberg table or performing row-level updates.
>>>>
>>>> And I acknowledge the potential risk of the table state being
>>>> corrupted. So I am thinking we can consider adding these limitations when
>>>> running the "create table like":
>>>> (1) the created table should have "snapshot=true"
>>>> (2) the created table should have "gc.enabled=false" to make sure
>>>> existing files don't get messed up
>>>> (3) the created table should have a table location different from the
>>>> location of the existing Iceberg table it is created from
>>>> We can consider "create table like" as a snapshot action for an
>>>> existing Iceberg table, similar to the existing snapshot procedure we have
>>>> for an existing Hive table.
>>>>
>>>> I know CREATE TABLE LIKE is supposed to reuse the existing table
>>>> definition only. If we have concerns around messing up table state, I hope
>>>> we can break the implementation down and at least first implement
>>>> the part where we create tables without reusing the existing data files.
>>>>
>>>> On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi <
>>>> aokolnyc...@apple.com.invalid> wrote:
>>>>
>>>>> Pucheng, you mentioned you want to reuse existing data in the new
>>>>> table? Branching Iceberg table state can lead to unexpected situations as
>>>>> there will be multiple pointers in the catalog to the same state, which
>>>>> can eventually corrupt the table. Isn’t CREATE TABLE LIKE supposed to just
>>>>> reuse the existing table definition without copying the data?
>>>>>
>>>>> - Anton
>>>>>
>>>>> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <borokna...@apache.org>
>>>>> wrote:
>>>>>
>>>>> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y
>>>>> for Iceberg tables.
>>>>> You can see various examples at
>>>>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>>>>>
>>>>> - Zoltan
>>>>>
>>>>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> You should be able to see how other DSv2 commands are written and
>>>>>> copy them. Look at Drop Table, maybe, and see if you can copy the
>>>>>> structure, but instead of dropping, load the table and call createTable
>>>>>> with its metadata.
>>>>>>
>>>>>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang <
>>>>>> py...@pinterest.com.invalid> wrote:
>>>>>>
>>>>>>> Thanks Steve and Ryan for the reply.
>>>>>>>
>>>>>>> Steve, I am not looking for CTAS; my goal is to create an Iceberg
>>>>>>> table and reuse the existing data (same as the create table like
>>>>>>> statement above). Also, my question is not about specifying the location
>>>>>>> in the create statement.
>>>>>>>
>>>>>>> Ryan, the engine we are interested in is SparkSQL. Since you
>>>>>>> mentioned it is an easy fix, would you please share how that should be
>>>>>>> implemented such that anyone (maybe myself) interested in this can
>>>>>>> explore the solution?
>>>>>>>
>>>>>>> Thanks both again.
>>>>>>>
>>>>>>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>
>>>>>>>> Pucheng, what engine are you interested in?
>>>>>>>>
>>>>>>>> This works fine in Trino: CREATE TABLE table_copy (LIKE
>>>>>>>> source_table INCLUDING PROPERTIES)
>>>>>>>>
>>>>>>>> I don’t know if it works in Hive, and last time I checked it was
>>>>>>>> not implemented for DSv2 in Spark. The Spark problem should be an easy
>>>>>>>> fix.
>>>>>>>>
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang <
>>>>>>>> hongyue_zh...@apple.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hey Pucheng,
>>>>>>>>>
>>>>>>>>>    Are you looking for CTAS as in
>>>>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select?
>>>>>>>>> I think you can also specify an explicit location as part of the create
>>>>>>>>> statement, as in
>>>>>>>>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steve Zhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang <
>>>>>>>>> py...@pinterest.com.INVALID> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I wonder how folks in the community deal with the cases where you
>>>>>>>>> want to create a test table from an existing Iceberg table? In Hive,
>>>>>>>>> what we normally do is to run a query "create table x like y location z".
>>>>>>>>> But we can't do this for the Iceberg table.
>>>>>>>>>
>>>>>>>>> If this is a feature that is missing, should we collaborate to
>>>>>>>>> build a similar feature?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>
>>>>>
>>>>
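
For anyone skimming the thread above, here is a rough, self-contained
sketch of the idea discussed there: "load the table and call createTable
with its metadata", combined with the three restrictions proposed for a
snapshot-style copy (snapshot=true, gc.enabled=false, a different location).
It is only an illustration under stated assumptions: the catalog is a plain
Python dict, and TableDef and create_table_like are hypothetical names, not
Iceberg, Spark, or Hive APIs. It copies the table definition only and does
not model snapshots or data files.

```python
from dataclasses import dataclass, field


@dataclass
class TableDef:
    """Hypothetical stand-in for a table definition; not an Iceberg class."""
    schema: tuple            # e.g. (("id", "bigint"), ("ts", "timestamp"))
    partition_spec: tuple    # e.g. ("days(ts)",)
    location: str
    properties: dict = field(default_factory=dict)


def create_table_like(catalog: dict, source: str, target: str, location: str) -> TableDef:
    """Register `target` in `catalog` using the definition of `source`.

    `catalog` is an in-memory dict of name -> TableDef (an assumption made
    for this sketch, standing in for a real catalog).
    """
    if target in catalog:
        raise ValueError(f"table already exists: {target}")

    src = catalog[source]  # "load the table"

    # Restriction (3): the new table must not share the source location.
    if location.rstrip("/") == src.location.rstrip("/"):
        raise ValueError("target location must differ from the source location")

    # Copy the properties, then override the ones the thread calls risky.
    props = dict(src.properties)
    props["snapshot"] = "true"      # restriction (1)
    props["gc.enabled"] = "false"   # restriction (2)

    # "call createTable with its metadata"
    new_table = TableDef(src.schema, src.partition_spec, location, props)
    catalog[target] = new_table
    return new_table


if __name__ == "__main__":
    catalog = {
        "db.events": TableDef(
            schema=(("id", "bigint"), ("ts", "timestamp")),
            partition_spec=("days(ts)",),
            location="s3://warehouse/db/events",
            properties={"gc.enabled": "true", "format-version": "2"},
        )
    }
    copy = create_table_like(
        catalog, "db.events", "db.events_test", "s3://warehouse/db/events_test"
    )
    print(copy.location)
    print(copy.properties)
```

The source table's own properties are left untouched because the copy works
on a fresh dict; only the new table gets the overridden values.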
