Re: Support create table like for Iceberg table?

Anton Okolnychyi Thu, 27 Apr 2023 14:15:54 -0700

Iceberg supports branching so that you can safely perform such tests without 
any risk of corrupting the table. No need to create a separate table and clone 
the config. Overall, I don’t think it is a good idea to break the contract of 
CREATE TABLE LIKE.
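
A minimal sketch of what testing on a branch could look like, assuming Spark SQL with the Iceberg SQL extensions enabled and an Iceberg release that supports branch DDL and branch writes. The table name prod.db.sample, the branch name test_branch, and the columns id and data are hypothetical placeholders, not names from this thread.

```sql
-- Create a branch from the table's current snapshot; no data files are copied.
ALTER TABLE prod.db.sample CREATE BRANCH test_branch;

-- Run a test row-level update against the branch only (hypothetical columns).
UPDATE prod.db.sample.branch_test_branch SET data = 'changed' WHERE id = 1;

-- Read the branch state; the main branch is left untouched.
SELECT * FROM prod.db.sample VERSION AS OF 'test_branch';

-- Discard the branch once the test is done.
ALTER TABLE prod.db.sample DROP BRANCH test_branch;
```

Because the branch lives in the same table metadata, there is only one catalog pointer to the table, which avoids the multiple-pointer situation described in the quoted discussion below.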


- Anton

> On Apr 27, 2023, at 11:59 AM, Pucheng Yang <py...@pinterest.com.INVALID> 
> wrote:
> 
> Hi Anton, 
> 
> Yes, I want to branch the table state and reuse the data files, but for test 
> purposes only. Imagine we want to test something related to reading the 
> Iceberg table or performing row-level updates. 
> 
> I acknowledge the potential risk of the table state being corrupted, so I 
> am thinking we could add these restrictions when running "create table 
> like":
> (1) the created table should have "snapshot=true" 
> (2) the created table should have "gc.enabled=false" to make sure existing 
> files don't get messed up 
> (3) the created table should have a table location different from the 
> location of the existing Iceberg table it is created from
> We can consider "create table like" as a snapshot action for an existing 
> Iceberg table, similar to the existing snapshot procedure we have for an 
> existing Hive table.
> 
> I know CREATE TABLE LIKE is supposed to reuse only the existing table 
> definition. If we have concerns about messing up table state, I hope we 
> can break the implementation down and at least first implement the 
> part where we create tables without reusing the existing data files.
> 
> On Wed, Apr 26, 2023 at 8:26 AM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> Pucheng, you mentioned you want to reuse existing data in the new table? 
> Branching Iceberg table state can lead to unexpected situations as there will 
> be multiple pointers in the catalog to the same state, which can eventually 
> corrupt the table. Isn’t CREATE TABLE LIKE supposed to just reuse the 
> existing table definition without copying the data?
> 
> - Anton
> 
>> On Apr 26, 2023, at 5:41 AM, Zoltán Borók-Nagy <borokna...@apache.org> wrote:
>> 
>> As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for 
>> Iceberg tables.
>> You can see various examples at 
>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>> 
>> - Zoltan
>> 
>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue <b...@tabular.io> wrote:
>> You should be able to see how other DSv2 commands are written and copy them. 
>> Look at Drop Table, maybe, and see if you can copy the structure; but instead 
>> of dropping, load the table and call createTable with its metadata.
>> 
>> On Tue, Apr 25, 2023 at 4:42 PM Pucheng Yang <py...@pinterest.com.invalid> wrote:
>> Thanks Steve and Ryan for the reply.
>> 
>> Steve, I am not looking for CTAS; my goal is to create an Iceberg table and 
>> reuse the existing data (same as the create table like statement above). 
>> Also, my question is not about specifying a location in the create statement.
>> 
>> Ryan, the engine we are interested in is SparkSQL. Since you mentioned it is 
>> an easy fix, would you please share how that should be implemented such that 
>> anyone (maybe myself) interested in this can explore the solution?
>> 
>> Thanks both again.
>> 
>> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue <b...@tabular.io> wrote:
>> Pucheng, what engine are you interested in?
>> 
>> This works fine in Trino: CREATE TABLE table_copy (LIKE source_table 
>> INCLUDING PROPERTIES)
>> 
>> I don’t know if it works in Hive, and last time I checked it was not 
>> implemented for DSv2 in Spark. The Spark problem should be an easy fix.
>> 
>> Ryan
>> 
>> 
>> On Tue, Apr 25, 2023 at 2:43 PM Steve Zhang <hongyue_zh...@apple.com.invalid> wrote:
>> Hey Pucheng, 
>> 
>>    Are you looking for CTAS, as in 
>> https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select ? 
>> I think you can also specify an explicit location as part of the create 
>> statement; see https://iceberg.apache.org/docs/latest/spark-ddl/#create-table
>> 
>> Thanks,
>> Steve Zhang
>> 
>> 
>> 
>>> On Apr 25, 2023, at 1:46 PM, Pucheng Yang <py...@pinterest.com.INVALID> wrote:
>>> 
>>> Hi all,
>>> 
>>> I wonder how folks in the community deal with cases where you want to 
>>> create a test table from an existing Iceberg table. In Hive, what we 
>>> normally do is run a query like "create table x like y location z", but we 
>>> can't do this for an Iceberg table.
>>> 
>>> If this is a feature that is missing, should we collaborate to build a 
>>> similar feature?
>>> 
>>> Thanks
>> 
>> 
>> 
>> -- 
>> Ryan Blue
>> Tabular
>
