Re: Capability to create table without reassigning IDs

Ryan Blue Sun, 21 Aug 2022 12:07:01 -0700

Can you expand on that a bit more? How is a table temporary if you intend
to reuse its files in a different table? Is this something where you should
be using `REPLACE TABLE ... AS SELECT` instead?


On Sun, Aug 21, 2022 at 10:20 AM Walaa Eldin Moustafa <
wmoust...@linkedin.com> wrote:

> Thanks Ryan! The use case is dropping a temporary table and reusing its
> files in a new table. I think temporary tables could be a common use case.
> In addition, I think reassigning field IDs makes it harder to reuse
> schemas, but does not prevent it. I think we can give users the option
> and let them reuse the IDs if they know what they are doing. The default
> behavior could be to reassign, with an option to override it?
>
> Thanks,
> Walaa.
>
> ------------------------------
> *From:* Ryan Blue <b...@tabular.io>
> *Sent:* Sunday, August 21, 2022 9:38 AM
> *To:* dev@iceberg.apache.org <dev@iceberg.apache.org>
> *Cc:* Walaa Eldin Moustafa <wmoust...@linkedin.com>; Vikram Bohra <
> vbo...@linkedin.com>; Sudarshan Vasudevan <suvasude...@linkedin.com>
> *Subject:* Re: Capability to create table without reassigning IDs
>
> Hi Raymond,
>
> One of the reasons why Iceberg doesn't currently support this is that it's
> dangerous to share files between tables. Even if you guarantee that a table
> has the same schema at some point in time, there's nothing stopping table
> schemas from diverging later. What are you trying to accomplish by creating
> a table with the same IDs? Are you migrating from one metastore to another?
> In that case, I'd recommend using `registerTable` instead.
>
> Ryan
>
> On Fri, Aug 19, 2022 at 2:22 PM Raymond Zhang <razh...@linkedin.com.invalid>
> wrote:
>
> Hi there,
>
>
>
> I’m Raymond from LinkedIn big data platform org.
>
>
>
> I have a question regarding the capability to create a new table without
> assigning new IDs in the schema. Currently, BaseMetastoreCatalog.create()
> calls the public TableMetadata.newTableMetadata() which then calls the
> package-private newTableMetadata() method. The package-private
> newTableMetadata() method takes in an Iceberg schema and always reassigns
> the ids in the schema to get a freshSchema and use that for creating the
> new TableMetadata. This means, currently when we create a table, the IDs
> will always be reassigned.
>
>
>
> I wonder if we can expose a possibility to create a table using the input
> Iceberg schema as-is (without freshly assigning ids to it). I have the
> following arguments to support this:
>
>
>
>    - It seems that when an Iceberg schema is created, its IDs are already
>    guaranteed to be consistent. I tried to create a new Schema with duplicate
>    IDs, and it fails at creation time, which means schema creation already
>    takes care of ID consistency. So I wonder whether the ID reassignment step
>    really adds value in making the schema consistent.
>    - From a user perspective, if we introduce this new capability, we
>    will have a guaranteed way to create Iceberg tables with the IDs we specify.
>    We will then be able to create Iceberg tables with identical schemas (with
>    the same IDs), so their files can be reused across tables. A simple
>    use case is that we can directly use the AppendFiles API to add files from
>    one table to the other without worrying about ID discrepancies.
>
>
>
> Let me know whether you think this would be beneficial, or whether I’m
> missing anything here.
>
>
>
> Thanks,
>
> Raymond
>
>
>
> --
> Ryan Blue
> Tabular
>


-- 
Ryan Blue
Tabular
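
To illustrate the two concerns raised in this thread, here is a minimal Python sketch. It is a toy model and does not use any Iceberg API: `Field`, `assign_fresh_ids`, and `read_row` are hypothetical names, and the only assumptions are that data files are resolved by field ID and that creating a table renumbers a flat schema 1..n in declaration order. The first part shows why files written under the original IDs are misread after reassignment; the second shows that even identical IDs at creation do not stop two tables from diverging later.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema column: a field ID plus a name."""
    field_id: int
    name: str


def assign_fresh_ids(fields):
    """Renumber fields 1..n in declaration order, ignoring the caller's IDs."""
    next_id = count(1)
    return [Field(next(next_id), f.name) for f in fields]


def read_row(file_row, table_fields):
    """Project a data-file row (keyed by field ID) onto a table schema by ID."""
    return {f.name: file_row.get(f.field_id) for f in table_fields}


# Source schema whose IDs are not 1..n (for example, ID 2 belonged to a dropped column).
source = [Field(1, "id"), Field(3, "email"), Field(4, "name")]

# A data file written by the source table stores values under the source IDs.
file_row = {1: 42, 3: "a@example.com", 4: "Ann"}

# Creating a new table from the same schema renumbers the fields.
target = assign_fresh_ids(source)

as_is = read_row(file_row, source)       # read with the original IDs
reassigned = read_row(file_row, target)  # read with the renumbered IDs

# Even with identical IDs at creation, the two tables evolve independently:
# each one hands out ID 5 to a different new column.
source_v2 = source + [Field(5, "phone")]
copy_v2 = source + [Field(5, "age")]
row_v2 = {1: 7, 3: "b@example.com", 4: "Bob", 5: "555-0100"}
diverged = read_row(row_v2, copy_v2)     # "age" now holds a phone number

print(as_is)
print(reassigned)
print(diverged)
```

In this model, reading the file through the renumbered schema returns the email value under `name` and nothing under `email`, and reading a file from the evolved source through the evolved copy returns a phone number under `age`.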
