Can you expand on that a bit more? How is a table temporary if you intend to reuse its files in a different table? Is this something where you should be using `REPLACE TABLE ... AS SELECT` instead?
On Sun, Aug 21, 2022 at 10:20 AM Walaa Eldin Moustafa <wmoust...@linkedin.com> wrote:

> Thanks Ryan! The use case is dropping a temporary table and reusing its
> files in a new table. I think temporary tables could be a common use case.
> In addition, I think reassigning field IDs makes it harder to reuse
> schemas, but does not prevent it. I think we can give users the option
> and let them reuse the IDs if they know what they are doing. Probably the
> default behavior can be to reassign, but this could optionally be overridden?
>
> Thanks,
> Walaa.
>
> ------------------------------
> *From:* Ryan Blue <b...@tabular.io>
> *Sent:* Sunday, August 21, 2022 9:38 AM
> *To:* dev@iceberg.apache.org <dev@iceberg.apache.org>
> *Cc:* Walaa Eldin Moustafa <wmoust...@linkedin.com>; Vikram Bohra
> <vbo...@linkedin.com>; Sudarshan Vasudevan <suvasude...@linkedin.com>
> *Subject:* Re: Capability to create table without reassigning IDs
>
> Hi Raymond,
>
> One of the reasons why Iceberg doesn't currently support this is that it's
> dangerous to share files between tables. Even if you guarantee that a table
> has the same schema at some point in time, there's nothing stopping table
> schemas from diverging later. What are you trying to accomplish by creating
> a table with the same IDs? Are you migrating from one metastore to another?
> In that case, I'd recommend using `registerTable` instead.
>
> Ryan
>
> On Fri, Aug 19, 2022 at 2:22 PM Raymond Zhang <razh...@linkedin.com.invalid>
> wrote:
>
> > Hi there,
> >
> > I’m Raymond from the LinkedIn big data platform org.
> >
> > I have a question regarding the capability to create a new table without
> > assigning new IDs in the schema. Currently, BaseMetastoreCatalog.create()
> > calls the public TableMetadata.newTableMetadata(), which then calls the
> > package-private newTableMetadata() method. The package-private
> > newTableMetadata() method takes an Iceberg schema, always reassigns the
> > IDs in the schema to get a freshSchema, and uses that to create the new
> > TableMetadata. This means that, currently, when we create a table, the
> > IDs are always reassigned.
> >
> > I wonder if we can expose a way to create a table using the input
> > Iceberg schema as-is (without freshly assigning IDs to it). I have the
> > following arguments to support this:
> >
> > - It seems that when an Iceberg schema is created, it’s already
> >   guaranteed that the IDs are consistent. I tried to create a new Schema
> >   with duplicate IDs, and it failed at creation time, which means creation
> >   already takes care of ID consistency. So I wonder whether the
> >   ID-reassignment step really adds value to making the schema consistent.
> > - From a user perspective, if we introduce this new capability, we
> >   will have a guaranteed way to create Iceberg tables with the IDs we
> >   specify. We will then be able to create Iceberg tables with identical
> >   schemas (with the same IDs), so their files can be reused between them.
> >   A simple use case is directly using the AppendFiles API to add files
> >   from one table to the other without worrying about ID discrepancies.
> >
> > Let me know what you think: would this be beneficial, or am I missing
> > anything here?
> >
> > Thanks,
> > Raymond
>
> --
> Ryan Blue
> Tabular

--
Ryan Blue
Tabular
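A toy model may make the trade-off in this thread concrete. The sketch below is not Iceberg code: `Field`, `assignFreshIds`, and `read` are hypothetical names, it assumes a flat (non-nested) schema, and it models a data file as column values keyed by the field ID that was current when the file was written. Its `assignFreshIds` only roughly imitates renumbering from 1 for a flat schema. It shows why fresh IDs get in the way of reusing files: once a column has been dropped, the surviving IDs are no longer contiguous, so renumbering them shifts `event_time` onto an ID that the existing file never wrote.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldIdDemo {

    /** A top-level column; readers resolve columns by ID, not by name. */
    static final class Field {
        final int id;
        final String name;

        Field(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * Hypothetical stand-in for fresh ID assignment on a flat schema:
     * keeps column order and names, but renumbers IDs 1, 2, 3, ...
     */
    static List<Field> assignFreshIds(List<Field> schema) {
        List<Field> fresh = new ArrayList<>();
        int nextId = 1;
        for (Field f : schema) {
            fresh.add(new Field(nextId++, f.name));
        }
        return fresh;
    }

    /**
     * Reads one "data file" row through a table schema, resolving each
     * column by field ID. A column whose ID is absent from the file is null.
     */
    static Map<String, Object> read(Map<Integer, Object> fileColumns, List<Field> schema) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field f : schema) {
            row.put(f.name, fileColumns.get(f.id));
        }
        return row;
    }

    public static void main(String[] args) {
        // Source table after schema evolution: column 2 was dropped, leaving IDs 1 and 3.
        List<Field> source = List.of(new Field(1, "id"), new Field(3, "event_time"));

        // A data file written by the source table stores values under IDs 1 and 3.
        Map<Integer, Object> file = Map.of(1, 42L, 3, "2022-08-19T14:22:00");

        // New table created with the schema's IDs kept as-is: both columns resolve.
        System.out.println("preserved: " + read(file, source));

        // New table created with fresh IDs: event_time becomes ID 2, which the file lacks.
        System.out.println("fresh:     " + read(file, assignFreshIds(source)));
    }
}
```

In this model the preserved-ID read returns both values, while the fresh-ID read returns `event_time` as null. The toy also reflects Ryan's objection: keeping the IDs only makes the two schemas agree at creation time, and nothing in the model keeps them aligned as either table evolves afterwards.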