Hi Raymond,

One of the reasons why Iceberg doesn't currently support this is that it's dangerous to share files between tables. Even if you guarantee that a table has the same schema at some point in time, there's nothing stopping table schemas from diverging later.

What are you trying to accomplish by creating a table with the same IDs? Are you migrating from one metastore to another? In that case, I'd recommend using `registerTable` instead.
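To make the divergence risk concrete, here is a toy sketch in plain Java. It does not use Iceberg's API: `ToyTable`, `readWith`, `demo`, and the column names are all made up for illustration, and ID assignment is reduced to a per-table counter. It only shows how two tables that start with identical IDs can each hand out the same next ID to different columns, so a file written by one is misread by the other.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedFileDivergence {

    /** Hypothetical toy table: maps field ID to column name and tracks the last ID it assigned. */
    static final class ToyTable {
        final Map<Integer, String> columnsById = new LinkedHashMap<>();
        int lastAssignedId = 0;

        ToyTable addColumn(String name) {
            // Each table assigns the next unused ID independently of any other table.
            columnsById.put(++lastAssignedId, name);
            return this;
        }
    }

    /** Resolve a data file's values (keyed by the writer's field IDs) against the reader's schema. */
    static Map<String, Object> readWith(ToyTable reader, Map<Integer, Object> fileValuesById) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> col : reader.columnsById.entrySet()) {
            row.put(col.getValue(), fileValuesById.get(col.getKey()));
        }
        return row;
    }

    static Map<String, Object> demo() {
        // Both tables start with the same schema and the same IDs (1 = id, 2 = name).
        ToyTable a = new ToyTable().addColumn("id").addColumn("name");
        ToyTable b = new ToyTable().addColumn("id").addColumn("name");

        // Later the schemas diverge: each table gives ID 3 to a different column.
        a.addColumn("email");
        b.addColumn("country");

        // A file written by table A after its schema change.
        Map<Integer, Object> fileFromA = new LinkedHashMap<>();
        fileFromA.put(1, 42L);
        fileFromA.put(2, "Raymond");
        fileFromA.put(3, "r@example.com");

        // Appending that file to table B makes B read A's email as its country.
        return readWith(b, fileFromA);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {id=42, name=Raymond, country=r@example.com}
    }
}
```

Nothing in the file itself signals the mismatch, which is why matching IDs at creation time are not enough to make sharing files safe.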
Ryan

On Fri, Aug 19, 2022 at 2:22 PM Raymond Zhang <razh...@linkedin.com.invalid> wrote:

> Hi there,
>
> I’m Raymond from LinkedIn big data platform org.
>
> I have a question regarding the capability to create a new table without
> assigning new IDs in the schema. Currently, BaseMetastoreCatalog.create()
> calls the public TableMetadata.newTableMetadata(), which then calls the
> package-private newTableMetadata() method. The package-private
> newTableMetadata() method takes in an Iceberg schema and always reassigns
> the ids in the schema to get a freshSchema, and uses that to create the
> new TableMetadata. This means that currently, when we create a table, the
> IDs will always be reassigned.
>
> I wonder if we can expose a way to create a table using the input
> Iceberg schema as-is (without freshly assigning ids to it). I have the
> following arguments to support this:
>
> - It seems that when an Iceberg schema is created, it’s already guaranteed
>   that the ids are consistent from creation. I tried to create a new Schema
>   with duplicate ids, and it fails at creation time, which means the
>   creation already takes care of ID consistency. So, I wonder if the
>   reassign-id step really adds value in making the schema consistent.
> - From a user perspective, if we introduce this new capability, we
>   will have a guaranteed way to create Iceberg tables with the ids we
>   specify. We will then be able to create Iceberg tables with identical
>   schemas (with the same ids), and thus their files can be reused between
>   each other. A simple use case is that we can directly use the AppendFiles
>   API to add files from one table to the other without worrying about ID
>   discrepancies.
>
> Let me know whether you think this would be beneficial, or whether I’m
> missing anything here.
>
> Thanks,
> Raymond

--
Ryan Blue
Tabular