Capability to create table without reassigning IDs

Raymond Zhang Fri, 19 Aug 2022 14:22:22 -0700

Hi there,

I’m Raymond from the big data platform org at LinkedIn.


I have a question regarding the capability to create a new table without 
assigning new IDs in the schema. Currently, BaseMetastoreCatalog.create() calls 
the public TableMetadata.newTableMetadata() which then calls the 
package-private newTableMetadata() method. The package-private 
newTableMetadata() method takes in an Iceberg schema, always reassigns the 
IDs in that schema to get a freshSchema, and uses the freshSchema to create 
the new TableMetadata. This means that, currently, the IDs are always 
reassigned when we create a table.
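
To make the current behavior concrete, here is a toy sketch of what the reassignment does to a flat schema. It is not Iceberg code: the schema is modeled as an ordered map of column name to field ID, and FreshIdSketch and assignFreshIds are hypothetical names of mine. The real logic also handles nested types, which this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FreshIdSketch {
  // Toy model: a flat schema is an ordered map of column name -> field ID.
  // Hypothetical helper, not Iceberg code: renumber columns 1..n in
  // declaration order, discarding whatever IDs the caller supplied.
  static Map<String, Integer> assignFreshIds(Map<String, Integer> schema) {
    Map<String, Integer> fresh = new LinkedHashMap<>();
    int nextId = 1;
    for (String name : schema.keySet()) {
      fresh.put(name, nextId++);
    }
    return fresh;
  }

  public static void main(String[] args) {
    // The IDs the caller asked for (10 and 20 are arbitrary examples).
    Map<String, Integer> requested = new LinkedHashMap<>();
    requested.put("member_id", 10);
    requested.put("event_time", 20);

    System.out.println("requested: " + requested);
    System.out.println("stored:    " + assignFreshIds(requested));
  }
}
```

In this model, the IDs 10 and 20 that the caller passed in are replaced by 1 and 2, which is the effect I am describing for table creation.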

I wonder if we can expose an option to create a table using the input 
Iceberg schema as-is (without assigning fresh IDs to it). I have the 
following arguments to support this:


  *   It seems that ID consistency is already guaranteed when an Iceberg 
schema is created. I tried to create a new Schema with duplicate IDs and it 
failed at creation time, which means schema creation already takes care of 
ID consistency. So I wonder whether the ID reassignment step really adds 
value in making the schema consistent.
  *   From a user perspective, if we introduce this new capability, we will 
have a guaranteed way to create Iceberg tables with the IDs we specify. We 
will then be able to create Iceberg tables with identical schemas (with the 
same IDs), so their data files can be shared between them. A simple use case 
is that we can directly use the AppendFiles API to add files from one table 
to the other without worrying about ID discrepancies.
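
To illustrate why the IDs have to match for files to be reusable, here is a toy sketch of ID-based column resolution. It assumes only that a reader looks up each table column in a data file by field ID rather than by name. IdProjectionSketch, readRow, and the column names are hypothetical and are not Iceberg APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IdProjectionSketch {
  // Toy model: a data file row stores values keyed by the writer's field IDs.
  // The reader resolves each table column by ID, never by name or position.
  static Map<String, Object> readRow(
      Map<Integer, Object> fileRow, Map<String, Integer> tableSchema) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> column : tableSchema.entrySet()) {
      // A field ID that is absent from the file reads as null.
      row.put(column.getKey(), fileRow.get(column.getValue()));
    }
    return row;
  }

  public static void main(String[] args) {
    // Row written by the source table, whose schema uses IDs 10 and 20.
    Map<Integer, Object> fileRow = new LinkedHashMap<>();
    fileRow.put(10, 42L);
    fileRow.put(20, "2022-08-19");

    // Target table created with the same IDs as the source table.
    Map<String, Integer> sameIds = new LinkedHashMap<>();
    sameIds.put("member_id", 10);
    sameIds.put("event_date", 20);

    // Target table whose IDs were reassigned at creation time.
    Map<String, Integer> reassignedIds = new LinkedHashMap<>();
    reassignedIds.put("member_id", 1);
    reassignedIds.put("event_date", 2);

    System.out.println(readRow(fileRow, sameIds));
    System.out.println(readRow(fileRow, reassignedIds));
  }
}
```

Under this model, the table that kept the source IDs reads the appended row correctly, while the table with reassigned IDs finds no matching fields and sees nulls for both columns.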

Let me know whether you think this would be beneficial, or whether I’m 
missing anything here.

Thanks,
Raymond
