rdblue opened a new issue #41: Add an API to maintain external schema mappings
URL: https://github.com/apache/incubator-iceberg/issues/41
 
 
   Once Iceberg supports external schema mappings (#40), it should also support 
an easy way to maintain those mappings by notifying Iceberg when an external 
schema changes. Iceberg would update its mapping when notified.
   
   For example, starting with this mapping:
   
   ```json
   [ {"field-id": 1, "names": ["id"]},
     {"field-id": 2, "names": ["data"]} ]
   ```
   
   Suppose a newly registered Avro schema renames `id` to `obj_id` and adds a 
`ts` field. Iceberg would add an unmapped entry for `ts` and add `obj_id` to 
the `id` mapping, based on the Avro field alias indicating that `id` and 
`obj_id` are the same field. The updated mapping would be:
   
   ```json
   [ {"field-id": 1, "names": ["obj_id", "id"]},
     {"field-id": 2, "names": ["data"]},
     {"names": ["ts"]} ]
   ```
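
The update on an external schema change could be sketched roughly as follows. This is only an illustration, not an Iceberg API: `apply_external_schema` is a hypothetical helper, and it assumes the Avro schema arrives as a parsed record schema (a dict whose `fields` carry a `name` and optional `aliases`).

```python
import copy


def apply_external_schema(mapping, avro_schema):
    """Return a new mapping updated for a newly registered Avro record schema.

    Hypothetical helper for illustration; the name and behavior are assumptions.
    """
    updated = copy.deepcopy(mapping)
    for field in avro_schema["fields"]:
        name = field["name"]
        aliases = field.get("aliases", [])
        # The current name is already mapped: nothing to do.
        if any(name in entry["names"] for entry in updated):
            continue
        # An alias matches an existing entry: treat the field as renamed.
        renamed = next(
            (e for e in updated if any(a in e["names"] for a in aliases)),
            None,
        )
        if renamed is not None:
            renamed["names"].insert(0, name)
        else:
            # Unknown field: record it without a field ID for now.
            updated.append({"names": [name]})
    return updated


mapping = [{"field-id": 1, "names": ["id"]},
           {"field-id": 2, "names": ["data"]}]
avro_schema = {
    "type": "record",
    "name": "row",
    "fields": [
        {"name": "obj_id", "type": "long", "aliases": ["id"]},
        {"name": "data", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}
updated_mapping = apply_external_schema(mapping, avro_schema)
```

Placing the new name first in `names` mirrors the example mapping above; whether the order should carry meaning is left open here.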
   
   Next, if the Iceberg table schema is updated to add `ts`, Iceberg would 
match the new column by name to the unmapped entry and assign it the column's 
field ID, producing this mapping:
   
   ```json
   [ {"field-id": 1, "names": ["obj_id", "id"]},
     {"field-id": 2, "names": ["data"]},
     {"field-id": 3, "names": ["ts"]} ]
   ```
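
Completing the mapping when a column is added to the table could look something like the sketch below. Again, `apply_table_column` is a hypothetical name, and appending a fresh entry when no unmapped entry matches is an assumption about how the "Iceberg first" case might be handled.

```python
import copy


def apply_table_column(mapping, column_name, field_id):
    """Return a new mapping after a column is added to the Iceberg schema.

    Hypothetical helper for illustration; the name and behavior are assumptions.
    """
    updated = copy.deepcopy(mapping)
    for i, entry in enumerate(updated):
        # Match the new column by name against entries without a field ID.
        if "field-id" not in entry and column_name in entry["names"]:
            updated[i] = {"field-id": field_id, "names": entry["names"]}
            return updated
    # No unmapped entry yet: the column was added to the table first.
    updated.append({"field-id": field_id, "names": [column_name]})
    return updated


mapping = [{"field-id": 1, "names": ["obj_id", "id"]},
           {"field-id": 2, "names": ["data"]},
           {"names": ["ts"]}]
completed_mapping = apply_table_column(mapping, "ts", 3)
```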
   
   This would maintain compatibility with new Avro data files without changing 
anything in the Iceberg table other than the mapping. A column can be added in 
Iceberg or in Avro first; the mapping is completed by column name once the 
column exists in both schemas.
