mccheah opened a new issue #23: File identifiers and aliases
URL: https://github.com/apache/incubator-iceberg/issues/23
 
 
   A URI that locates a file may not be enough for file I/O implementations to 
construct `InputFile` and `OutputFile` instances, as proposed in 
https://github.com/apache/incubator-iceberg/issues/12. Specifically, 
consider a system where a file has a path, but that same path can be 
namespaced in different contexts. For example, the metadata for that same file 
can evolve over time, as we discussed in 
https://github.com/apache/incubator-iceberg/issues/16.
   
   We propose adding an optional `String` field called `ExternalIdentifier` 
to the `DataFile` schema. Custom Iceberg consumers could use this tag to look 
up the file under their own identification scheme, so that such systems can 
find the file directly by the identifier in addition to the path.
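
   As a rough illustration of the lookup described above, here is a minimal 
Java sketch. Everything in it is an assumption made for illustration: 
`DataFileEntry` is a simplified stand-in for a `DataFile` entry, not the 
actual Iceberg interface, and `CustomFileResolver`, the identifier format, 
and the `store://` locations are hypothetical. It only shows how a consumer 
might prefer the optional identifier and fall back to the path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ExternalIdentifierSketch {

    // Hypothetical, simplified stand-in for a DataFile entry (not the Iceberg API).
    static final class DataFileEntry {
        private final String path;
        private final String externalIdentifier; // optional tag; null when absent

        DataFileEntry(String path, String externalIdentifier) {
            this.path = path;
            this.externalIdentifier = externalIdentifier;
        }

        String path() {
            return path;
        }

        Optional<String> externalIdentifier() {
            return Optional.ofNullable(externalIdentifier);
        }
    }

    // Hypothetical consumer-side resolver backed by the consumer's own ID system.
    static final class CustomFileResolver {
        private final Map<String, String> locationsById = new HashMap<>();

        void register(String identifier, String location) {
            locationsById.put(identifier, location);
        }

        // Prefer the external identifier; fall back to the plain path when the
        // identifier is absent or unknown (Optional.map yields empty on null).
        String resolve(DataFileEntry file) {
            return file.externalIdentifier()
                    .map(locationsById::get)
                    .orElse(file.path());
        }
    }

    public static void main(String[] args) {
        CustomFileResolver resolver = new CustomFileResolver();
        resolver.register("file-0001@v2", "store://namespace-b/file-0001");

        DataFileEntry tagged = new DataFileEntry("/data/a.parquet", "file-0001@v2");
        DataFileEntry untagged = new DataFileEntry("/data/b.parquet", null);

        System.out.println(resolver.resolve(tagged));
        System.out.println(resolver.resolve(untagged));
    }
}
```

   Keeping the field a plain string means the table format never has to 
interpret it; only the consumer that wrote the tag needs to understand it.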
   
   Alternative, richer representations for the `ExternalIdentifier` include a 
byte blob or a `struct` whose schema is stored in the table properties. 
However, those representations could encourage more arbitrary and uncontrolled 
use of the field, which we probably want to avoid. `String` seems to be the 
safest option.
