[GitHub] mccheah opened a new issue #16: Custom metadata in data files

GitBox Mon, 26 Nov 2018 18:16:52 -0800

mccheah opened a new issue #16: Custom metadata in data files
URL: https://github.com/apache/incubator-iceberg/issues/16
 
 
   (Migrated from https://github.com/Netflix/iceberg/issues/106 with some extra 
details added)
   
   It would be useful for consumers of Iceberg tables to be able to attach 
additional per-file metadata to data files, telling readers how to read each 
file. Some examples of custom metadata include:
   
   * Encryption keys required to read the file.
   * Compression codecs, recorded on the file itself so that no specific file 
extension is required.
   * Metadata specific to a custom file format. Suppose Iceberg supported CSV 
tables down the road. It would be useful to attach the column delimiter on a 
per-file basis, so that a table could be composed of files whose exact layouts 
differ but whose schemas are compatible.
   
   The custom metadata field should be of type `Map<String, String>` and can be 
an optional column.
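   To make the proposal concrete, here is a minimal Java sketch, assuming a 
simplified manifest entry. `DataFileEntry`, `CsvLineReader` and the 
`csv.delimiter` key are hypothetical names for illustration, not Iceberg APIs. 
It treats an absent metadata map as empty, and reads a CSV line using the 
delimiter recorded in that file's metadata, as in the CSV example above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for a manifest entry describing one data file.
// customMetadata is optional: null means the writer recorded nothing.
final class DataFileEntry {
    private final String path;
    private final Map<String, String> customMetadata; // may be null

    DataFileEntry(String path, Map<String, String> customMetadata) {
        this.path = path;
        this.customMetadata = customMetadata;
    }

    String path() {
        return path;
    }

    // Treat a missing map the same as an empty one so callers never see null.
    Map<String, String> customMetadata() {
        return customMetadata == null ? Collections.emptyMap() : customMetadata;
    }

    Optional<String> metadataValue(String key) {
        return Optional.ofNullable(customMetadata().get(key));
    }
}

// Hypothetical reader for a CSV-backed table: the delimiter comes from the
// per-file metadata (the "csv.delimiter" key is an assumption), defaulting to ",".
final class CsvLineReader {
    static final String DELIMITER_KEY = "csv.delimiter";

    static List<String> parseLine(DataFileEntry file, String line) {
        String delimiter = file.metadataValue(DELIMITER_KEY).orElse(",");
        // Pattern.quote makes delimiters such as "|" match literally, not as regex;
        // limit -1 keeps trailing empty fields.
        return List.of(line.split(Pattern.quote(delimiter), -1));
    }
}
```

   With this, two files in the same table can use `|` and `,` respectively 
while sharing one schema. The split is deliberately naive (no quoting or 
escaping) and only illustrates where the per-file value would be consulted.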
   
   Finally, consider the I/O submodule proposed in 
https://github.com/apache/incubator-iceberg/issues/12. The `newOutputFile` API 
of the `FileIO` proposed there should also return any custom metadata needed to 
read the file after it is written. That is, `FileIO#newOutputFile` should return 
a struct containing an `OutputFile` object for writing the bytes and a 
`Map<String, String>` of metadata to be saved in the manifest once the data 
file is written.
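   A rough Java sketch of that return type follows. `OutputFile` and `FileIO` 
here are simplified stand-ins, not Iceberg's actual interfaces (the real 
`OutputFile` has more methods), and `OutputFileWithMetadata`, 
`EncryptingFileIO` and the `encryption.key-id` key are hypothetical names. 
Recording a key identifier rather than the key itself is also an assumption.

```java
import java.util.Map;
import java.util.Objects;

// Simplified stand-in for an output file; only location() is modeled here.
interface OutputFile {
    String location();
}

// Hypothetical return type for the proposed FileIO#newOutputFile: the file to
// write, plus metadata to persist in the manifest once the file is written.
final class OutputFileWithMetadata {
    private final OutputFile file;
    private final Map<String, String> metadata;

    OutputFileWithMetadata(OutputFile file, Map<String, String> metadata) {
        this.file = Objects.requireNonNull(file, "file");
        this.metadata = Map.copyOf(metadata); // immutable defensive copy
    }

    OutputFile file() {
        return file;
    }

    Map<String, String> metadata() {
        return metadata;
    }
}

// Hypothetical FileIO variant reflecting the proposal in this issue.
interface FileIO {
    OutputFileWithMetadata newOutputFile(String path);
}

// Example implementation that tags each new file with the (hypothetical)
// identifier of the key it will be encrypted with.
final class EncryptingFileIO implements FileIO {
    private final String keyId;

    EncryptingFileIO(String keyId) {
        this.keyId = keyId;
    }

    @Override
    public OutputFileWithMetadata newOutputFile(String path) {
        OutputFile out = () -> path; // lambda implements location()
        return new OutputFileWithMetadata(out, Map.of("encryption.key-id", keyId));
    }
}
```

   The writer would use `file()` to write the bytes and then copy `metadata()` 
into the manifest entry for the new data file. Copying the map at construction 
keeps it from changing between file creation and the manifest commit.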


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services