On this note, in Python we should probably re-evaluate the data
structure returned when accessing the "metadata" field.
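
To make the concern concrete, here is a minimal sketch in plain Python (no
pyarrow involved) of how an ordered list of pairs and a dict treat a
duplicated key. The sample keys and the helpers find_first and
to_integration_json are hypothetical names used only for illustration; the
JSON shape follows the {"key": $key, "value": $value} suggestion quoted
below.

```python
import json

# Hypothetical metadata with a duplicated key, kept as an ordered list of pairs.
pairs = [("origin", "a"), ("unit", "ms"), ("origin", "b")]


def find_first(pairs, key):
    """Return the value of the first pair whose key matches, or None."""
    for k, v in pairs:
        if k == key:
            return v
    return None


def to_integration_json(pairs):
    """Serialize as an ordered list of {"key": ..., "value": ...} objects."""
    return json.dumps([{"key": k, "value": v} for k, v in pairs])


# A dict cannot hold both "origin" entries; Python keeps the last value seen.
as_dict = dict(pairs)

# The list-of-pairs form survives a JSON round trip with order and duplicates intact.
round_tripped = [(d["key"], d["value"]) for d in json.loads(to_integration_json(pairs))]
```

With the list form, a lookup can still return the first match and nothing is
lost on serialization, whereas the dict silently keeps only one of the two
"origin" values.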

On Wed, Mar 11, 2020 at 12:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> In the C++ library at least, uniqueness is never asserted when reading
> and writing the IPC metadata [1] [2]. If you use
> KeyValueMetadata::FindKey and the keys are non-unique, it will return
> the first one it finds. KeyValueMetadata::Merge assumes uniqueness,
> and the KeyValueMetadata::ToUnorderedMap function will drop all but
> one duplicate.
>
> In Parquet, the metadata is also a list of KeyValue pairs with no
> uniqueness qualifications [3].
>
> My weak preference is to leave it to applications to make assertions
> about uniqueness. In either case, since the metadata is ordered in the
> integration tests, it would make sense to serialize it as a list of
> key/value pairs like {"key": $key, "value": $value}.
>
> [1]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/metadata_internal.cc#L463
> [2]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/metadata_internal.cc#L471
> [3]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L728
>
> On Wed, Mar 11, 2020 at 12:11 PM Ben Kietzman <ben.kietz...@rstudio.com> wrote:
> >
> > While working on https://issues.apache.org/jira/browse/ARROW-2255
> > (serialize custom_metadata in the integration tests), we had the following
> > discussion on GitHub:
> > https://github.com/apache/arrow/pull/6556#pullrequestreview-372405940
> >
> > In short, although in Schema.fbs custom_metadata is declared as an array of
> > KeyValue pairs (so duplicate keys would be possible), all reference
> > implementations assume it to represent an associative map with unique keys.
> >
> > Is there a use case for duplicate metadata keys? It seems that an
> > acceptable resolution might be to note in Schema.fbs that implementations
> > are allowed to assume that keys are unique.
> >
> > Ben
