Re: [DISCUSS][Arrow] Extension metadata encoding design

Antoine Pitrou Wed, 16 Aug 2023 08:18:59 -0700


Hi Jeremy,

A single key makes it easier for generic code to recreate extension types it does not know about.


Here is an example in the C++ IPC layer:
https://github.com/apache/arrow/blob/641201416c1075edfd05d78b539275065daac31d/cpp/src/arrow/ipc/metadata_internal.cc#L823-L845

Here is similar logic in the C++ bridge for the C Data Interface:
https://github.com/apache/arrow/blob/641201416c1075edfd05d78b539275065daac31d/cpp/src/arrow/c/bridge.cc#L1021-L1029
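
To make the point concrete, here is a minimal Python sketch of what "generic code" needs to do with a single key. It is not a transcription of the linked C++ code. Field metadata is modelled as a plain dict, and the helper names `split_extension_metadata` and `rebuild_field_metadata` are hypothetical; only the two "ARROW:extension:*" key names are taken from the Arrow format.

```python
# Sketch only: field metadata is modelled as a dict of str -> str, which is
# an assumption made for illustration.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


def split_extension_metadata(field_metadata):
    """Return (extension_name, serialized_params, remaining_metadata).

    The serialized parameters are treated as an opaque string, so this
    works for extension types the caller knows nothing about.
    """
    if EXT_NAME_KEY not in field_metadata:
        # Not an extension field: hand the metadata back unchanged.
        return None, None, dict(field_metadata)
    remaining = dict(field_metadata)
    name = remaining.pop(EXT_NAME_KEY)
    serialized = remaining.pop(EXT_META_KEY, None)
    return name, serialized, remaining


def rebuild_field_metadata(name, serialized, remaining):
    """Re-attach an extension's name and opaque parameters to a field."""
    out = dict(remaining)
    out[EXT_NAME_KEY] = name
    if serialized is not None:
        out[EXT_META_KEY] = serialized
    return out
```

With this layout, code that does not recognize "myorg.myextension" only has to carry two well-known keys along. With per-extension keys it would need some other way to tell which metadata entries belong to the extension.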

It is probably expected that many extension types will be parameter-less (such as UUID, JSON, BSON...).

It does imply that extension types with sophisticated parameterization must implement a custom (de)serialization mechanism themselves. I'm not sure this tradeoff was discussed at the time, perhaps other people (Wes? Jacques?) may chime in.
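
As a rough illustration of such a mechanism, a parameterized type could pack its parameters into the single "ARROW:extension:metadata" string as JSON and validate them on the way back. JSON is only one possible choice here, and the function names and the "meta1"/"meta2" parameters (borrowed from the example in the original message) are hypothetical.

```python
import json


def serialize_params(params):
    """Encode a dict of parameters as one compact, deterministic JSON string."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"))


def deserialize_params(serialized, required=("meta1", "meta2")):
    """Decode the string and check that the expected parameters are present.

    Validation is the extension's own responsibility, since generic code
    only sees an opaque string.
    """
    params = json.loads(serialized)
    if not isinstance(params, dict):
        raise ValueError("extension metadata must be a JSON object")
    missing = [key for key in required if key not in params]
    if missing:
        raise ValueError(f"missing extension parameters: {missing}")
    return params
```

The resulting string is what would be stored under "ARROW:extension:metadata", next to "ARROW:extension:name".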


Regards

Antoine.



On 16/08/2023 at 16:32, Jeremy Leibs wrote:

Hello,

I've recently started working with extension types as part of our project
and I was surprised to discover that extension types are required to pack
all of their own metadata into a single string value of the
"ARROW:extension:metadata" key.

In turn this then means we have to endure arbitrary unstructured /
hard-to-validate strings with custom encodings (e.g. JSON inside
flatbuffer) when dealing with extensions.

Can anyone provide some context on the rationale for this design decision?

Given that we already have (1) a perfectly good metadata key-value store
in place, and (2) established recommendations for namespaced scoping of
keys, why would we not just use that to store the metadata for the
extension? For example:

"ARROW:extension:name": "myorg.myextension",
"myorg:myextension:meta1": "value1",
"myorg:myextension:meta2": "value2",

Thanks for any insights,
Jeremy
