hi folks,

I recently wrote a patch to propose a C++ API for user-defined "extension" types:

https://github.com/apache/arrow/pull/3694

The idea is that an extension type wraps a pre-existing Arrow type.
For example, a UUIDType can be represented as FixedSizeBinary(16). The
intent is that Arrow consumers which are not aware of an extension
type can ignore the additional type metadata and still interact with
the raw storage.
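
To make the wrapping idea concrete, here is a minimal sketch in plain standard C++. It is not the API from the patch: StorageType, ExtensionTypeSketch, MakeUuidType and BytesPerValue are hypothetical names invented for illustration, and the only details taken from this message are the "uuid" name and the FixedSizeBinary(16) storage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a pre-existing Arrow type; not arrow::DataType.
struct StorageType {
  std::string name;        // e.g. "fixed_size_binary"
  std::size_t byte_width;  // width of each value in bytes
};

// Hypothetical extension type: a unique name plus the storage type it wraps.
class ExtensionTypeSketch {
 public:
  ExtensionTypeSketch(std::string extension_name, StorageType storage)
      : extension_name_(std::move(extension_name)),
        storage_(std::move(storage)) {
    // An extension type needs a non-empty identifier.
    assert(!extension_name_.empty());
  }

  const std::string& extension_name() const { return extension_name_; }
  const StorageType& storage_type() const { return storage_; }

 private:
  std::string extension_name_;
  StorageType storage_;
};

// The UUID example from the text: a 16-byte fixed-size binary underneath.
ExtensionTypeSketch MakeUuidType() {
  return ExtensionTypeSketch("uuid", StorageType{"fixed_size_binary", 16});
}

// A consumer that does not know what "uuid" means never reads the
// extension name; it only inspects the storage type.
std::size_t BytesPerValue(const ExtensionTypeSketch& type) {
  return type.storage_type().byte_width;
}
```

A consumer that only calls BytesPerValue(MakeUuidType()) sees a 16-byte value and can process the data without knowing anything about UUIDs.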

One question is how to permit such metadata to be preserved through
IPC / RPC messages (i.e., Schema.fbs) and how other languages can
interact with it. There are a couple of options:

* What I implemented in my patch: use the Field-level custom_metadata
field with known key names "arrow_extension_name" and
"arrow_extension_data" for the type's unique identifier and serialized
form, respectively. If we opt for this, then we should add a section
to the specification to codify the convention used.

* Add a new field to the Field table in Schema.fbs.

The former is attractive in the sense that consumers who don't have
special handling for an extension type will carry along the Field
metadata in their schema, so it can be passed on in subsequent IPC
messages without writing any extra code.
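
A rough sketch of why the first option needs no extra code in unaware consumers, again in plain standard C++ rather than the real Arrow classes. FieldSketch, TagAsExtension, ForwardUnchanged and LookupExtensionName are hypothetical names; only the two key names come from the proposal above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Key names as proposed in this message.
const std::string kExtensionNameKey = "arrow_extension_name";
const std::string kExtensionDataKey = "arrow_extension_data";

// Hypothetical stand-in for a schema field; not arrow::Field.
struct FieldSketch {
  std::string name;
  std::string storage_type;
  std::map<std::string, std::string> custom_metadata;
};

// Producer side: record the extension's identifier and serialized form
// in the field-level metadata.
FieldSketch TagAsExtension(FieldSketch field, const std::string& ext_name,
                           const std::string& serialized) {
  assert(!ext_name.empty());
  field.custom_metadata[kExtensionNameKey] = ext_name;
  field.custom_metadata[kExtensionDataKey] = serialized;
  return field;
}

// Unaware consumer: it copies the field as a whole, so the metadata
// travels with it even though the consumer never looks at the keys.
FieldSketch ForwardUnchanged(const FieldSketch& field) { return field; }

// Aware consumer: returns true and fills *out if the field is tagged.
bool LookupExtensionName(const FieldSketch& field, std::string* out) {
  std::map<std::string, std::string>::const_iterator it =
      field.custom_metadata.find(kExtensionNameKey);
  if (it == field.custom_metadata.end()) return false;
  *out = it->second;
  return true;
}
```

In this model a field tagged as "uuid" and passed through ForwardUnchanged still answers LookupExtensionName with "uuid" on the far side, which is the property an integration test would want to check.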

Thoughts about this? With a C++ implementation landing, it would be
great to identify a champion to create a Java implementation and also
add integration test support to ensure that consumers do not destroy
the extension type metadata for unrecognized types (i.e. if I send you
data that says it's "uuid" and you don't know what that is yet, you
preserve the metadata fields anyway).

Thanks
Wes
