On Sat, May 18, 2019, 1:58 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Hi Micah,
>
> The use cases I'm aware of mostly come from proprietary applications. My
> idea was for the extension metadata to be as unobtrusive as possible. The
> only alternative, as I see it, would be to add an Extension value to the
> Type union, which would be more intrusive for applications handling data
> they have no special handling for. That doesn't seem desirable if there
> are alternatives.
>

A third option would be to add an extra member to Field. This is also a
bit more intrusive than storing entries in the custom_metadata
dictionary.
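To make the custom_metadata approach concrete, here is a minimal sketch in plain Python (no Arrow library involved). It models a Field as a name, a built-in storage type, and a custom_metadata dictionary. The key names `ARROW:extension:name` and `ARROW:extension:metadata`, the `Field` class, and both helper functions are hypothetical, chosen for illustration under the proposed "ARROW:" reserved prefix. The point it shows is that a reader that does not recognize the extension simply sees the storage type and leaves the metadata intact, so it can pass it on.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical key names using the proposed reserved "ARROW:" prefix.
RESERVED_PREFIX = "ARROW:"
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Toy stand-in for an Arrow Field (not the real Arrow API)."""
    name: str
    storage_type: str  # a built-in logical type, e.g. "binary"
    custom_metadata: Dict[str, str] = field(default_factory=dict)


def make_extension_field(name: str, storage_type: str, ext_name: str,
                         ext_metadata: str,
                         user_metadata: Optional[Dict[str, str]] = None) -> Field:
    """Attach extension info as ordinary custom_metadata entries."""
    metadata = dict(user_metadata or {})
    # Application keys must not collide with the reserved prefix.
    for key in metadata:
        if key.startswith(RESERVED_PREFIX):
            raise ValueError(f"key {key!r} uses reserved prefix {RESERVED_PREFIX!r}")
    metadata[EXT_NAME_KEY] = ext_name
    metadata[EXT_META_KEY] = ext_metadata
    return Field(name, storage_type, metadata)


def read_field(f: Field, known_extensions: Set[str]) -> Optional[str]:
    """Return the extension name if this reader knows it, else None.

    Returning None means "treat the column as its plain storage type".
    custom_metadata is never modified, so an unaware reader can still
    forward it unchanged in later IPC messages.
    """
    ext_name = f.custom_metadata.get(EXT_NAME_KEY)
    if ext_name is not None and ext_name in known_extensions:
        return ext_name
    return None


if __name__ == "__main__":
    f = make_extension_field("t", "binary", "tensor", '{"shape": [2, 3]}')
    print(read_field(f, known_extensions=set()))     # unaware reader: None
    print(read_field(f, known_extensions={"tensor"}))  # aware reader: tensor
```

By contrast, a dedicated Extension member in the Type union or on Field would change the schema every implementation must parse, which is the intrusiveness the thread is trying to avoid.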


> As an immediate use case we could use extension types to embed Tensor
> values in Binary arrays.
>
> Wes
>
> On Sat, May 18, 2019, 12:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
>> Hi Wes,
>> This approach seems reasonable to me. I'm a little concerned that we haven't
>> validated many use cases against the approach (but I don't see any obvious
>> flaws).
>>
>> Thanks,
>> Micah
>>
>> On Fri, May 17, 2019 at 5:16 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> > As Micah brought up, as part of this we would like to formalize the
>> > use of "ARROW:" as a reserved metadata key prefix. This is similar to
>> > Apache Avro, which uses "avro." as a reserved prefix [1]. If someone
>> > has a different idea about what the prefix should be, I'm open to
>> > other ideas.
>> >
>> > [1]: https://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files
>> >
>> > On Thu, May 16, 2019 at 7:29 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >
>> > > hi folks,
>> > >
>> > > In a prior mailing list thread from February [1] I brought up some
>> > > work I'd done in C++ to create an API to define custom data types that
>> > > can be embedded in built-in Arrow logical types. These are serialized
>> > > through IPC by adding special keys to the `custom_metadata` member
>> > > of Field in the Flatbuffers metadata [2]. The idea is that if an
>> > > implementation does not understand the custom type, it can still
>> > > interact with the underlying data if need be, or pass on the
>> > > extension metadata in subsequent IPC messages.
>> > >
>> > > David Li has put up a WIP PR to implement this for Java [4], so to
>> > > help the project move forward, I think it's a good time to formalize
>> > > this and, if there are disagreements, to hash them out now. I have just
>> > > opened a PR to the Arrow specification documents [3] that describes
>> > > the current state of the C++ implementation and of the WIP Java PR.
>> > >
>> > > Any thoughts about this? If there is consensus on this approach,
>> > > then I can hold a vote.
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> > > [1]: https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
>> > > [2]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L291
>> > > [3]: https://github.com/apache/arrow/pull/4332
>> > > [4]: https://github.com/apache/arrow/pull/4251
>> >
>>
>

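The Tensor-in-Binary use case mentioned in the thread could look roughly like the following minimal sketch in plain Python. Each fixed-shape tensor is packed into one binary value, and its shape and element type travel once as the extension metadata string rather than per value. The extension name `example.tensor`, the JSON metadata layout, and the little-endian float64 encoding are all assumptions made for illustration, not anything the thread or Arrow defines.

```python
import json
import struct
from typing import List, Sequence, Tuple

# Hypothetical extension name; not defined by Arrow.
TENSOR_EXT_NAME = "example.tensor"


def _num_elements(shape: Sequence[int]) -> int:
    n = 1
    for d in shape:
        n *= d
    return n


def tensors_to_binary_values(tensors: List[List[float]],
                             shape: Sequence[int]) -> Tuple[List[bytes], str]:
    """Pack each tensor (flat, row-major float list) into one binary value.

    Returns the binary values (the storage array) plus a JSON string that
    would be stored as the extension metadata.
    """
    n = _num_elements(shape)
    values = []
    for t in tensors:
        if len(t) != n:
            raise ValueError("tensor size does not match shape")
        # "<" = little-endian, "d" = float64 (assumed encoding)
        values.append(struct.pack(f"<{n}d", *t))
    metadata = json.dumps({"shape": list(shape), "dtype": "float64"})
    return values, metadata


def binary_value_to_tensor(value: bytes, metadata: str) -> Tuple[List[float], List[int]]:
    """Decode one binary value back into (flat data, shape)."""
    meta = json.loads(metadata)
    n = _num_elements(meta["shape"])
    return list(struct.unpack(f"<{n}d", value)), meta["shape"]


if __name__ == "__main__":
    values, meta = tensors_to_binary_values([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]], (2, 3))
    print(len(values[0]))  # 6 float64 elements -> 48 bytes
    print(binary_value_to_tensor(values[0], meta))
```

A reader unaware of `example.tensor` would still see an ordinary Binary column of 48-byte values.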