On Sat, May 18, 2019, 1:58 PM Wes McKinney <wesmck...@gmail.com> wrote:
> Hi Micah,
>
> The use cases I'm aware of are mostly coming from proprietary
> applications. My idea was for the extension metadata to be as unobtrusive
> as possible. The only alternative as I see it would be to have an Extension
> value in the Type union which would be more intrusive to applications
> handling data for which they have no special handling. That doesn't seem
> desirable if there are alternatives.
>

The other (3rd) option would be to add an extra member to Field. This is
also a bit more intrusive than having fields in the custom_metadata
dictionary.

> As an immediate use case we could use extension types to embed Tensor
> values in Binary arrays.
>
> Wes
>
> On Sat, May 18, 2019, 12:19 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Wes,
>> This approach seems reasonable to me. I'm a little concerned we haven't
>> validated many use-cases against the approach (but I don't see any obvious
>> flaws).
>>
>> Thanks,
>> Micah
>>
>> On Fri, May 17, 2019 at 5:16 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> > As Micah brought up, as part of this we would like to formalize the
>> > use of "ARROW:" as a reserved metadata key prefix. This is similar to
>> > Apache Avro which uses "avro." as a reserved prefix [1]. If someone
>> > has a different idea about what the prefix should be I'm open to other
>> > ideas
>> >
>> > [1]: https://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files
>> >
>> > On Thu, May 16, 2019 at 7:29 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >
>> > > hi folks,
>> > >
>> > > In a prior mailing list thread from February [1] I brought up some
>> > > work I'd done in C++ to create an API to define custom data types that
>> > > can be embedded in built-in Arrow logical types. These are serialized
>> > > through IPC by adding special fields to the `custom_metadata` member
>> > > of Field in the Flatbuffers metadata [2]. The idea is that if an
>> > > implementation does not understand the custom type, then they can
>> > > still interact with the underlying data if need be, or pass on the
>> > > extension metadata in subsequent IPC messages.
>> > >
>> > > David Li has put up a WIP PR to implement this for Java [4], so to
>> > > help the project move forward I think it's a good time to formalize
>> > > this, and if there are disagreements to hash them out now. I have just
>> > > opened a PR to the Arrow specification documents [3] that describes
>> > > the current state of C++ and also the WIP Java PR.
>> > >
>> > > Any thought about this? If there is consensus about this solution
>> > > approach then I can hold a vote.
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> > > [1]: https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
>> > > [2]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L291
>> > > [3]: https://github.com/apache/arrow/pull/4332
>> > > [4]: https://github.com/apache/arrow/pull/4251
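
For readers following the thread, here is a rough Python sketch of the fallback behaviour described in the original proposal: extension information travels in a field's `custom_metadata`, and a reader that does not recognize the extension still sees the underlying storage type and can pass the metadata along. It does not use any Arrow library. The `Field` class, the registry, and the helper functions are hypothetical stand-ins, and the two key names under the "ARROW:" prefix are an assumption, since the thread only settles on the prefix itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Assumed key names under the reserved "ARROW:" prefix; the thread itself
# only proposes the prefix, not these exact keys.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical stand-in for a schema field: name, storage type, key/value metadata."""
    name: str
    storage_type: str
    custom_metadata: Dict[str, str] = field(default_factory=dict)


# Extension types this (hypothetical) implementation knows about.
# Each factory maps (storage_type, serialized_metadata) to a type description.
REGISTRY: Dict[str, Callable[[str, str], str]] = {}


def register_extension(ext_name: str, factory: Callable[[str, str], str]) -> None:
    REGISTRY[ext_name] = factory


def resolve_type(f: Field) -> str:
    """Return the extension type if it is registered, else the plain storage type."""
    ext_name = f.custom_metadata.get(EXT_NAME_KEY)
    if ext_name is None or ext_name not in REGISTRY:
        # Unknown or absent extension: the data is still usable as its storage type.
        return f.storage_type
    serialized = f.custom_metadata.get(EXT_META_KEY, "")
    return REGISTRY[ext_name](f.storage_type, serialized)


def forward(f: Field) -> Field:
    """Copy a field with its metadata intact, so unknown extensions survive re-emission."""
    return Field(f.name, f.storage_type, dict(f.custom_metadata))


# A Tensor value embedded in a Binary column, as in the use case mentioned above.
tensor_field = Field(
    "image",
    "binary",
    {EXT_NAME_KEY: "example.tensor", EXT_META_KEY: "shape=2x3"},
)
```

Before `example.tensor` is registered, `resolve_type(tensor_field)` yields the storage type `binary`, and `forward(tensor_field)` keeps both metadata keys. After a call to `register_extension("example.tensor", ...)`, the same field resolves through the registered factory. Keeping the extension information in the metadata dictionary, rather than in the Type union or a new Field member, is what lets the unaware reader ignore it without any special handling.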