Re: Unsupported/Other Type

David Li Wed, 17 Apr 2024 01:47:41 -0700

Should I take it that this proposal is dead in the water? While we could define
our own Unknown/Other type for, say, the ADBC PostgreSQL driver, it might be
useful to have a single type for consumers to latch on to.
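For concreteness, here is a minimal, stdlib-only Python sketch of what such an "Other" marker could look like. It also touches on the original question of how to preserve information about the source type. Arrow's columnar format records an extension type on a field through the `ARROW:extension:name` and `ARROW:extension:metadata` metadata keys. Everything else is an assumption for illustration, not a proposed or registered type: the name `example.other`, the JSON payload carrying the source system and type name, and the helper functions.

```python
import json

# Field metadata keys defined by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, used only for this illustration.
OTHER_EXT_NAME = "example.other"


def other_field_metadata(source_system, source_type):
    """Build metadata marking a binary field as an opaque 'Other' value.

    The JSON payload layout (source_system/source_type) is an assumption.
    """
    payload = json.dumps({"source_system": source_system, "source_type": source_type})
    return {EXT_NAME_KEY: OTHER_EXT_NAME, EXT_METADATA_KEY: payload}


def describe_field(metadata):
    """Show how a consumer might react to the field's metadata."""
    name = metadata.get(EXT_NAME_KEY)
    if name == OTHER_EXT_NAME:
        # The producer explicitly declared it does not understand this type.
        info = json.loads(metadata[EXT_METADATA_KEY])
        return f"opaque {info['source_system']} type {info['source_type']!r}"
    if name is None:
        return "plain storage type"
    # An extension type this consumer does not know: fall back to storage.
    return f"unrecognized extension {name!r}; treating as storage type"


print(describe_field(other_field_metadata("postgresql", "tsvector")))
# prints: opaque postgresql type 'tsvector'
```

The point of a single shared name is the one made above. Consumers can special-case one well-known marker rather than a different convention per driver, and consumers that ignore it still see plain binary storage.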


On Fri, Apr 12, 2024, at 07:32, David Li wrote:
> I think an "Other" extension type is slightly different from an
> arbitrary extension type, though: the latter may be understood
> downstream, but the former represents a point at which a component
> explicitly declares it does not know how to handle a field. In this
> example, the PostgreSQL ADBC driver might be able to provide a
> representation regardless, but a different driver (or, say, the JDBC
> adapter, which cannot necessarily get a bytestring for an arbitrary
> JDBC type) may want an Other type to signal that it would fail if asked
> to provide particular columns.
>
> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>> Depending on where your Arrow-encoded data is used, either extension
>> types or generic field metadata are options. We have this problem in
>> the ADBC Postgres driver, where we can convert *most* Postgres types
>> to an Arrow type, but there are some where we can't, don't know how
>> to, or haven't implemented a conversion. Currently for these we return
>> opaque binary (the Postgres COPY representation of the value) but attach
>> field metadata so that a consumer can implement a workaround for an
>> unsupported type. It would arguably have been better to implement this
>> as an extension type; however, field metadata felt like less of a
>> commitment when I first worked on this.
>>
>> Cheers,
>>
>> -dewey
>>
>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>> <norman.jor...@improving.com.invalid> wrote:
>>>
>>> I was using UUID as an example. It looks like extension types cover my
>>> original request.
>>> ________________________________
>>> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
>>> Sent: Thursday, April 11, 2024 7:15 AM
>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>>> Subject: Re: Unsupported/Other Type
>>>
>>> The OP used UUID as an example. Would that be enough, or is the request for
>>> a flexible mechanism that allows the creation of one-off nominal types for
>>> very specific use cases?
>>>
>>> —
>>> Felipe
>>>
>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
>>>
>>> >
>>> > Yes, JSON and UUID are obvious candidates for new canonical extension
>>> > types. XML also comes to mind, but I'm not sure there's much of a use
>>> > case for it.
>>> >
>>> > Regards
>>> >
>>> > Antoine.
>>> >
>>> >
>>> > On 10/04/2024 at 22:55, Wes McKinney wrote:
>>> > > In the past we have discussed adding a canonical type for UUID and
>>> > > JSON. I still think this is a good idea and could improve ergonomics in
>>> > > downstream language bindings (e.g. by exposing JSON querying functions
>>> > > or automatically boxing UUIDs in built-in UUID types, like the Python
>>> > > uuid library). Has anyone done any work on this, to anyone's knowledge?
>>> > >
>>> > > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> Hi Norman,
>>> > >> Arrow has a concept of extension types [1] along with the possibility of
>>> > >> proposing new canonical extension types [2]. This seems to cover the
>>> > >> use cases you mention, but I might be misunderstanding?
>>> > >>
>>> > >> Thanks,
>>> > >> Micah
>>> > >>
>>> > >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>>> > >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>>> > >>
>>> > >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>>> > >> <norman.jor...@improving.com.invalid> wrote:
>>> > >>
>>> > >>> Problem Description
>>> > >>>
>>> > >>> Currently Arrow schemas can only contain columns of types supported by
>>> > >>> Arrow. In some cases an Arrow schema maps to an external schema, and
>>> > >>> the Arrow schema may then be unable to represent all the columns of
>>> > >>> the external schema.
>>> > >>>
>>> > >>> Consider an external system that contains a column of type UUID. To
>>> > >>> model the schema in Arrow, the user has two choices:
>>> > >>>
>>> > >>>    1.  Do not include the UUID column in the Arrow schema.
>>> > >>>
>>> > >>>    2.  Map the column to an existing Arrow type. This loses the
>>> > >>>        original type information. A UUID can be mapped to a
>>> > >>>        FixedSizeBinary, but consumers of the Arrow schema will be
>>> > >>>        unable to distinguish a FixedSizeBinary field from a UUID field.
>>> > >>>
>>> > >>> Possible Solution
>>> > >>>
>>> > >>>    *   Add a new type code that represents unsupported types.
>>> > >>>
>>> > >>>    *   Represent values of the new type as variable-length binary.
>>> > >>>
>>> > >>> Some drivers can expose data even when they don’t understand the data
>>> > >>> type. For example, the PostgreSQL driver returns the raw bytes for
>>> > >>> fields of an unknown type. An explicit type lets clients know that
>>> > >>> they should convert the values if they can determine the actual data
>>> > >>> type.
>>> > >>>
>>> > >>> Questions
>>> > >>>
>>> > >>>    *   What is the impact on existing clients when they encounter
>>> > >>>        fields of the unsupported type?
>>> > >>>
>>> > >>>    *   Is it safe to assume that all unsupported values can be
>>> > >>>        converted to variable-length binary?
>>> > >>>
>>> > >>>    *   How can we preserve information about the original type?
>>> > >>>
>>> > >>>
>>> > >>
>>> > >
>>> >

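To illustrate the ergonomic gain Wes describes (boxing UUIDs into Python's `uuid` module), here is a minimal stdlib-only sketch of what a consumer could do once a field carries a UUID extension name. The name `example.uuid` and the `box_values` helper are hypothetical. The storage is assumed to be 16-byte fixed-size binary, matching the FixedSizeBinary mapping in the original proposal, with each value arriving as a Python `bytes` object or `None`.

```python
import uuid

# Hypothetical extension name, used only for this illustration.
UUID_EXT_NAME = "example.uuid"


def box_values(ext_name, values):
    """Box raw 16-byte values into uuid.UUID when the field is tagged as a UUID.

    Untagged fields are returned unchanged, which is all a consumer can do
    with a bare FixedSizeBinary(16) column. Nulls stay None.
    """
    if ext_name != UUID_EXT_NAME:
        return list(values)
    # uuid.UUID(bytes=...) raises ValueError if a value is not exactly 16 bytes.
    return [None if v is None else uuid.UUID(bytes=v) for v in values]


raw = [uuid.UUID("12345678-1234-5678-1234-567812345678").bytes, None]
print(box_values(UUID_EXT_NAME, raw))
# prints: [UUID('12345678-1234-5678-1234-567812345678'), None]
```

Without the extension name, the same bytes are indistinguishable from any other FixedSizeBinary(16) column, which is the ambiguity the original message describes.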