What is "this proposal"?
On 17/04/2024 at 10:38, David Li wrote:
Should I take it that this proposal is dead in the water? While we could define
our own Unknown/Other type for, say, the ADBC PostgreSQL driver, it might be
useful to have a single type for consumers to latch on to.
On Fri, Apr 12, 2024, at 07:32, David Li wrote:
I think an "Other" extension type is slightly different than an
arbitrary extension type, though: the latter may be understood
downstream but the former represents a point at which a component
explicitly declares it does not know how to handle a field. In this
example, the PostgreSQL ADBC driver might be able to provide a
representation regardless, but a different driver (or say, the JDBC
adapter, which cannot necessarily get a bytestring for an arbitrary
JDBC type) may want an Other type to signal that it would fail if asked
to provide particular columns.
On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
Depending where your Arrow-encoded data is used, either extension
types or generic field metadata are options. We have this problem in
the ADBC Postgres driver, where we can convert *most* Postgres types
to an Arrow type but there are some others where we can't or don't
know or don't implement a conversion. Currently for these we return
opaque binary (the Postgres COPY representation of the value) but put
field metadata so that a consumer can implement a workaround for an
unsupported type. It would arguably have been better to implement this
as an extension type; however, field metadata felt like less of a
commitment when I first worked on this.
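
As a rough illustration of the field-metadata approach described above, here is a minimal sketch in plain Python. It does not use the Arrow API: the `Field` class, the `example:source_type` key, and the `point2d` type with its byte layout are all hypothetical stand-ins, chosen only to show how a consumer could dispatch on the metadata to work around an unsupported type.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical metadata key; the key the real driver uses is not given in this thread.
SOURCE_TYPE_KEY = "example:source_type"


@dataclass
class Field:
    """Plain-Python stand-in for a schema field (not the Arrow API)."""
    name: str
    arrow_type: str
    metadata: dict = field(default_factory=dict)


# Consumer-side workarounds, keyed by the source type name found in the metadata.
# The "point2d" type and its layout (two big-endian float64 values) are made up.
DECODERS = {
    "point2d": lambda raw: struct.unpack(">dd", raw),
}


def decode(f: Field, raw: bytes):
    """Decode an opaque binary value if a workaround exists, else return the raw bytes."""
    decoder = DECODERS.get(f.metadata.get(SOURCE_TYPE_KEY))
    return decoder(raw) if decoder else raw


# Producer side: the column is exposed as opaque binary, tagged with the source type.
col = Field("location", "binary", {SOURCE_TYPE_KEY: "point2d"})
raw_value = struct.pack(">dd", 1.5, -2.0)

decoded = decode(col, raw_value)
```

A field without the metadata key simply falls through and yields the raw bytes unchanged, which is the behaviour a consumer with no workaround would see.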
Cheers,
-dewey
On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
<norman.jor...@improving.com.invalid> wrote:
I was using UUID as an example. It looks like extension types cover my
original request.
________________________________
From: Felipe Oliveira Carvalho <felipe...@gmail.com>
Sent: Thursday, April 11, 2024 7:15 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: Unsupported/Other Type
The OP used UUID as an example. Would that be enough, or is the request for
a flexible mechanism that allows the creation of one-off nominal types for
very specific use cases?
—
Felipe
On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
Antoine.
On 10/04/2024 at 22:55, Wes McKinney wrote:
In the past we have discussed adding a canonical type for UUID and JSON. I
still think this is a good idea and could improve ergonomics in downstream
language bindings (e.g. by exposing JSON querying functions or automatically
boxing UUIDs in built-in UUID types, like the Python uuid library). Has
anyone done any work on this to anyone's knowledge?
On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:
Hi Norman,
Arrow has a concept of extension types [1] along with the possibility of
proposing new canonical extension types [2]. This seems to cover the
use-cases you mention but I might be misunderstanding?
Thanks,
Micah
[1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
<norman.jor...@improving.com.invalid> wrote:
Problem Description
Currently Arrow schemas can only contain columns of types supported by
Arrow. In some cases an Arrow schema maps to an external schema. This can
result in the Arrow schema not being able to support all the columns from
the external schema.
Consider an external system that contains a column of type UUID. To model
the schema in Arrow, the user has two choices:
1. Do not include the UUID column in the Arrow schema
2. Map the column to an existing Arrow type. This will not include the
original type information. A UUID can be mapped to a FixedSizeBinary, but
consumers of the Arrow schema will be unable to distinguish a
FixedSizeBinary field from a UUID field.
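
To make the second choice concrete, here is a minimal sketch using only the Python standard library (no Arrow API). The UUID value and the second 16-byte value are made up for illustration. It shows that once a UUID is reduced to its 16 raw bytes, nothing in the bytes themselves says "UUID"; only a consumer that already knows the column's original type can turn them back.

```python
import uuid

# Hypothetical value from an external system's UUID column.
original = uuid.UUID("12345678-1234-5678-1234-567812345678")

# What a 16-byte fixed-size binary column would hold: just the raw bytes.
stored = original.bytes

# Another 16-byte value that is not a UUID at all (e.g. a digest or a raw key).
other = bytes(range(16))

# To a consumer, both values have the same shape: 16 opaque bytes each.
same_shape = len(stored) == len(other) == 16

# Only a consumer that already knows the column is a UUID can recover it.
recovered = uuid.UUID(bytes=stored)
```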
Possible Solution
* Add a new type code that represents unsupported types
* Values for the new type are represented as variable-length binary
Some drivers can expose data even when they don’t understand the data
type. For example, the PostgreSQL driver will return the raw bytes for
fields of an unknown type. Using an explicit type lets clients know that
they should convert the values if they are able to determine the actual
data type.
Questions
* What is the impact on existing clients when they encounter fields of
the unsupported type?
* Is it safe to assume that all unsupported values can safely be
converted to a variable-length binary?
* How can we preserve information about the original type?