Re: Unsupported/Other Type

Antoine Pitrou Wed, 17 Apr 2024 07:36:22 -0700


I think this should be:
- a canonical extension type

- with a parameter unambiguously identifying the type for applications
  supporting it (such as "org.postgres.pg_lsn")

- with storage type left for each implementation to decide, but with a
  recommendation to use either 1) binary, 2) fixed-size-binary or 3) null.
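To make the shape of such a type concrete, here is a minimal sketch in plain Python (no Arrow library involved). It relies on the two field-metadata keys the Arrow columnar format reserves for extension types, `ARROW:extension:name` and `ARROW:extension:metadata`. Everything else is an assumption made for illustration: the extension name `example.other`, the JSON layout with a `type_name` entry, and the `Field` stand-in class are hypothetical and were not agreed on in this thread.

```python
import json
from dataclasses import dataclass, field

# Metadata keys the Arrow columnar format reserves for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name; the thread does not settle on one.
OTHER_EXT_NAME = "example.other"


@dataclass
class Field:
    """Plain-Python stand-in for an Arrow field (not a pyarrow class)."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def other_field(name, type_name, storage_type="null"):
    """Build a field tagged as an 'other' type that records the original type name.

    The storage type is left to the caller; "null" is one of the three
    recommended choices (binary, fixed-size-binary, null).
    """
    return Field(name, storage_type, {
        EXT_NAME_KEY: OTHER_EXT_NAME,
        # Hypothetical serialization: a JSON object holding the type parameter.
        EXT_META_KEY: json.dumps({"type_name": type_name}),
    })


def original_type_name(f):
    """Return the original type name, or None if the field is not an 'other' field."""
    if f.metadata.get(EXT_NAME_KEY) != OTHER_EXT_NAME:
        return None
    return json.loads(f.metadata[EXT_META_KEY])["type_name"]


lsn = other_field("wal_position", "org.postgres.pg_lsn", storage_type="binary")
plain = Field("payload", "binary")
```

An application that knows `org.postgres.pg_lsn` can call `original_type_name(lsn)` and dispatch on the result, while a field such as `plain` is reported as an ordinary binary column.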


Regards

Antoine.


On 17/04/2024 at 16:25, Weston Pace wrote:

people generally find use in Arrow schemas independently of concrete data.


This makes sense.  I think we do want to encourage use of Arrow as a "type
system" even if there is no data involved.  And, given that we cannot
easily change a field's data type property to "optional" it makes sense to
use a dedicated type and so I would be in favor of such a proposal (we
may eventually add an "unknown type" concept in Substrait as well, it's
come up several times, and so we could use this in that context).

I think that I would still prefer a canonical extension type (with storage
type null) over a new dedicated type.

On Wed, Apr 17, 2024 at 5:39 AM Antoine Pitrou <[email protected]> wrote:


Ah! Well, I think this could be an interesting proposal, but someone
should put forward a more formal proposal, perhaps as a draft PR.

Regards

Antoine.


On 17/04/2024 at 11:57, David Li wrote:

For an unsupported/other extension type.

On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:

What is "this proposal"?


On 17/04/2024 at 10:38, David Li wrote:

Should I take it that this proposal is dead in the water? While we could
define our own Unknown/Other type for, say, the ADBC PostgreSQL driver, it
might be useful to have a single type for consumers to latch on to.


On Fri, Apr 12, 2024, at 07:32, David Li wrote:

I think an "Other" extension type is slightly different than an
arbitrary extension type, though: the latter may be understood
downstream but the former represents a point at which a component
explicitly declares it does not know how to handle a field. In this
example, the PostgreSQL ADBC driver might be able to provide a
representation regardless, but a different driver (or say, the JDBC
adapter, which cannot necessarily get a bytestring for an arbitrary
JDBC type) may want an Other type to signal that it would fail if asked
to provide particular columns.

On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:

Depending where your Arrow-encoded data is used, either extension
types or generic field metadata are options. We have this problem in
the ADBC Postgres driver, where we can convert *most* Postgres types
to an Arrow type but there are some others where we can't or don't
know or don't implement a conversion. Currently for these we return
opaque binary (the Postgres COPY representation of the value) but put
field metadata so that a consumer can implement a workaround for an
unsupported type. It would arguably have been better to implement this
as an extension type; however, field metadata felt like less of a
commitment when I first worked on this.
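As a rough illustration of the consumer-side workaround described above, the sketch below dispatches on a field-metadata entry to decode opaque bytes. The metadata key `example:source_type` is hypothetical (check the driver documentation for the key it really uses), and the sketch assumes the opaque bytes for a `pg_lsn` value are a single 8-byte big-endian unsigned integer, rendered in the usual `hi/lo` hexadecimal text form.

```python
import struct

# Hypothetical metadata key naming the source database type.
SOURCE_TYPE_KEY = "example:source_type"


def decode_pg_lsn(raw):
    """Decode an 8-byte big-endian value into the 'HI/LO' hex text form (assumed layout)."""
    (value,) = struct.unpack(">Q", raw)
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


# Decoders this consumer knows about, keyed by source type name.
DECODERS = {"pg_lsn": decode_pg_lsn}


def decode_column(field_metadata, values):
    """Decode values if the metadata names a known type; otherwise pass them through."""
    decoder = DECODERS.get(field_metadata.get(SOURCE_TYPE_KEY))
    if decoder is None:
        return list(values)  # unknown type: leave the bytes opaque
    return [None if v is None else decoder(v) for v in values]


raw = bytes.fromhex("00000016B3748000")
decoded = decode_column({SOURCE_TYPE_KEY: "pg_lsn"}, [raw, None])
untouched = decode_column({}, [raw])
```

Here `decoded` holds the text form of the value plus the preserved null, and `untouched` still holds the raw bytes because no source type was declared.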

Cheers,

-dewey

On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
<[email protected]> wrote:


I was using UUID as an example. It looks like extension types cover my
original request.

________________________________
From: Felipe Oliveira Carvalho <[email protected]>
Sent: Thursday, April 11, 2024 7:15 AM
To: [email protected] <[email protected]>
Subject: Re: Unsupported/Other Type

The OP used UUID as an example. Would that be enough, or is the request
for a flexible mechanism that allows the creation of one-off nominal types
for very specific use-cases?

—
Felipe

On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <[email protected]> wrote:


Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.

Regards

Antoine.


On 10/04/2024 at 22:55, Wes McKinney wrote:

In the past we have discussed adding a canonical type for UUID and JSON. I
still think this is a good idea and could improve ergonomics in downstream
language bindings (e.g. by exposing JSON querying functions or automatically
boxing UUIDs in built-in UUID types, like the Python uuid library). Has
anyone done any work on this, to anyone's knowledge?

On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <[email protected]> wrote:

Hi Norman,
Arrow has a concept of extension types [1] along with the possibility of
proposing new canonical extension types [2].  This seems to cover the
use-cases you mention but I might be misunderstanding?

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html


On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
<[email protected]> wrote:

Problem Description

Currently Arrow schemas can only contain columns of types supported by
Arrow. In some cases an Arrow schema maps to an external schema. This can
result in the Arrow schema not being able to support all the columns from
the external schema.

Consider an external system that contains a column of type UUID. To model
the schema in Arrow, the user has two choices:

      1.  Do not include the UUID column in the Arrow schema

      2.  Map the column to an existing Arrow type. This will not include the
          original type information. A UUID can be mapped to a FixedSizeBinary,
          but consumers of the Arrow schema will be unable to distinguish a
          FixedSizeBinary field from a UUID field.
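The loss of type information in the second choice can be shown with a small plain-Python sketch. It assumes a UUID is stored as its 16 raw bytes (the natural fit for a 16-byte fixed-size binary) and uses a hypothetical `declared_type` label to stand in for whatever type information the schema carries.

```python
import uuid

original = uuid.UUID("12345678-1234-5678-1234-567812345678")

# What ends up in a 16-byte fixed-size binary column: just the raw bytes.
storage = original.bytes

# An unrelated 16-byte value has exactly the same shape as the UUID above.
unrelated = bytes(range(16))


def box(value, declared_type):
    """Box a stored value as uuid.UUID only when the schema says it is a UUID.

    `declared_type` is a hypothetical label standing in for the schema's
    type information; without it the bytes cannot be interpreted safely.
    """
    if declared_type == "uuid":
        return uuid.UUID(bytes=value)
    return value


round_tripped = box(storage, "uuid")
left_as_bytes = box(unrelated, "fixed_size_binary[16]")
```

Both stored values are 16 bytes long, so only the declared type tells a consumer that `storage` should come back as a `uuid.UUID` while `unrelated` should stay as bytes.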

Possible Solution

      *   Add a new type code that represents unsupported types

      *   Values for the new type are represented as variable length binary


Some drivers can expose data even when they don’t understand the data
type. For example, the PostgreSQL driver will return the raw bytes for
fields of an unknown type. Using an explicit type lets clients know that
they should convert values if they are able to determine the actual data
type.

Questions

      *   What is the impact on existing clients when they encounter fields of
          the unsupported type?

      *   Is it safe to assume that all unsupported values can safely be
          converted to a variable length binary?

      *   How can we preserve information about the original type?

