> The reason why I am being nit-picky here is I think that having a first
class type indicates that it should eventually be supported by all
reference implementations.  An "well known" extension type I think offers
less guarantees which makes it seem more suitable for niche types.

What requirements does adding new types such as Complex Numbers and
Intervals impose on downstream projects? Hypothetically, does a new
first-class type oblige downstream projects to provide full support for it?
In other words, does full support mean understanding both the
representation (i.e. an Arrow Type) *and* expressions on that
representation? That does seem onerous.

Or does adding a type simply involve exposing a new Arrow Type (the
representation) in the respective language (C++/Java/Rust) that downstream
projects may choose to support or ignore?
Java/Rust may not have a native Complex Type, for example, but this isn't
Arrow's responsibility -- it simply provides its own
type that the language/project should interpret.
For example, cuDF [1] performs a switch on the arrow types
and fails when encountering a type it doesn't understand (including
extension types).

[1]:
https://github.com/rapidsai/cudf/blob/306ae4ffe584fdf50114875f64ba552f496e13fa/cpp/src/interop/from_arrow.cu#L41-L87

Practically speaking, taking cuDF as an example, the handling might change
as follows:

switch (arrow_type.id()) {
   case arrow::Type::FLOAT:
       ...
       break;
   ...
   case arrow::Type::EXTENSION: {
       // Extension types are identified by name. C++ cannot switch on
       // strings, so the second level is a chain of comparisons.
       const std::string name =
           static_cast<const arrow::ExtensionType&>(arrow_type).extension_name();

       if (name == "complex_float") {
           ...
       } else if (name == "complex_double") {
           ...
       } else {
           CUDF_FAIL("Unsupported Extension Type");
       }
       break;
   }
   default:
       CUDF_FAIL("Unsupported Type");
}

Thus, in practice, the difference between handling a First-Class Type and
an Extension Type is one extra level of dispatch on the extension name.
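
To make the two-level dispatch concrete without depending on Arrow or cuDF,
here is a minimal self-contained sketch. TypeId, DataType and dispatch are
hypothetical stand-ins for arrow::Type, arrow::DataType and the consumer's
conversion routine; they are assumptions for illustration, not real API.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for arrow::Type::type and arrow::DataType.
enum class TypeId { FLOAT, DOUBLE, EXTENSION, OTHER };

struct DataType {
    TypeId id;
    std::string extension_name;  // only meaningful when id == EXTENSION
};

// Returns a label for the handler that would be chosen, or throws for
// anything this consumer does not understand.
std::string dispatch(const DataType& type) {
    switch (type.id) {
        case TypeId::FLOAT:
            return "float";
        case TypeId::DOUBLE:
            return "double";
        case TypeId::EXTENSION:
            // Second level: extension types are told apart by name.
            if (type.extension_name == "complex_float") return "complex_float";
            if (type.extension_name == "complex_double") return "complex_double";
            throw std::invalid_argument("Unsupported Extension Type: " +
                                        type.extension_name);
        default:
            throw std::invalid_argument("Unsupported Type");
    }
}
```

A first-class COMPLEX entry would simply be another case label in the outer
switch; the extension route moves the same decision into the inner name
comparison.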

> > > We could certainly choose to treat the type as "first class" in the
> > > C++ library without it being "top level" in the Type union in
> > > Flatbuffers.

> > My understanding is that it means having COMPLEX as an entry in the
> > arrow/type_fwd.h Type enum. I agree this would make implementation
> > work in the C++ library much more straightforward.

> > One idea I proposed would be to do that, and implement the
> > serialization of the complex metadata using Extension types.

> If this is a maintainable strategy for Canonical types it sounds good to
> me.

Based on the example above, handling Canonical Extension Types
will add an extra layer of indirection to the type identification logic.
Are downstream projects simply able to fail on, or ignore, first-class types
they don't support in any case?
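
For extension types specifically, failing is not the only conceivable
option: an Arrow extension type is layered on a storage type, so a consumer
could in principle fall back to the storage layout. A rough sketch of the
two policies follows. Kind, TypeInfo, Policy and resolve are hypothetical
names for illustration, not Arrow or cuDF API, and whether falling back is
acceptable would depend on the consumer.

```cpp
#include <string>

// Hypothetical stand-ins; not the Arrow or cuDF API.
enum class Kind { Float32, Float64, FixedSizeList, Extension };

struct TypeInfo {
    Kind kind;
    std::string extension_name;  // meaningful only for Kind::Extension
    Kind storage_kind;           // physical layout behind an extension type
};

enum class Policy { Fail, UseStorage };

// Writes the kind the consumer would process into *out and returns true,
// or returns false when the type is rejected.
bool resolve(const TypeInfo& t, Policy policy, Kind* out) {
    if (t.kind != Kind::Extension) {
        *out = t.kind;
        return true;
    }
    if (t.extension_name == "complex_float") {
        *out = Kind::Extension;  // understood natively
        return true;
    }
    if (policy == Policy::UseStorage) {
        *out = t.storage_kind;   // ignore the extension, keep the data
        return true;
    }
    return false;                // unknown extension, strict consumer
}
```

A first-class type has no such storage layer underneath it, so the only
choices left for an unsupported first-class type are to fail or to skip it.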

I think what's not clear to me is the contract between the Arrow API and
downstream projects that use the API. Are downstream projects obligated
to respect all first-class types?

Simon




On Fri, Jun 11, 2021 at 1:20 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> >
> > My understanding is that it means having COMPLEX as an entry in the
> > arrow/type_fwd.h Type enum. I agree this would make implementation
> > work in the C++ library much more straightforward.
> >
> > One idea I proposed would be to do that, and implement the
> > serialization of the complex metadata using Extension types.
>
>
> If this is a maintainable strategy for Canonical types it sounds good to
> me.
>
> On Thu, Jun 10, 2021 at 4:02 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > My understanding is that it means having COMPLEX as an entry in the
> > arrow/type_fwd.h Type enum. I agree this would make implementation
> > work in the C++ library much more straightforward.
> >
> > One idea I proposed would be to do that, and implement the
> > serialization of the complex metadata using Extension types.
> >
> > On Thu, Jun 10, 2021 at 5:47 PM Weston Pace <weston.p...@gmail.com> wrote:
> > >
> > > > While dedicated types are not strictly required, compute functions
> > > > would be much easier to add for a first-class dedicated complex
> > > > datatype rather than for an extension type.
> > > @pitrou
> > >
> > > This is perhaps a naive question (and admittedly, I'm not up to speed
> > > on my compute kernels) but why is this the case?  For example, if
> > > adding a complex addition kernel it seems we would be talking about...
> > >
> > > dest_scalar.real = scalar1.real + scalar2.real;
> > > dest_scalar.im = scalar1.im + scalar2.im;
> > >
> > > vs...
> > >
> > > dest_scalar[0] = scalar1[0] + scalar2[0];
> > > dest_scalar[1] = scalar1[1] + scalar2[1];
> > >
> > > On Thu, Jun 10, 2021 at 11:27 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > I'd be supportive of starting with this as a "canonical" extension
> > > > type so that all implementations are not expected to support complex
> > > > types — this would encourage us to build sufficient integration e.g.
> > > > with NumPy to get things working end-to-end with the on-wire
> > > > representation being an extension type. We could certainly choose to
> > > > treat the type as "first class" in the C++ library without it being
> > > > "top level" in the Type union in Flatbuffers.
> > > >
> > > > I agree that the use cases are more specialized, and the fact that we
> > > > haven't needed it until now (or at least, its absence suggests this)
> > > > shows that this is the case.
> > > >
> > > > On Thu, Jun 10, 2021 at 4:17 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > > I'm convinced now that first-class types seem to be the way to go
> > > > > > and I'm happy to take this approach.
> > > > >
> > > > > I agree from an implementation effort it is simpler, but I'm still
> > > > > not convinced that we should be adding this as a first class type.
> > > > > As noted in the survey below it appears Complex numbers are not a
> > > > > core concept in many general purpose coding languages and it doesn't
> > > > > appear to be a common type in SQL systems either.
> > > > >
> > > > > The reason why I am being nit-picky here is I think that having a
> > > > > first class type indicates that it should eventually be supported by
> > > > > all reference implementations.  An "well known" extension type I
> > > > > think offers less guarantees which makes it seem more suitable for
> > > > > niche types.
> > > > >
> > > > > > I don't immediately see a Packed Struct type. Would this need to be
> > > > > > > implemented?
> > > > > > Not necessarily (*).  But before thinking about implementation, this
> > > > > > proposal must be accepted into the format.
> > > > >
> > > > >
> > > > > Yes, this is a type that has been proposed in the past and I think
> > > > > handles a lot of types not yet in Arrow but have been requested (e.g.
> > > > > IP Addresses, Geo coordinates), etc.
> > > > >
> > > > > On Thu, Jun 10, 2021 at 1:06 AM Simon Perkins <simon.perk...@gmail.com> wrote:
> > > > >
> > > > > > On Wed, Jun 9, 2021 at 7:56 PM Antoine Pitrou <anto...@python.org> wrote:
> > > > > >
> > > > > > >
> > > > > > > On 09/06/2021 at 17:52, Micah Kornfield wrote:
> > > > > > > >
> > > > > > > > Adding a new first-class type in Arrow requires working
> > > > > > > > integration tests between C++ and Java libraries (once the idea
> > > > > > > > is informally agreed upon) and then a final vote for approval.
> > > > > > > > We haven't formalized extension types but I imagine a similar
> > > > > > > > cross language requirement would be agreed upon.
> > > > > > > > Implementation of computation wouldn't be required for adding a
> > > > > > > > new type. Different language bindings have taken different
> > > > > > > > approaches on how much additional computational elements are
> > > > > > > > packaged in them.
> > > > > > >
> > > > > > > While dedicated types are not strictly required, compute functions
> > > > > > > would be much easier to add for a first-class dedicated complex
> > > > > > > datatype rather than for an extension type.
> > > > > > >
> > > > > > > Since complex numbers are quite common in some domains, and since
> > > > > > > they are conceptually simply, IMHO it would make sense to add them
> > > > > > > to the native Arrow datatypes (at least COMPLEX64 and COMPLEX128).
> > > > > > >
> > > > > >
> > > > > > I'm convinced now that first-class types seem to be the way to go
> > > > > > and I'm happy to take this approach.
> > > > > > Regarding compute functions, it looks like the standard set of
> > > > > > scalar arithmetic and reduction functionality
> > > > > > is desirable for complex numbers:
> > > > > > https://arrow.apache.org/docs/cpp/compute.html#
> > > > > > Perhaps it would be better to split the addition of the Types and
> > > > > > addition Compute functionality into separate PRs?
> > > > > >
> > > > > > Regarding the process for managing this PR, it sounds like a
> > > > > > proposal must be voted on?
> > > > > > i.e. is this proposal still in this phase
> > > > > > http://arrow.apache.org/docs/developers/contributing.html#before-starting
> > > > > > Regards
> > > > > >
> > > > > > Simon
> > > > > >