+1 (non-binding) - especially the rabbit canonical extension!

On Thu, Apr 3, 2025 at 06:59 Benjamin Kietzman <bengil...@gmail.com> wrote:

> +1
>
> On Tue, Apr 1, 2025, 22:07 Gang Wu <ust...@gmail.com> wrote:
>
> > +1 (binding)
> >
> > I'll propose a Rabbit canonical extension type next year.
> >
> > Best,
> > Gang
> >
> >
> > On Wed, Apr 2, 2025 at 10:49 AM wish maple <maplewish...@gmail.com> wrote:
> >
> > > Out of curiosity, is this turtle type essentially an array whose
> > > elements hold Arrow IPC stream batches?
> > >
> > > Do the binary values have an alignment rule? And are `label` and
> > > `value` both non-nullable?
> > >
> > > Best,
> > > Xuwei Fu
> > >
> > > On Wed, Apr 2, 2025 at 02:52, Weston Pace <weston.p...@gmail.com> wrote:
> > >
> > > > > I've written a draft at [1] but for simplicity's sake I will include
> > > > > the text of the proposal inline below.
> > > > >
> > > > > [1] https://github.com/westonpace/arrow/tree/feat/turtle-extension-type
> > > >
> > > > TURTLE
> > > > ======
> > > >
> > > > * Extension name: ``arrow.turtle``.
> > > >
> > > > > * The storage type of the extension is ``Struct`` where the struct
> > > > >   array is composed of the following fields:
> > > > >
> > > > >   * **label: String** = A label for this particular batch.
> > > > >   * **value: Binary** = A record batch serialized using the Arrow IPC
> > > > >     streaming format.  The bytes should be valid Arrow IPC bytes which
> > > > >     can be deserialized as if they were an independent buffer or file.
> > > > >     The batch should conform to the schema encoded in the ``schema``
> > > > >     parameter.
> > > >
> > > > * Extension type parameters:
> > > >
> > > > >   * **schema** = the schema of the record batches, serialized using
> > > > >     the IPC streaming format, then base64-encoded so it can be stored
> > > > >     as a JSON string.  All record batches in the array must conform to
> > > > >     this schema.
> > > >
> > > > * Description of the serialization:
> > > >
> > > > >   The metadata must be a valid JSON object with the ``schema`` field.
> > > > >   The ``schema`` field should be a JSON string holding the
> > > > >   base64-encoded schema as described above.
> > > >
> > > > Rationale
> > > > ---------
> > > >
> > > > > Tabular data is a common approach for recording measurements and
> > > > > observations.  The columns represent different measurements and the
> > > > > rows represent "events" or "samples" that have been taken.  For
> > > > > example, a weather station may record the temperature, pressure, and
> > > > > wind speed every hour.
> > > >
> > > > > With the introduction of quantum computing, we now must consider the
> > > > > case where each event is a superposition of multiple states and we
> > > > > need to record all possible states.  As a simplification we can think
> > > > > of each element in the array as a measurement made in a separate but
> > > > > parallel universe.
> > > >
> > > > > The ``label`` field can be used to give a human-readable label to the
> > > > > various universes or states being measured.  Alternatively, if there
> > > > > is no meaningful label, it can be an empty string.
> > > >
> > > > > Following this approach we arrive at a three-dimensional tabular
> > > > > structure.  However, there is no reason that we must stop at three
> > > > > dimensions.  The batch can contain additional turtle fields to encode
> > > > > an arbitrary number of additional dimensions.
> > > >
> > > > Etymology
> > > > ---------
> > > >
> > > > > The name ``Turtle`` comes from the scientific discovery of the world
> > > > > turtle upon which our universe rests.  It is a well-known fact that
> > > > > the world turtle itself rests upon the back of another turtle, which
> > > > > is supported by a series of ever larger turtles.  This real-life
> > > > > recursive structure seemed like a good fit for representing the
> > > > > recursive nature of this extension type.
> > > >
> > >
> >
>

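For reference, the metadata serialization described in the quoted proposal (a JSON object whose ``schema`` field holds the base64-encoded, IPC-serialized schema) could look roughly like the Python sketch below. It uses only the standard library. The function names and the placeholder bytes are illustrative assumptions and not part of the proposal; real schema bytes would come from an Arrow IPC writer.

```python
import base64
import json


def serialize_turtle_metadata(schema_ipc_bytes: bytes) -> str:
    """Build the extension metadata: a JSON object with a base64 ``schema`` field."""
    encoded = base64.b64encode(schema_ipc_bytes).decode("ascii")
    return json.dumps({"schema": encoded})


def deserialize_turtle_metadata(metadata: str) -> bytes:
    """Recover the serialized schema bytes, checking the metadata shape first."""
    obj = json.loads(metadata)
    if not isinstance(obj, dict) or "schema" not in obj:
        raise ValueError("metadata must be a JSON object with a 'schema' field")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(obj["schema"], validate=True)


# Placeholder bytes standing in for a schema serialized with Arrow IPC
# (assumption: these are NOT a real IPC message).
placeholder_schema = b"\xff\xff\xff\xff placeholder schema bytes"
metadata = serialize_turtle_metadata(placeholder_schema)
round_tripped = deserialize_turtle_metadata(metadata)
```

Keeping the schema bytes opaque here means the sketch only illustrates the JSON and base64 layering; producing and parsing the bytes themselves is left to an Arrow implementation.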