I've written a draft at [1], but for simplicity's sake I will include the
text of the proposal inline below.

[1] https://github.com/westonpace/arrow/tree/feat/turtle-extension-type

TURTLE
======

* Extension name: ``arrow.turtle``.

* The storage type of the extension is ``Struct`` where the struct array is
  composed of the following fields:

  * **label: String** = A label for this particular batch.
  * **value: Binary** = A record batch serialized using the Arrow IPC streaming
    format.  The bytes should form a complete, self-contained IPC stream that
    can be deserialized on its own, as if it were an independent buffer or
    file.  The batch should conform to the schema encoded in the ``schema``
    parameter.

* Extension type parameters:

  * **schema** = The schema of the record batches, serialized using the Arrow
    IPC streaming format and stored in the JSON metadata as a base64 string.
    All records in the array must conform to this schema.

* Description of the serialization:

  The metadata must be a valid JSON object with a ``schema`` field.  The
  value of that field is a string holding the base64 encoding of the
  IPC-serialized schema, as described above.

Rationale
---------

Tabular data is a common approach for recording measurements and observations.
The columns represent different measurements and the rows represent "events"
or "samples" that have been taken.  For example, a weather station may record
the temperature, pressure, and wind speed every hour.

With the introduction of quantum computing, we must now consider the case where
each event is a superposition of multiple states and we need to record all
possible states.  As a simplification, we can think of each element in the
array as a measurement made in a separate but parallel universe.

The ``label`` field can be used to give a human-readable label to the various
universes or states being measured.  If there is no meaningful label, it can
be an empty string.

Following this approach, we arrive at a three-dimensional tabular structure.
However, there is no reason to stop at three dimensions.  The record batches
can themselves contain turtle fields, each adding another dimension, so an
arbitrary number of dimensions can be encoded.

Etymology
---------

The name ``Turtle`` comes from the scientific discovery of the world turtle upon
which our universe rests.  It is a well-known fact that the world turtle itself
rests upon the back of another turtle, which is supported by a series of ever
larger turtles.  This real-life recursive structure seemed like a good fit for
representing the recursive nature of this extension type.
