I concur. +1

On Tue, Apr 1, 2025 at 8:52 PM Weston Pace <weston.p...@gmail.com> wrote:
> I've written a draft at [1] but for simplicity's sake I will include the
> text of the proposal inline below.
>
> [1] https://github.com/westonpace/arrow/tree/feat/turtle-extension-type
>
> TURTLE
> ======
>
> * Extension name: ``arrow.turtle``.
>
> * The storage type of the extension is ``Struct`` where the struct array is
>   composed of the following fields:
>
>   * **label: String** = A label for this particular batch.
>   * **value: Binary** = A record batch serialized using the Arrow IPC
>     streaming format. The bytes should contain valid Arrow IPC bytes which
>     can be deserialized as if it were an independent buffer or file. The
>     batch should conform to the schema encoded in the ``schema`` parameter.
>
> * Extension type parameters:
>
>   * **schema** = the schema of the record batches, serialized using the IPC
>     streaming format and encoded into JSON with base64. All records in the
>     array must conform to this schema.
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object with the ``schema`` field. The
>   schema field should be a base64 encoded JSON string as described above.
>
> Rationale
> ---------
>
> Tabular data is a common approach for recording measurements and
> observations. The columns represent different measurements and the rows
> represent "events" or "samples" that have been taken. For example, a
> weather station may record the temperature, pressure, and wind speed every
> hour.
>
> With the introduction of quantum computing, we now must consider the case
> where each event is a superposition of multiple states and we need to
> record all possible states. As a simplification we can think of each
> element in the array as a measurement made in a separate but parallel
> universe.
>
> The ``Label`` field can be used to give a human-readable label to the
> various universes or states being measured. Alternatively, if there is no
> meaningful label, it can be an empty string.
>
> Following this approach we arrive at a three dimensional tabular
> structure. However, there is no reason that we must stop at three
> dimensions. The batch can contain additional turtle fields to encode an
> arbitrary number of additional dimensions.
>
> Etymology
> ---------
>
> The name ``Turtle`` comes from the scientific discovery of the world turtle
> upon which our universe rests. It is a well known fact that the world
> turtle itself rests upon the back of another turtle, which is supported by
> a series of ever larger turtles. This real life recursive structure seemed
> like a good fit for representing the recursive nature of this extension
> type.
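
For anyone who wants to poke at the metadata part of the proposal, here is a
rough sketch of the "Description of the serialization" bullet as I read it: a
JSON object whose ``schema`` field holds the base64 encoding of the
IPC-serialized schema. The function names are my own invention and are not in
the draft, and the schema bytes are an obvious placeholder rather than real
Arrow IPC output, so treat this as an illustration and not as the proposed
implementation.

```python
import base64
import json


def serialize_turtle_metadata(schema_ipc_bytes: bytes) -> str:
    """Build extension metadata: a JSON object with a base64 ``schema`` field.

    Hypothetical helper; the name is not part of the proposal.
    """
    encoded = base64.b64encode(schema_ipc_bytes).decode("ascii")
    return json.dumps({"schema": encoded})


def deserialize_turtle_metadata(metadata: str) -> bytes:
    """Recover the IPC-serialized schema bytes from the extension metadata.

    Hypothetical helper; the name is not part of the proposal.
    """
    obj = json.loads(metadata)
    if not isinstance(obj, dict) or "schema" not in obj:
        raise ValueError(
            "arrow.turtle metadata must be a JSON object with a 'schema' field"
        )
    return base64.b64decode(obj["schema"], validate=True)


# Placeholder bytes standing in for a schema serialized with the IPC
# streaming format (assumption: any opaque byte string round-trips the same).
placeholder_schema = b"\xff\xff\xff\xff placeholder, not a real IPC schema"
metadata = serialize_turtle_metadata(placeholder_schema)
assert deserialize_turtle_metadata(metadata) == placeholder_schema
```

With pyarrow, I would expect the real bytes to come from something like
``schema.serialize().to_pybytes()``, but I have not wired that into the draft
branch, so that part is a guess at how an implementation would plug in.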