Hi all,
I would like to propose that we vote on adding the fixed shape tensor
canonical extension type with the following specification:
Fixed shape tensor
==================
* Extension name: `arrow.fixed_shape_tensor`.
* The storage type of the extension: ``FixedSizeList`` where:
* **value_type** is the data type of the individual tensor elements and
is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``.
* **list_size** is the product of all the elements in the tensor shape.
* Extension type parameters:
* **value_type** = Arrow DataType of the tensor elements
* **shape** = shape of the contained tensors as an array
Optional parameters:
* **dim_names** = explicit names of the tensor dimensions
as an array. Its length should be equal to the length of
``shape``, that is, to the number of dimensions.
``dim_names`` can be used if the dimensions have well-known
names and they map to the physical layout (row-major).
* **permutation** = indices of the desired ordering of the
original dimensions, defined as an array.
The indices contain a permutation of the values [0, 1, .., N-1] where
N is the number of dimensions. The permutation indicates which
dimension of the logical layout corresponds to which dimension of the
physical tensor (the i-th dimension of the logical view corresponds
to the dimension with number ``permutation[i]`` of the physical tensor).
Permutation can be useful in case the logical order of
the tensor is a permutation of the physical order (row-major).
When logical and physical layout are equal, the permutation will always
be ([0, 1, .., N-1]) and can therefore be left out.
* Description of the serialization:
The metadata must be a valid JSON object that includes the shape of
the contained tensors as an array under the key **"shape"**, plus,
optionally, the dimension names under the key **"dim_names"** and the
ordering of the dimensions under the key **"permutation"**.
- Example: ``{ "shape": [2, 5]}``
- Example with ``dim_names`` metadata for NCHW ordered data (N is the
length of the array itself and is not part of the tensor shape):
``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``
- Example of permuted 3-dimensional tensor:
``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``
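To make the serialization rules above concrete, here is a minimal sketch in plain Python. The helper names ``serialize_metadata`` and ``deserialize_metadata`` are hypothetical and are not taken from the C++ or Python implementations; the sketch only assumes what the text above states (a JSON object with a required "shape" key and optional "dim_names" and "permutation" keys).

```python
import json


def serialize_metadata(shape, dim_names=None, permutation=None):
    """Hypothetical helper: build the JSON metadata string described above."""
    ndim = len(shape)
    metadata = {"shape": list(shape)}
    if dim_names is not None:
        # dim_names needs one name per dimension.
        if len(dim_names) != ndim:
            raise ValueError("dim_names must have one entry per dimension")
        metadata["dim_names"] = list(dim_names)
    if permutation is not None:
        # permutation must contain each of 0..N-1 exactly once.
        if sorted(permutation) != list(range(ndim)):
            raise ValueError("permutation must be a permutation of [0, 1, .., N-1]")
        metadata["permutation"] = list(permutation)
    return json.dumps(metadata)


def deserialize_metadata(serialized):
    """Hypothetical helper: parse the metadata back into its three parameters."""
    metadata = json.loads(serialized)
    if "shape" not in metadata:
        raise ValueError('metadata must contain the key "shape"')
    return metadata["shape"], metadata.get("dim_names"), metadata.get("permutation")


if __name__ == "__main__":
    print(serialize_metadata([2, 5]))
    print(serialize_metadata([100, 200, 500], dim_names=["C", "H", "W"]))
    print(serialize_metadata([100, 200, 500], permutation=[2, 0, 1]))
```

Omitted optional parameters are simply left out of the JSON object, which matches the remark that an identity permutation can be dropped.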
.. note::
Elements in a fixed shape tensor extension array are stored
in row-major/C-contiguous order.
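As an illustration of how ``list_size``, the row-major storage and ``permutation`` fit together, here is a small sketch in plain Python. All function names are hypothetical, and the sketch assumes that ``shape`` describes the physical (row-major) layout, so that the logical shape is obtained by indexing ``shape`` with ``permutation``.

```python
import math


def list_size(shape):
    """Size of the FixedSizeList storage: product of all shape elements."""
    return math.prod(shape)


def row_major_offset(index, shape):
    """Offset of a physical index inside one row-major (C-contiguous) tensor."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of bounds for shape")
        offset = offset * n + i
    return offset


def logical_shape(shape, permutation=None):
    """The i-th logical dimension is physical dimension permutation[i]."""
    if permutation is None:
        return list(shape)
    return [shape[p] for p in permutation]


def logical_to_physical_index(logical_index, permutation=None):
    """Map an index in the logical view to the index in the physical tensor."""
    if permutation is None:
        return list(logical_index)
    physical = [0] * len(permutation)
    for i, p in enumerate(permutation):
        physical[p] = logical_index[i]
    return physical


if __name__ == "__main__":
    shape, permutation = [100, 200, 500], [2, 0, 1]
    print(list_size(shape))                   # number of values per tensor
    print(logical_shape(shape, permutation))  # shape of the logical view
    physical = logical_to_physical_index([4, 1, 2], permutation)
    print(physical, row_major_offset(physical, shape))
```

With an identity permutation (or none at all) the logical and physical views coincide, so only the row-major offset computation is needed.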
The specification is submitted as a PR [1] to the Canonical Extension Types
document under the format specifications directory [2].
There are also two implementations submitted to the Apache Arrow repository:
* C++ implementation of the proposed specification [3]
* Python example implementation of the proposed specification and usage
(only illustrative) [4]
The vote will be open for at least 72 hours.
[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...
Regards, Alenka
[1]: https://github.com/apache/arrow/pull/33925/files
[2]: https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
[3]: https://github.com/apache/arrow/pull/8510/files
[4]: https://github.com/apache/arrow/pull/33948/files