Hi Alenka,

On 21/02/2023 at 13:38, Alenka Frim wrote:

Fixed shape tensor
==================

* Extension name: `arrow.fixed_shape_tensor`.

* The storage type of the extension: ``FixedSizeList`` where:

   * **value_type** is the data type of individual tensors and
     is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``.

I would say "the data type of individual tensor elements".
(so that people don't try to make it e.g. List(float64)).

Also, I don't think any reference to pyarrow should be made here.

   * **list_size** is the product of all the elements in the tensor shape.
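A minimal sketch in plain Python (deliberately not tied to pyarrow or any other Arrow implementation; the helper name `fixed_shape_list_size` is hypothetical) of how `list_size` follows from the shape:

```python
import math

def fixed_shape_list_size(shape):
    """Hypothetical helper: number of values stored in each FixedSizeList slot."""
    if any(d < 0 for d in shape):
        raise ValueError("dimensions must be non-negative")
    # list_size is the product of all dimension sizes (1 for an empty shape)
    return math.prod(shape)

print(fixed_shape_list_size([2, 5]))           # 10
print(fixed_shape_list_size([100, 200, 500]))  # 10000000
```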

* Extension type parameters:

   * **value_type** = Arrow DataType of the tensor elements
   * **shape** = shape of the contained tensors as an array

I would say "the physical shape" to make it clear that it refers to how values are laid out in memory, while `dim_names` and `permutation` drive the logical interpretation.

   Optional parameters:

   * **dim_names** = explicit names of the tensor dimensions
     as an array. Its length should be equal to the shape
     length, i.e. the number of dimensions.

     ``dim_names`` can be used if the dimensions have well-known
     names and they map to the physical layout (row-major).

   * **permutation** = indices of the desired ordering of the
     original dimensions, defined as an array.

     The indices contain a permutation of the values [0, 1, ..., N-1] where
     N is the number of dimensions. The permutation indicates which
     dimension of the logical layout corresponds to which dimension of the
     physical tensor (the i-th dimension of the logical view corresponds
     to the dimension with number ``permutation[i]`` of the physical tensor).

     Permutation can be useful in case the logical order of
     the tensor is a permutation of the physical order (row-major).

     When the logical and physical layouts are equal, the permutation is always
     [0, 1, ..., N-1] and can therefore be omitted.
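To check my reading of the permutation rule above, here is a small sketch in plain Python. It assumes the physical tensor is row-major (C-contiguous) and that logical dimension i is physical dimension ``permutation[i]``, as the draft states; the helper names are hypothetical. The logical view is just a reordering of the physical shape and strides, so no data needs to move:

```python
import math

def row_major_strides(shape):
    """Element strides of a row-major (C-contiguous) tensor."""
    return [math.prod(shape[i + 1:]) for i in range(len(shape))]

def logical_view(shape, permutation=None):
    """Hypothetical helper: logical shape and strides from the physical
    shape and an optional permutation (identity when omitted)."""
    ndim = len(shape)
    if permutation is None:
        permutation = list(range(ndim))
    if sorted(permutation) != list(range(ndim)):
        raise ValueError("permutation must be a permutation of [0, ..., N-1]")
    strides = row_major_strides(shape)
    # Logical dimension i is physical dimension permutation[i]
    return ([shape[p] for p in permutation],
            [strides[p] for p in permutation])

print(logical_view([100, 200, 500], [2, 0, 1]))
# ([500, 100, 200], [1, 100000, 500])
```

This is the same convention as NumPy's ``transpose(axes)``, where result axis i is original axis ``axes[i]``.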

Should we rule that `dim_names` and `permutation` are mutually exclusive?

* Description of the serialization:

   The metadata must be a valid JSON object including the shape of
   the contained tensors as an array with key **"shape"**, plus optional
   dimension names with key **"dim_names"** and an optional ordering of
   the dimensions with key **"permutation"**.

   - Example: ``{ "shape": [2, 5]}``
   - Example with ``dim_names`` metadata for NCHW ordered data:

     ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``

   - Example of a permuted 3-dimensional tensor:

     ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``

Perhaps explain in this example that the logical shape is [500, 100, 200]?
(if I understand `permutation` correctly)
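For reference, a minimal parser for the serialized metadata described above, in plain Python with the standard ``json`` module. The function name and the specific checks (``dim_names`` length, ``permutation`` validity) are my own assumptions drawn from the draft wording, not an agreed implementation. Under the draft's definition of ``permutation``, it gives [500, 100, 200] as the logical shape for the permuted example:

```python
import json

def parse_tensor_metadata(serialized):
    """Hypothetical parser for the extension's serialized metadata."""
    meta = json.loads(serialized)
    if not isinstance(meta, dict) or "shape" not in meta:
        raise ValueError('metadata must be a JSON object with a "shape" key')
    shape = meta["shape"]
    ndim = len(shape)
    dim_names = meta.get("dim_names")
    if dim_names is not None and len(dim_names) != ndim:
        raise ValueError("dim_names must have one entry per dimension")
    permutation = meta.get("permutation")
    if permutation is not None and sorted(permutation) != list(range(ndim)):
        raise ValueError("permutation must be a permutation of [0, ..., N-1]")
    return shape, dim_names, permutation

shape, names, perm = parse_tensor_metadata(
    '{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}')
# Logical dimension i is physical dimension perm[i]
print([shape[p] for p in perm])  # [500, 100, 200]
```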

Regards

Antoine.
