Hi all,

Thanks, Rok, for your view on the Parquet topic.
Thank you all for joining in on the discussion and the design of the spec for fixed shape tensor support in Apache Arrow!

With 3 binding +1 votes, 2 non-binding +1 votes, and no -1 votes, the vote has passed.

The PR with the specification is ready [1], so I will merge it later today. The next step is the C++ implementation. The PR [2] is already in the final stages of the review process.

[1]: https://github.com/apache/arrow/pull/33925
[2]: https://github.com/apache/arrow/pull/8510/files

All well,
Alenka

On Mon, Mar 6, 2023 at 1:41 PM Alenka Frim <ale...@voltrondata.com> wrote:

> Hi all,
>
> I am starting a new voting thread with this email as the first voting
> thread [1] opened up new comments and suggestions and we wanted to
> take time to see how that evolves.
>
> *I would like to propose we vote on adding the fixed shape tensor
> canonical extension type*
> *with the following specification:*
>
> Fixed shape tensor
> ==================
>
> * Extension name: `arrow.fixed_shape_tensor`.
>
> * The storage type of the extension: ``FixedSizeList`` where:
>
>   * **value_type** is the data type of individual tensor elements.
>   * **list_size** is the product of all the elements in tensor shape.
>
> * Extension type parameters:
>
>   * **value_type** = the Arrow data type of individual tensor elements.
>   * **shape** = the physical shape of the contained tensors
>     as an array.
>
>   Optional parameters describing the logical layout:
>
>   * **dim_names** = explicit names to tensor dimensions
>     as an array. The length of it should be equal to the shape
>     length and equal to the number of dimensions.
>
>     ``dim_names`` can be used if the dimensions have well-known
>     names and they map to the physical layout (row-major).
>
>   * **permutation** = indices of the desired ordering of the
>     original dimensions, defined as an array.
>
>     The indices contain a permutation of the values [0, 1, .., N-1] where
>     N is the number of dimensions.
>     The permutation indicates which
>     dimension of the logical layout corresponds to which dimension of the
>     physical tensor (the i-th dimension of the logical view corresponds
>     to the dimension with number ``permutation[i]`` of the physical tensor).
>
>     Permutation can be useful in case the logical order of
>     the tensor is a permutation of the physical order (row-major).
>
>     When logical and physical layout are equal, the permutation will always
>     be ([0, 1, .., N-1]) and can therefore be left out.
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object including shape of
>   the contained tensors as an array with key **"shape"** plus optional
>   dimension names with key **"dim_names"** and ordering of the
>   dimensions with key **"permutation"**.
>
>   - Example: ``{ "shape": [2, 5]}``
>   - Example with ``dim_names`` metadata for NCHW ordered data:
>
>     ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``
>
>   - Example of permuted 3-dimensional tensor:
>
>     ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``
>
>     This is the physical layout shape and the shape of the logical
>     layout would in this case be ``[500, 100, 200]``.
>
> .. note::
>
>   Elements in a fixed shape tensor extension array are stored
>   in row-major/C-contiguous order.
>
> * The specification is submitted as a PR [2] to Canonical Extension Types
> document under the format specifications directory [3].
>
> There are also two implementations submitted to Apache Arrow repository:
> * C++ implementation of the proposed specification [4]
> * Python example implementation of the proposed specification and usage
> (only illustrative) [5]
>
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
>
> Regards, Alenka
>
> [1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
> [2]: https://github.com/apache/arrow/pull/33925/files
> [3]: https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
> [4]: https://github.com/apache/arrow/pull/8510/files
> [5]: https://github.com/apache/arrow/pull/33948/files
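
For readers who want to check the metadata rules in the quoted specification, here is a minimal Python sketch that uses only the standard library. It parses the serialized JSON metadata, derives ``list_size`` as the product of the shape, and computes the logical shape from the optional permutation. The helper name ``describe_fixed_shape_tensor`` and its return format are made up for illustration; this is not an Arrow API and not the C++ or Python implementation from the PRs.

```python
import json
import math


def describe_fixed_shape_tensor(metadata_json):
    """Hypothetical helper (not an Arrow API): derive layout facts from
    serialized fixed shape tensor metadata."""
    meta = json.loads(metadata_json)
    shape = meta["shape"]
    ndim = len(shape)

    # list_size of the FixedSizeList storage is the product of the shape.
    list_size = math.prod(shape)

    # dim_names, when present, needs one name per dimension.
    dim_names = meta.get("dim_names")
    if dim_names is not None and len(dim_names) != ndim:
        raise ValueError("dim_names must have one entry per dimension")

    # A missing permutation means logical and physical layouts are equal.
    permutation = meta.get("permutation", list(range(ndim)))
    if sorted(permutation) != list(range(ndim)):
        raise ValueError("permutation must be a permutation of [0, .., N-1]")

    # The i-th logical dimension is physical dimension permutation[i].
    logical_shape = [shape[p] for p in permutation]

    return {
        "list_size": list_size,
        "physical_shape": shape,
        "logical_shape": logical_shape,
    }


# The permuted example from the specification.
info = describe_fixed_shape_tensor(
    '{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}'
)
print(info["list_size"])      # 10000000
print(info["logical_shape"])  # [500, 100, 200]
```

The logical shape printed for the permuted example matches the ``[500, 100, 200]`` given in the specification text.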