Hi all,

Thank you all for participating in the discussion. The feedback received
was very helpful!

I have updated the spec according to the discussion here and in the PR [1],
plus the talk we had with Rok and Joris. The change in the spec can be
found in the "Description of the serialization" section, where dim_names and
permutations are now included as *optional* metadata.

Please have a look at the PR [1] and give comments/suggest changes.
Once that is ready I will send the new version to the ML for a vote.

Rok has also created a Google document titled "Memory representations of
tensors in different languages" [2], where he summarizes how other projects
and languages represent tensors/n-dim arrays. It gives a nice broader
picture of the topic.

[1] https://github.com/apache/arrow/pull/33925#
[2] https://docs.google.com/document/d/1BG10KyDr62e0_WZqVaHcz90SnnLYmiVryZaayoKpmIA/edit?usp=sharing

All well,
Alenka

On Tue, Feb 14, 2023 at 1:00 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> On Tue, 7 Feb 2023 at 19:32, Quentin Lhoest <quen...@huggingface.co>
> wrote:
> >
> > Hi,
> >
> > If I remember correctly one can already pass `types_mapper`
> > to `pa.Table.to_pandas`, to allow Ray or HF Datasets to define
> > their own pandas extension types associated to the arrow
> > extension types. I guess this could also be used until there is a
> > decision to include those types in Arrow or not?
> >
>
> Yes, that's correct (although we should verify that this also works to
> override the default for extension types, i.e. that `types_mapper` takes
> priority in deciding the resulting pandas extension dtype).
> For packages like Ray or HF Datasets, that might be a good enough
> solution; for end-users this is less convenient because you need to
> specify this any time you do a conversion from arrow to pandas, while
> with the `to_pandas_dtype` mechanism this gets used by default.
>
> Joris
>
