On Thu, Jul 2, 2020 at 3:32 AM Maarten Breddels
<maartenbredd...@gmail.com> wrote:
>
> Hi,
>
> in the process of adding Arrow support in Vaex (natively, not converting to
> NumPy as we did before), one of our biggest pain points is (surprisingly)
> the name mismatch between NumPy's .tolist() and Arrow's .to_pylist().
> Especially in code that deals with both types of arrays, this is a bit of
> an annoyance. We actually use tolist() a lot in our unittests as well. I
> wonder if this was done on purpose, or if this is something that
> could still be changed/added.

This particular function could be renamed or aliased, but in general
substitutability in code that currently uses NumPy has not been a goal
of the project.
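
For code that has to handle both kinds of arrays in the meantime, a small
duck-typed wrapper along these lines might be enough. This is only a sketch:
`to_list` is a hypothetical helper name, not something that exists in pyarrow
or NumPy, and it relies only on the two method names mentioned above.

```python
def to_list(values):
    """Return a plain Python list from an Arrow-like or NumPy-like array.

    Hypothetical helper for illustration; not part of pyarrow or NumPy.
    """
    # Arrow arrays (pa.Array, pa.ChunkedArray) expose to_pylist().
    if hasattr(values, "to_pylist"):
        return values.to_pylist()
    # NumPy arrays expose tolist().
    if hasattr(values, "tolist"):
        return values.tolist()
    # Anything else: fall back to plain iteration.
    return list(values)
```

Checking for to_pylist() first means Arrow objects never reach the tolist()
branch, so the helper keeps working whether or not an alias is added later.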

> The difference in filter/take vs fancy indexing with [] is ok, it doesn't
> happen that often, but I was wondering if this will be added later, or if
> this stays as it is.

I personally wouldn't be thrilled about this -- I think adding too
many syntactic conveniences or trying to emulate NumPy would be a
slippery slope ("you emulate this, but why not that?").

> Another difficult thing is testing for string arrays: since there are two
> string types (utf8 and large_utf8), testing if something is of string type
> is a bit annoying. I don't plan to have a type system in Vaex itself, so we
> leak this to users.
> A similar issue is array testing: checking whether something is an Arrow
> array (chunked or plain) is again a test against two types, e.g.
> isinstance(ar, (pa.Array, pa.ChunkedArray)).
> I could see some helper functions such as pa.is_array and pa.is_string (the
> latter name is already taken, and I guess it only tests for string arrays
> with 32-bit offsets).

Having some more helper type checking functions sounds fine.

> Overall, we're quite positive, and as you see, the pain points are not
> fundamental issues but annoyances that might be easy to fix, and fixing
> them would make adoption smoother/faster.
>
> cheers,
>
> Maarten Breddels
> Software engineer / consultant / data scientist
> Python / C++ / Javascript / Jupyter
> www.maartenbreddels.com / vaex.io
> maartenbredd...@gmail.com +31 6 2464 0838 <+31+6+24640838>
