On Thu, Jul 2, 2020 at 3:32 AM Maarten Breddels <maartenbredd...@gmail.com> wrote:
>
> Hi,
>
> in the process of adding Arrow support in Vaex (natively, not converting to
> NumPy as we did before), one of our biggest pain points is (surprisingly)
> the name mismatch between NumPy's .tolist() and Arrow's .to_pylist().
> Especially in code that deals with both types of arrays, this is a bit of
> an annoyance. We actually use tolist() a lot in our unit tests as well. I
> wonder if this was done on purpose, or if this is something that
> could still be changed/added.

This particular function could be renamed or aliased, but in general
substitutability in code that currently uses NumPy has not been a goal of
the project.

> The difference in filter/take vs fancy indexing with [] is ok, it doesn't
> happen that often, but I was wondering if this will be added later, or if
> this stays as it is.

I personally wouldn't be thrilled about this -- I think adding too many
syntactic conveniences or trying to emulate NumPy would be a slippery slope
("you emulate this, but why not that?").

> Another difficulty is testing for string arrays: since there are two
> string types (utf8 and large_utf8), testing if something is of string type
> is a bit annoying. I don't plan to have a type system in Vaex itself, so we
> leak this to users.
> A similar issue is array testing: testing if something is an Arrow
> array (chunked or plain) is again a test against two types (e.g.
> isinstance(ar, (pa.Array, pa.ChunkedArray))).
> I could see some helper functions pa.is_array and pa.is_string (this is
> already taken, and I guess only tests for 32-bit offset string arrays)

Having some more helper type checking functions sounds fine.

> Overall, we're quite positive, and as you see, the pain points are not
> fundamental issues, but annoyances that might be easy to fix, and make
> adoption smoother/faster.
>
> cheers,
>
> Maarten Breddels
> Software engineer / consultant / data scientist
> Python / C++ / Javascript / Jupyter
> www.maartenbreddels.com / vaex.io