Hi,

In the process of adding Arrow support to Vaex (natively, not converting to NumPy as we did before), one of our biggest pain points is (surprisingly) the name mismatch between NumPy's .tolist() and Arrow's .to_pylist(). This is a bit of an annoyance, especially in code that deals with both types of arrays. We also use tolist() a lot in our unit tests. I wonder if this was done on purpose, or if it is something that could still be changed or added.
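
To illustrate the kind of glue code this mismatch leads to, here is a minimal sketch of a wrapper that hides the naming difference. The name to_list is hypothetical (it is not part of Vaex, NumPy, or Arrow), and the sketch assumes duck typing is acceptable, i.e. that anything with a to_pylist() method behaves like an Arrow array and anything with a tolist() method behaves like a NumPy array.

```python
def to_list(ar):
    """Convert an array-like object to a plain Python list.

    Hypothetical helper for illustration only; not an API of
    Vaex, NumPy, or Arrow.
    """
    # Arrow arrays (pa.Array and pa.ChunkedArray) expose to_pylist().
    if hasattr(ar, "to_pylist"):
        return ar.to_pylist()
    # NumPy arrays expose tolist().
    if hasattr(ar, "tolist"):
        return ar.tolist()
    # Anything else: fall back to plain iteration.
    return list(ar)
```

With a helper like this, the same call works for np.array([1, 2, 3]) and pa.array([1, 2, 3]), but every project mixing the two libraries would need its own copy, which is the annoyance described above.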
The difference between filter/take and fancy indexing with [] is OK, since it doesn't come up that often, but I was wondering whether fancy indexing will be added later or whether this stays as it is.

Another difficulty is testing for string arrays. Since there are two string types (utf8 and large_utf8), testing whether something is of string type is a bit annoying. I don't plan to have a type system in Vaex itself, so we leak this to users.

A similar issue is array testing: checking whether something is an Arrow array (chunked or plain) is again a test against two types, e.g. isinstance(ar, (pa.Array, pa.ChunkedArray)). I could see some helper functions such as pa.is_array and pa.is_string being useful (the latter name is already taken by pa.types.is_string, which I guess only tests for string arrays with 32-bit offsets).

Overall, we're quite positive. As you can see, the pain points are not fundamental issues but annoyances that might be easy to fix, and fixing them would make adoption smoother and faster.

cheers,

Maarten Breddels