Arrow Datasets Functionality for Python

Matthew Turner Sun, 09 Feb 2020 20:24:57 -0800

Hi Wes / Arrow Dev Team,

Following up on our brief Twitter 
conversation <https://twitter.com/wesmckinn/status/1222647039252525057> on the 
Datasets functionality in R / Python.


To provide context for others: you had mentioned that the API in Python / 
pyarrow was more developer-centric and intended for users to consume through 
higher-level interfaces (i.e. Ibis). This was in comparison to dplyr, which in 
your demo had some nice analytic capabilities on top of Arrow Datasets.

Seeing that demonstration made me interested in seeing similar Arrow Datasets 
functionality within Python. That does not seem to be an intended capability 
for pyarrow, which I generally understand. However, I am trying to understand 
how Gandiva ties into the Arrow project, as I understand it to be an analytic 
engine of sorts (maybe I'm misunderstanding). I saw 
this <http://blog.christianperone.com/tag/python/> implementation of Gandiva 
with pandas, which was quite interesting, and was wondering whether the 
strategic goal is for Gandiva to be a component of third-party tools that use 
Arrow, or whether Gandiva will eventually become more of a core analytic 
component of Arrow.

Extending on this, I am hoping to get the team's view on what they see as the 
likely home of dplyr-style Datasets functionality within the Python ecosystem 
(i.e. Ibis or something else).

Thanks to all for your work on the project!

Best,

Matthew M. Turner
Email: [email protected]
Phone: (908)-868-2786
