RE: Arrow Datasets Functionality for Python

2020-02-18 Thread Matthew Turner
Noted, thanks. Will be in touch. Matthew M. Turner Email: matthew.m.tur...@outlook.com Phone: (908)-868-2786 -Original Message- From: Wes McKinney Sent: Tuesday, February 18, 2020 3:30 AM To: dev Subject: Re: Arrow Datasets Functionality for Python hi Matthew, Thanks -- our

Re: Arrow Datasets Functionality for Python

2020-02-18 Thread Wes McKinney
atthew.m.tur...@outlook.com > Phone: (908)-868-2786 > > -Original Message- > From: Wes McKinney > Sent: Monday, February 10, 2020 10:33 AM > To: dev > Subject: Re: Arrow Datasets Functionality for Python > > I will add that I'm interested in being involved wi

RE: Arrow Datasets Functionality for Python

2020-02-17 Thread Matthew Turner
s all for the work and I look forward to all the developments this year. Best, Matthew M. Turner Email: matthew.m.tur...@outlook.com Phone: (908)-868-2786 -Original Message- From: Wes McKinney Sent: Monday, February 10, 2020 10:33 AM To: dev Subject: Re: Arrow Datasets Functionalit

Re: Arrow Datasets Functionality for Python

2020-02-10 Thread Wes McKinney
I will add that I'm interested in being involved with developing high-level Python interfaces to all of this functionality (e.g. using Ibis [1]). It would be worth prototyping at least a datasets interface layer for efficient data selection (predicate pushdown + filtering) and then expanding to sup

Re: Arrow Datasets Functionality for Python

2020-02-10 Thread Francois Saint-Jacques
Hello Matthew, The dplyr binding is just syntactic sugar on top of the dataset API. There are no analytics capabilities yet [1], other than the select and the limited projection supported by the dataset API. It looks like it is doing analytics due to properly placed `collect()` calls, which convert

Arrow Datasets Functionality for Python

2020-02-09 Thread Matthew Turner
Hi Wes / Arrow Dev Team, Following up on our brief Twitter convo on the Datasets functionality in R / Python. To provide context to others, you had mentioned that the API in Python / pyarrow was more developer-centric and intended for u