Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-13 Thread Noman Khan
+1 (non-binding) Regards Noman From: Xiao Li Sent: Tuesday, September 12, 2017 2:44:26 AM To: Matei Zaharia; Hyukjin Kwon Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python +1 Xiao On Mon, 11 Sep 2017 at 6:44 PM Matei Zaharia

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-12 Thread Takuya UESHIN
specify the size hint. This can be done in the PR review though. On Sat, Sep 2, 2017 at 2:07 AM, Felix Cheung <felixcheung_m@> wrote: +1 on this and like the suggestion of type in

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Liang-Chi Hsieh
Cheung <felixcheung_m@> wrote: +1 on this and like the suggestion of type in string form. Would it be correct to assume there will be data type check, for example the returned pandas data frame column data types

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Xiao Li
ype check, for example the returned pandas data frame column data types match what are specified. We have seen quite a bit of issues/confusions with that in R. Would it make sense to have a more generic decorator name so that it could also be useable for other eff

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Matei Zaharia
also be useable for other efficient vectorized format in the future? Or do we anticipate the decorator to be format specific and will have more in the future? From: Reynold Xin Sent: Friday, September 1, 2017 5:16:11 AM To: Takuya UESHIN Cc: spark-dev

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Hyukjin Kwon
e column data types match what are specified. We have seen quite a bit of issues/confusions with that in R. Would it make sense to have a more generic decorator name so that it could also be useable for other eff

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Yin Huai
xample the returned pandas data frame column data types match what are specified. We have seen quite a bit of issues/confusions with that in R. Would it make sense to have a more generic decorator name so that it could also be useable for other efficient vectorized fo

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Sameer Agarwal
also be useable for other efficient vectorized format in the future? Or do we anticipate the decorator to be format specific and will have more in the future? *From:* Reynold Xin

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-07 Thread Bryan Cutler
e generic decorator name so that it could also be useable for other efficient vectorized format in the future? Or do we anticipate the decorator to be format specific and will have more in the future?

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-06 Thread Takuya UESHIN
o be useable for other efficient vectorized format in the future? Or do we anticipate the decorator to be format specific and will have more in the future? *From:* Reynold Xin *Sent:* Friday, September 1, 2017 5:1

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-05 Thread Wenchen Fan
6:11 AM *To:* Takuya UESHIN *Cc:* spark-dev *Subject:* Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python Ok, thanks. +1 on the SPIP for scope etc On API details (will deal with in code reviews as well but leaving a note here in case I forget)

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Felix Cheung
:11 AM To: Takuya UESHIN Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python Ok, thanks. +1 on the SPIP for scope etc On API details (will deal with in code reviews as well but leaving a note here in case I forget) 1. I would suggest having the API also accept data

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Reynold Xin
Ok, thanks. +1 on the SPIP for scope etc On API details (will deal with in code reviews as well but leaving a note here in case I forget) 1. I would suggest having the API also accept data type specification in string form. It is usually simpler to say "long" than "LongType()". 2. Think about
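
The first API note in the message above (accepting the data type as a string as well as a type object) could look roughly like the sketch below. It uses only the Python standard library. The names `DataType`, `LongType`, `DoubleType`, `resolve_type`, and `vectorized_udf` are hypothetical stand-ins chosen for illustration; they are not the PySpark classes or the decorator proposed in the SPIP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for a SQL data type (not the PySpark class)."""
    name: str


# Hypothetical constructors mirroring the "LongType()" spelling in the note.
def LongType():
    return DataType("long")


def DoubleType():
    return DataType("double")


# Assumed mapping from short string names to type constructors.
_BY_NAME = {"long": LongType, "double": DoubleType}


def resolve_type(spec):
    """Accept either a DataType object or its short string name."""
    if isinstance(spec, DataType):
        return spec
    if isinstance(spec, str):
        try:
            return _BY_NAME[spec.strip().lower()]()
        except KeyError:
            raise ValueError(f"unknown data type string: {spec!r}") from None
    raise TypeError(f"expected DataType or str, got {type(spec).__name__}")


def vectorized_udf(return_type):
    """Hypothetical decorator: records the resolved return type on the function."""
    resolved = resolve_type(return_type)

    def wrap(func):
        func.return_type = resolved
        return func

    return wrap


# Both spellings resolve to the same type object.
@vectorized_udf("long")
def plus_one(values):
    return [v + 1 for v in values]


@vectorized_udf(LongType())
def plus_two(values):
    return [v + 2 for v in values]
```

Resolving the string once, at decoration time, means a misspelled type name fails immediately rather than when the function is first run on data.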

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Takuya UESHIN
Yes, the aggregation is out of scope for now. I think we should continue discussing the aggregation at JIRA and we will be adding those later separately. Thanks. On Fri, Sep 1, 2017 at 6:52 PM, Reynold Xin wrote: > Is the idea aggregate is out of scope for the current effort and we will > be a

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Reynold Xin
Is the idea aggregate is out of scope for the current effort and we will be adding those later? On Fri, Sep 1, 2017 at 8:01 AM Takuya UESHIN wrote: > Hi all, > > We've been discussing to support vectorized UDFs in Python and we almost > got a consensus about the APIs, so I'd like to summarize an

[VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-08-31 Thread Takuya UESHIN
Hi all, We've been discussing to support vectorized UDFs in Python and we almost got a consensus about the APIs, so I'd like to summarize and call for a vote. Note that this vote should focus on APIs for vectorized UDFs, not APIs for vectorized UDAFs or Window operations. https://issues.apache.o