This vote passes with 4 binding +1 votes, 6 non-binding votes, no +0 vote, and no -1 votes.
Thanks all! +1 votes (binding): Reynold Xin Wenchen Fan Yin Huai Matei Zaharia +1 votes (non-binding): Felix Cheung Bryan Cutler Sameer Agarwal Hyukjin Kwon Xiao Li Liang-Chi Hsieh On Tue, Sep 12, 2017 at 11:46 AM, Liang-Chi Hsieh <vii...@gmail.com> wrote: > +1 > > > Xiao Li wrote > > +1 > > > > Xiao > > On Mon, 11 Sep 2017 at 6:44 PM Matei Zaharia < > > > matei.zaharia@ > > > > > > wrote: > > > >> +1 (binding) > >> > >> > On Sep 11, 2017, at 5:54 PM, Hyukjin Kwon < > > > gurwls223@ > > > > wrote: > >> > > >> > +1 (non-binding) > >> > > >> > > >> > 2017-09-12 9:52 GMT+09:00 Yin Huai < > > > yhuai@ > > > >: > >> > +1 > >> > > >> > On Mon, Sep 11, 2017 at 5:47 PM, Sameer Agarwal < > > > sameer@ > > > > > >> wrote: > >> > +1 (non-binding) > >> > > >> > On Thu, Sep 7, 2017 at 9:10 PM, Bryan Cutler < > > > cutlerb@ > > > > wrote: > >> > +1 (non-binding) for the goals and non-goals of this SPIP. I think > >> it's > >> fine to work out the minor details of the API during review. > >> > > >> > Bryan > >> > > >> > On Wed, Sep 6, 2017 at 5:17 AM, Takuya UESHIN < > > > ueshin@ > > > > > >> wrote: > >> > Hi all, > >> > > >> > Thank you for voting and suggestions. > >> > > >> > As Wenchen mentioned and also we're discussing at JIRA, we need to > >> discuss the size hint for the 0-parameter UDF. > >> > But I believe we got a consensus about the basic APIs except for the > >> size hint, I'd like to submit a pr based on the current proposal and > >> continue discussing in its review. > >> > > >> > https://github.com/apache/spark/pull/19147 > >> > > >> > I'd keep this vote open to wait for more opinions. > >> > > >> > Thanks. > >> > > >> > > >> > On Wed, Sep 6, 2017 at 9:48 AM, Wenchen Fan < > > > cloud0fan@ > > > > wrote: > >> > +1 on the design and proposed API. > >> > > >> > One detail I'd like to discuss is the 0-parameter UDF, how we can > >> specify the size hint. This can be done in the PR review though. > >> > > >> > On Sat, Sep 2, 2017 at 2:07 AM, Felix Cheung < > > > felixcheung_m@ > > > > > >> wrote: > >> > +1 on this and like the suggestion of type in string form. > >> > > >> > Would it be correct to assume there will be data type check, for > >> example > >> the returned pandas data frame column data types match what are > >> specified. > >> We have seen quite a bit of issues/confusions with that in R. > >> > > >> > Would it make sense to have a more generic decorator name so that it > >> could also be useable for other efficient vectorized format in the > >> future? > >> Or do we anticipate the decorator to be format specific and will have > >> more > >> in the future? > >> > > >> > From: Reynold Xin < > > > rxin@ > > > > > >> > Sent: Friday, September 1, 2017 5:16:11 AM > >> > To: Takuya UESHIN > >> > Cc: spark-dev > >> > Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python > >> > > >> > Ok, thanks. > >> > > >> > +1 on the SPIP for scope etc > >> > > >> > > >> > On API details (will deal with in code reviews as well but leaving a > >> note here in case I forget) > >> > > >> > 1. I would suggest having the API also accept data type specification > >> in > >> string form. It is usually simpler to say "long" then "LongType()". > >> > > >> > 2. Think about what error message to show when the rows numbers don't > >> match at runtime. > >> > > >> > > >> > On Fri, Sep 1, 2017 at 12:29 PM Takuya UESHIN < > > > ueshin@ > > > > > >> wrote: > >> > Yes, the aggregation is out of scope for now. > >> > I think we should continue discussing the aggregation at JIRA and we > >> will be adding those later separately. > >> > > >> > Thanks. > >> > > >> > > >> > On Fri, Sep 1, 2017 at 6:52 PM, Reynold Xin < > > > rxin@ > > > > wrote: > >> > Is the idea aggregate is out of scope for the current effort and we > >> will > >> be adding those later? > >> > > >> > On Fri, Sep 1, 2017 at 8:01 AM Takuya UESHIN < > > > ueshin@ > > > > > >> wrote: > >> > Hi all, > >> > > >> > We've been discussing to support vectorized UDFs in Python and we > >> almost > >> got a consensus about the APIs, so I'd like to summarize and call for a > >> vote. > >> > > >> > Note that this vote should focus on APIs for vectorized UDFs, not APIs > >> for vectorized UDAFs or Window operations. > >> > > >> > https://issues.apache.org/jira/browse/SPARK-21190 > >> > > >> > > >> > Proposed API > >> > > >> > We introduce a @pandas_udf decorator (or annotation) to define > >> vectorized UDFs which takes one or more pandas.Series or one integer > >> value > >> meaning the length of the input value for 0-parameter UDFs. The return > >> value should be pandas.Series of the specified type and the length of > the > >> returned value should be the same as input value. > >> > > >> > We can define vectorized UDFs as: > >> > > >> > @pandas_udf(DoubleType()) > >> > def plus(v1, v2): > >> > return v1 + v2 > >> > > >> > or we can define as: > >> > > >> > plus = pandas_udf(lambda v1, v2: v1 + v2, DoubleType()) > >> > > >> > We can use it similar to row-by-row UDFs: > >> > > >> > df.withColumn('sum', plus(df.v1, df.v2)) > >> > > >> > As for 0-parameter UDFs, we can define and use as: > >> > > >> > @pandas_udf(LongType()) > >> > def f0(size): > >> > return pd.Series(1).repeat(size) > >> > > >> > df.select(f0()) > >> > > >> > > >> > > >> > The vote will be up for the next 72 hours. Please reply with your > vote: > >> > > >> > +1: Yeah, let's go forward and implement the SPIP. > >> > +0: Don't really care. > >> > -1: I don't think this is a good idea because of the following > >> technical > >> reasons. > >> > > >> > Thanks! > >> > > >> > -- > >> > Takuya UESHIN > >> > Tokyo, Japan > >> > > >> > http://twitter.com/ueshin > >> > > >> > > >> > > >> > -- > >> > Takuya UESHIN > >> > Tokyo, Japan > >> > > >> > http://twitter.com/ueshin > >> > > >> > > >> > > >> > > >> > -- > >> > Takuya UESHIN > >> > Tokyo, Japan > >> > > >> > http://twitter.com/ueshin > >> > > >> > > >> > > >> > > >> > -- > >> > Sameer Agarwal > >> > Software Engineer | Databricks Inc. > >> > http://cs.berkeley.edu/~sameerag > >> > > >> > > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe e-mail: > > > dev-unsubscribe@.apache > > >> > >> > > > > > > ----- > Liang-Chi Hsieh | @viirya > Spark Technology Center > http://www.spark.tc/ > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > -- Takuya UESHIN Tokyo, Japan http://twitter.com/ueshin