Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Liang-Chi Hsieh
Yeah, in short this is a great compromise approach and I do like to see this proposal move forward to next step. This discussion is valuable. Chao Sun wrote > +1 on Dongjoon's proposal. Great to see this is getting moved forward and > thanks everyone for the insightful discussion! > > > > On

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Chao Sun
+1 on Dongjoon's proposal. Great to see this is getting moved forward and thanks everyone for the insightful discussion! On Thu, Mar 4, 2021 at 8:58 AM Ryan Blue wrote: > Okay, great. I'll update the SPIP doc and call a vote in the next day or > two. > > On Thu, Mar 4, 2021 at 8:26 AM Erik Kro

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Ryan Blue
Okay, great. I'll update the SPIP doc and call a vote in the next day or two. On Thu, Mar 4, 2021 at 8:26 AM Erik Krogen wrote: > +1 on Dongjoon's proposal. This is a very nice compromise between the > reflective/magic-method approach and the InternalRow approach, providing > a lot of flexibilit

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Erik Krogen
+1 on Dongjoon's proposal. This is a very nice compromise between the reflective/magic-method approach and the InternalRow approach, providing a lot of flexibility for our users, and allowing for the more complicated reflection-based approach to evolve at its own pace, since you can always fall bac

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread John Zhuge
+1 Good plan to move forward. Thank you all for the constructive and comprehensive discussions in this thread! Decisions on this important feature will have ramifications for years to come. On Wed, Mar 3, 2021 at 7:42 PM Wenchen Fan wrote: > +1 to this proposal. If people don't like the ScalarF

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Wenchen Fan
+1 to this proposal. If people don't like the ScalarFunction0,1, ... variants and prefer the "magical methods", then we can have a single ScalarFunction interface which has the row-parameter API (with a default implementation to fail) and documents to describe the "magical methods" (which can be do

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Ryan Blue
Good point, Dongjoon. I think we can probably come to some compromise here: - Remove SupportsInvoke since it isn’t really needed. We should always try to find the right method to invoke in the codegen path. - Add a default implementation of produceResult so that implementations don’t h

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Dongjoon Hyun
Hi, All. We shared many opinions in different perspectives. However, we didn't reach a consensus even on a partial merge by excluding something (on the PR by me, on this mailing thread by Wenchen). For the following claims, we have another alternative to mitigate it. > I don't like it becaus

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Ryan Blue
Yes, GenericInternalRow is safe if when type mismatches, with the cost of using Object[], and primitive types need to do boxing The question is not whether to use the magic functions, which would not need boxing. The question here is whether to use multiple ScalarFunction interfaces. Those interfa

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-02 Thread Wenchen Fan
Yes, GenericInternalRow is safe if when type mismatches, with the cost of using Object[], and primitive types need to do boxing. And this is a runtime failure, which is absolutely worse than query-compile-time checks. Also, don't forget my previous point: users need to specify the type and index su

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-01 Thread Ryan Blue
Thanks for adding your perspective, Erik! If the input is string type but the UDF implementation calls row.getLong(0), it returns wrong data I think this is misleading. It is true for UnsafeRow, but there is no reason why InternalRow should return incorrect values. The implementation in GenericI

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-23 Thread Wenchen Fan
gt;>>>> >>>>>>>>>>>>> Andrew, >>>>>>>>>>>>> >>>>>>>>>>>>> The proposal already includes an API for aggregate functions >>>>>>>>>>>>> and I think we would want to

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-23 Thread Dongjoon Hyun
support through optional interfaces like >>>>>>>>>>>> `SupportsInvoke` or `SupportsColumnBatch`. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Feb 16, 2021 at 9:00 AM Andrew Melo < >>>>>>>>>>>> andrew.m.

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-22 Thread Wenchen Fan
> wrote: >>>>>>>>>>>> > >>>>>>>>>>>> > Thanks for the positive feedback, everyone. It sounds like >>>>>>>>>>>> there is a clear path forward for calling functions. Even without a >>&g

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-21 Thread Walaa Eldin Moustafa
gt;>>>> >> This is an important feature which can unblock several other >>>>>>>>>>> projects including bucket join support for DataSource v2, complete >>>>>>>>>>> support >>>>>>>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-21 Thread Wenchen Fan
;>>>>>>>>> community >>>>>>>>>> will come to a proposal with the best of both sides. >>>>>>>>>> >> >>>>>>>>>> >> Chao >>>>>>>>>> >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-19 Thread Ryan Blue
>>>>>>> 작성: >>>>>>>>> >>>> >>>>>>>>> >>>> Basically I think the proposal makes sense to me and I'd like >>>>>>>>> to support the >>>>>>>>> >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-18 Thread Wenchen Fan
>>> Actually I think the SupportsInvoke proposed by Ryan looks a >>>>>>>> good >>>>>>>> >>>> alternative to me. Besides Wenchen's alternative >>>>>>>> implementation, is there a >>>>>&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-18 Thread Walaa Eldin Moustafa
t;> >>>> > considerations for future optimizations and extensions. Can't >>>>>>> wait to see >>>>>>> >>>> > it leading to more advanced functionalities like views with >>>>>>> shared custom >>>>>>>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-18 Thread Ryan Blue
ype and further discussions including the >>>>>> SupportsInvoke >>>>>> >>>> > extension proposed by Ryan. >>>>>> >>>> > >>>>>> >>>> > >>>>>> >>>> > On Fri, Feb 12, 202

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Wenchen Fan
t; >>>> > >>>>> >>>> >> I think this proposal is a very good thing giving Spark a >>>>> standard way of >>>>> >>>> >> getting to and calling UDFs. >>>>> >>>> >> >>&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Ryan Blue
ell. I >>>> think it >>>> >>>> >> would >>>> >>>> >> also simplify using the functions in other contexts like >>>> pushing down >>>> >>>> >> filters into the ORC & Parquet readers although there are a lot >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Dongjoon Hyun
t;>> > owen.omalley@ >>>>> >>>> >>>>> >>>> > > >>>>> >>>> > wrote: >>>>> >>>> > >>>>> >>>> >> I think this proposal is a very good thing giving Spark a

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Wenchen Fan
gt; >>>> >> simple, yet covers all of the polymorphic type cases well. I >>>> think it >>>> >>>> >> would >>>> >>>> >> also simplify using the functions in other contexts like >>>> pushing down

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Hyukjin Kwon
; .. Owen >>> >>>> >> >>> >>>> >> >>> >>>> >> On Fri, Feb 12, 2021 at 11:07 PM Erik Krogen < >>> >>>> >>> >>>> > ekrogen@.com >>> >>>> >>> >>&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Dongjoon Hyun
;> >> >> >>>> >>> I agree that there is a strong need for a FunctionCatalog within >> Spark >> >>>> >>> to >> >>>> >>> provide support for shareable UDFs, as well as make movement >> towards >> >>>&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Ryan Blue
> >>>> >>> support this SPIP wholeheartedly. > >>>> >>> > >>>> >>> I find both of the proposed UDF APIs to be sufficiently > user-friendly > >>>> >>> and > >>>> >>> extensible. I generally think Wenche

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Andrew Melo
>>> >>>> >>> I find both of the proposed UDF APIs to be sufficiently user-friendly >>>> >>> and >>>> >>> extensible. I generally think Wenchen's proposal is easier for a user >>>> >>> to >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Walaa Eldin Moustafa
rt this SPIP wholeheartedly. >>>> >>> >>>> >>> I find both of the proposed UDF APIs to be sufficiently >>>> user-friendly >>>> >>> and >>>> >>> extensible. I generally think Wenchen's proposa

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ye Xianjin
gt; >>> support this SPIP wholeheartedly. >>>> >>> >>>> >>> I find both of the proposed UDF APIs to be sufficiently user-friendly >>>> >>> and >>>> >>> extensible. I generally think Wenchen's proposal is easie

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ryan Blue
gt;> >>> I find both of the proposed UDF APIs to be sufficiently user-friendly >>> >>> and >>> >>> extensible. I generally think Wenchen's proposal is easier for a >>> user to >>> >>> work with in the common case, but has greater

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Chao Sun
eater potential for confusing >> >>> and >> >>> hard-to-debug behavior due to use of reflective method signature >> >>> searches. >> >>> The merits on both sides can hopefully be more properly examined with >> >>> code, >> &

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Hyukjin Kwon
to > >>> provide > >>> a more concrete comparison. I am optimistic that we will not let the > >>> debate > >>> over this point unreasonably stall the SPIP from making progress. > >>> > >>> Thank you to both Wenchen and Ryan for

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Liang-Chi Hsieh
he SPIP from making progress. >>> >>> Thank you to both Wenchen and Ryan for your detailed consideration and >>> evaluation of these ideas! >>> -- >>> *From:* Dongjoon Hyun < > dongjoon.hyun@ > > >>> *Sent:

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread John Zhuge
is point unreasonably stall the SPIP from making progress. >> >> Thank you to both Wenchen and Ryan for your detailed consideration and >> evaluation of these ideas! >> -- >> *From:* Dongjoon Hyun >> *Sent:* Wednesday, February 10, 2021

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Owen O'Malley
your detailed consideration and > evaluation of these ideas! > -- > *From:* Dongjoon Hyun > *Sent:* Wednesday, February 10, 2021 6:06 PM > *To:* Ryan Blue > *Cc:* Holden Karau ; Hyukjin Kwon < > gurwls...@gmail.com>; Spark Dev List ; Wenchen Fan < > cloud0...@gma

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Erik Krogen
d Ryan for your detailed consideration and evaluation of these ideas! From: Dongjoon Hyun Sent: Wednesday, February 10, 2021 6:06 PM To: Ryan Blue Cc: Holden Karau ; Hyukjin Kwon ; Spark Dev List ; Wenchen Fan Subject: Re: [DISCUSS] SPIP: FunctionCatalog BTW, I for

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Dongjoon Hyun
BTW, I forgot to add my opinion explicitly in this thread because I was on the PR before this thread. 1. The `FunctionCatalog API` PR was made on May 9, 2019 and has been there for almost two years. 2. I already gave my +1 on that PR last Saturday because I agreed with the latest updated design do

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Dongjoon Hyun
Hi, Ryan. We didn't move past anything (both yours and Wenchen's). What Wenchen suggested is double-checking the alternatives with the implementation to give more momentum to our discussion. Your new suggestion about optional extention also sounds like a new reasonable alternative to me. We are

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Ryan Blue
I don’t think we should so quickly move past the drawbacks of this approach. The problems are significant enough that using invoke is not sufficient on its own. But, I think we can add it as an optional extension to shore up the weaknesses. Here’s a summary of the drawbacks: - Magic function s

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Dongjoon Hyun
Thank you all for making a giant move forward for Apache Spark 3.2.0. I'm really looking forward to seeing Wenchen's implementation. That would be greatly helpful to make a decision! > I'll implement my idea after the holiday and then we can have more effective discussions. We can also do benchmar

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
FYI: the Presto UDF API also takes individual parameters instead of the row parameter. I think this direction at least worth a try so that we can see the performance difference. It's also mentioned in the design doc as an alternative (Trino)

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
Hi Holden, As Hyukjin said, following existing designs is not the principle of DS v2 API design. We should make sure the DS v2 API makes sense. AFAIK we didn't fully follow the catalog API design from Hive and I believe Ryan also agrees with it. I think the problem here is we were discussing some

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Hyukjin Kwon
Just dropping a few lines. I remember that one of the goals in DSv2 is to correct the mistakes we made in the current Spark codes. It would not have much point if we will happen to just follow and mimic what Spark currently does. It might just end up with another copy of Spark APIs, e.g. Expression

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Holden Karau
I think this proposal is a good set of trade-offs and has existed in the community for a long period of time. I especially appreciate how the design is focused on a minimal useful component, with future optimizations considered from a point of view of making sure it's flexible, but actual concrete

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Ryan Blue
I don’t think that using Invoke really works. The usability is poor if something goes wrong and it can’t handle varargs or parameterized use cases well. There isn’t a significant enough performance penalty for passing data as a row to justify making the API much more difficult and less expressive.

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
Hi Ryan, Sorry if I didn't make it clear. I was referring to implementing UDF using codegen, not calling the UDF with codegen or not. Calling UDF is Spark's job and it doesn't matter if the UDF API uses row or individual parameters, as you said. My point is, it's a bad idea to follow the Expressio

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Ryan Blue
Wenchen, There are a few issues with the Invoke approach, and I don’t think that it is really much better for the additional complexity of the API. First I think that you’re confusing codegen to call a function with codegen to implement a function. The non-goal refers to supporting codegen to *im

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Wenchen Fan
This is a very important feature, thanks for working on it! Spark uses codegen by default, and it's a bit unfortunate to see that codegen support is treated as a non-goal. I think it's better to not ask the UDF implementations to provide two different code paths for interpreted evaluation and code