Thanks Fabian We are going to replace all scalar functions with the table functions.
Thanks On Wed, Apr 4, 2018 at 12:16 PM, Fabian Hueske <[email protected]> wrote: > Hi Darshan, > > What you observe is the result of what's supposed to be an optimization. > By fusing the two select() calls, we reduce the number of operators in the > resulting plan (one MapFunction less). > This optimization is only applied for ScalarFunctions but not for > TableFunctions. > With a better cost-modeling that increases the cost of user-defined > ScalarFunctions it should be possible to "convince" the optimizer to not > fuse the operators. > > For now, you could either use the TableFunction approach or convert the > result of the first select() into a DataStream (or DataSet depending on > your setup) and register that again as a table. > That would split the plan into two parts which are independently optimized > and hence the select() operators would not be merged. > > Best, Fabian > > > > 2018-03-30 22:07 GMT+02:00 Darshan Singh <[email protected]>: > >> Hi, >> >> I am not able to find what is best way to query the output of a scalar >> table function. >> >> Suppose I have table which has column col1 which is string. >> >> I have a scalar function and returns a POJO >> {col1V1 String, col1V2 String , col1V3 String}. >> >> >> I am using following. >> >> so table.select("sf(col1) as sfc") >> .select("sfc.get('col1V1') as v1, sfc.get('col1V2') as v2 , >> sfc.get('col1V13') as v3 ") >> >> It is working fine but strangely enough it calls the sf 3 times(each per >> get) for same row. However, If I use table function it calls just once. >> >> So to me it seems that scalar function is very expensive so if I have 10 >> columns and the computation is expensive it will do 10 times. Thats why I >> thought maybe there is a better way to return pojo from scalar function >> rather than what I have been using. >> >> If this is the best way then I wonder if scalar functions should be used >> to return single value only. >> >> Thanks >> > >
