Re: Row Encoder For DataSet

2017-12-10 Thread Tomasz Dudek
Hello Sandeep, you can pass a Row to a UDAF. Just provide a proper inputSchema to your UDAF. Check out this example: https://docs.databricks.com/spark/latest/spark-sql/udaf-scala.html Yours, Tomasz
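A minimal sketch of what Tomasz describes, assuming Spark 2.x and a DataFrame with integer columns u and v (the column names and the summing logic are illustrative, not from the thread). Declaring several fields in inputSchema makes all of those columns arrive in update() as a single Row:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class SumOverRow extends UserDefinedAggregateFunction {
      // Declare every column the UDAF should receive; `input` in
      // update() is then a Row covering all of these fields.
      def inputSchema: StructType = StructType(
        StructField("u", IntegerType) :: StructField("v", IntegerType) :: Nil)

      def bufferSchema: StructType =
        StructType(StructField("total", LongType) :: Nil)

      def dataType: DataType = LongType
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L

      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0) && !input.isNullAt(1))
          buffer(0) = buffer.getLong(0) + input.getInt(0) + input.getInt(1)

      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)

      def evaluate(buffer: Row): Any = buffer.getLong(0)
    }

    // Usage: pass as many columns as inputSchema declares.
    // df.groupBy($"x", $"y").agg(new SumOverRow()(df("u"), df("v")))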

Re: Row Encoder For DataSet

2017-12-10 Thread Sandip Mehta
Thanks Georg. I have looked at UDAF based on your suggestion. It looks like you can only pass a single column to a UDAF. Is there any way to pass the entire Row to the aggregate function? I want to take a list of user-defined functions and a given Row object, perform the aggregation, and return an aggregated Row object.

Re: Row Encoder For DataSet

2017-12-07 Thread Georg Heiler
You are looking for a UDAF.

Re: Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
Hi, I want to group on certain columns and then, for every group, apply a custom UDF to it. Currently groupBy only allows adding an aggregation function to the RelationalGroupedDataset. For this I was thinking of using groupByKey, which will return a KeyValueGroupedDataset, and then applying the UDF to every group, but really
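A minimal sketch of the groupByKey route Sandip describes, assuming Spark 2.x; the columns and the per-group sum are invented for illustration. mapGroups runs arbitrary logic over each group's rows, and an explicit RowEncoder supplies the Row encoding the thread title asks about:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10L), (1, 20L), (2, 5L)).toDF("x", "v")

    // Schema of the per-group result Rows.
    val outSchema = StructType(Seq(
      StructField("x", IntegerType),
      StructField("total", LongType)))

    // Group by a key, then apply arbitrary per-group logic.
    val aggregated = df.groupByKey(r => r.getInt(0))
      .mapGroups { (key, rows) =>
        Row(key, rows.map(_.getLong(1)).sum)
      }(RowEncoder(outSchema))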

Re: Row Encoder For DataSet

2017-12-07 Thread Weichen Xu
You can groupBy multiple columns on a DataFrame, so why do you need such a complicated schema? Suppose the df schema is (x, y, u, v, z): df.groupBy($"x", $"y").agg(...) Is this what you want?
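For reference, a minimal runnable version of Weichen's suggestion, assuming `import spark.implicits._` is in scope and df has columns (x, y, u, v, z); the particular aggregate functions are illustrative:

    import org.apache.spark.sql.functions._

    // Grouping on two plain columns at once; no nested Row key is needed.
    val result = df.groupBy($"x", $"y")
      .agg(sum($"u").as("sum_u"), avg($"v").as("avg_v"))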

Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
Hi, during my aggregation I end up with the following schema:

Row(Row(val1, val2), Row(val1, val2, val3, ...))

val values = Seq(
  (Row(10, 11), Row(10, 2, 11)),
  (Row(10, 11), Row(10, 2, 11)),
  (Row(20, 11), Row(10, 2, 11))
)

The first tuple is used to group the relevant records for aggregation.
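A minimal sketch of materializing that nested shape, assuming Spark 2.x; the field names (key, value, k1, v1, ...) are invented for illustration. Wrapping each tuple in an outer Row and supplying an explicit nested StructType gives Spark the encoding it needs:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    // Each element is Row(groupingRow, valueRow), mirroring the schema above.
    val values = Seq(
      Row(Row(10, 11), Row(10, 2, 11)),
      Row(Row(10, 11), Row(10, 2, 11)),
      Row(Row(20, 11), Row(10, 2, 11)))

    val schema = StructType(Seq(
      StructField("key", StructType(Seq(
        StructField("k1", IntegerType),
        StructField("k2", IntegerType)))),
      StructField("value", StructType(Seq(
        StructField("v1", IntegerType),
        StructField("v2", IntegerType),
        StructField("v3", IntegerType))))))

    val df = spark.createDataFrame(spark.sparkContext.parallelize(values), schema)

    // The nested struct column can be grouped on directly.
    df.groupBy($"key").count().show()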