Re: How to get recent value in spark dataframe

2016-12-20 Thread Divya Gehlot
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html

Hope this helps.

Thanks,
Divya

On 15 December 2016 at 12:49, Milin korath wrote:
> Hi
>
> I have a Spark data frame with the following structure:
>
> id flag price date
> a  0    100   2015
> a  0    50
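The linked page covers Spark SQL window functions. As a rough plain-Python sketch of what a window partitioned by `id` and ordered by `date` descending computes (this is not Spark code; the column names come from the question, and the sample values are made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows shaped like the question's id/flag/price/date table.
rows = [
    {"id": "a", "flag": 0, "price": 100, "date": 2015},
    {"id": "a", "flag": 1, "price": 50,  "date": 2016},
    {"id": "b", "flag": 1, "price": 200, "date": 2016},
]

# A window partitioned by id, ordered by date descending: within each
# partition the first row is the most recent one. Here we just take the
# max-by-date row per id.
rows.sort(key=itemgetter("id"))
latest = {}
for key, grp in groupby(rows, key=itemgetter("id")):
    latest[key] = max(grp, key=itemgetter("date"))

print(latest["a"]["price"])  # prints 50
```

In Spark the same thing would be expressed with a `Window.partitionBy(...).orderBy(...)` spec, as described in the linked chapter.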

Re: How to get recent value in spark dataframe

2016-12-19 Thread ayan guha
You have 2 parts to it:

1. Do a subquery where, for each primary key, you derive the latest value among the flag=1 records. Ensure you get exactly 1 record per primary key value. Here you can use rank() over (partition by primary key order by year desc).
2. Join your original dataset with the above on the primary key.
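The two steps above can be sketched in plain Python (not Spark; tuple positions stand in for the id/flag/price/year columns from the question, and the sample data is made up):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (id, flag, price, year)
data = [
    ("a", 0, 100, 2015),
    ("a", 1, 150, 2016),
    ("a", 1, 120, 2014),
    ("b", 1, 300, 2015),
]

# Step 1: among flag == 1 rows, keep the latest per key. This mirrors
# keeping only rank() over (partition by id order by year desc) = 1,
# which guarantees exactly one record per key.
flagged = sorted((r for r in data if r[1] == 1), key=itemgetter(0))
latest_flagged = {k: max(g, key=itemgetter(3))
                  for k, g in groupby(flagged, key=itemgetter(0))}

# Step 2: join the original dataset with the derived latest flag=1 price.
joined = [r + (latest_flagged[r[0]][2],) for r in data]
```

In Spark this would be a `rank().over(window)` filter followed by a join back on the key.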

Re: How to get recent value in spark dataframe

2016-12-18 Thread Richard Xin
I am not sure I understood your logic, but it seems to me that you could take a look at Hive's Lead/Lag functions.

On Monday, December 19, 2016 1:41 AM, Milin korath wrote:
thanks, I tried with a left outer join. My dataset has around 400M records and a lot of shuffling is happening. Is

Re: How to get recent value in spark dataframe

2016-12-18 Thread Milin korath
Thanks, I tried with a left outer join. My dataset has around 400M records and a lot of shuffling is happening. Is there any other workaround apart from a join? I tried using a window function but I am not getting a proper solution.

Thanks

On Sat, Dec 17, 2016 at 4:55 AM, Michael Armbrust wrote:
> Oh a

Re: How to get recent value in spark dataframe

2016-12-16 Thread Michael Armbrust
Oh, and to get the null for missing years, you'd need to do an outer join with a table containing all of the years you are interested in.

On Fri, Dec 16, 2016 at 3:24 PM, Michael Armbrust wrote:
> Are you looking for argmax? Here is an example
>
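A plain-Python sketch of that outer join (not Spark; the years and prices are made-up sample values): left-joining a complete years table against the observed data yields None, Spark's null, for the years with no matching row.

```python
# Hypothetical "all years of interest" table and observed data.
years = [2014, 2015, 2016, 2017]
prices = {2015: 100, 2016: 150}  # year -> observed price

# Left outer join of years against the data: unmatched years get None.
joined = [(y, prices.get(y)) for y in years]
```

In Spark this would be `yearsDF.join(df, Seq("year"), "left_outer")`, with the missing rows showing up as null.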

Re: How to get recent value in spark dataframe

2016-12-16 Thread Michael Armbrust
Are you looking for argmax? Here is an example.

On Wed, Dec 14, 2016 at 8:49 PM, Milin korath wrote:
> Hi
>
> I have a Spark data fram
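The linked example is not preserved in the archive, but argmax per group means "the price belonging to the maximum date". A common Spark trick is `max(struct(date, price))` per group, since structs compare field by field; Python tuples compare the same way, so a plain-Python sketch (not Spark code, made-up sample data) looks like:

```python
# Hypothetical rows: (id, date, price)
data = [("a", 2015, 100), ("a", 2016, 150), ("b", 2014, 300)]

# argmax per id: keep the (date, price) pair with the largest date.
# Tuple comparison orders by date first, just like max(struct(date, price)).
best = {}
for rid, date, price in data:
    cur = best.get(rid)
    if cur is None or (date, price) > cur:
        best[rid] = (date, price)

recent_price = {k: v[1] for k, v in best.items()}
```

In Spark this would be `df.groupBy("id").agg(max(struct("date", "price")))` followed by pulling the price field out of the struct.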

Re: How to get recent value in spark dataframe

2016-12-16 Thread vaquar khan
Not sure about your 0-and-1 flag logic, but you can orderBy the data according to time and get the first value.

Regards,
Vaquar khan

On Wed, Dec 14, 2016 at 10:49 PM, Milin korath wrote:
> Hi
>
> I have a Spark data frame with the following structure:
>
> id flag price date
> a  0    100   2015
>
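The order-by-time-descending-then-take-first idea, as a plain-Python sketch (not Spark; sample rows are made up, shaped like the question's table):

```python
# Hypothetical rows: (id, date, price)
rows = [("a", 2015, 100), ("a", 2016, 150)]

# Order by time descending and take the first row: the most recent record.
most_recent = sorted(rows, key=lambda r: r[1], reverse=True)[0]
```

In Spark this would be `df.orderBy(col("date").desc).first()`, though for a per-id answer it still needs a partition (window or groupBy), as the other replies note.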