Hi Koert & Robin,

Thanks! But if you go through the blog
https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/
and check the comments under it, the approach actually works, although I am
not sure how. And yes, I agree a custom aggregate UDAF is a good option.
Can anyone share the best way to implement this in Spark? (Two quick
sketches are appended after the quoted thread below.)

Regards,
Rabin Banerjee

On Thu, Nov 3, 2016 at 6:59 PM, Koert Kuipers <ko...@tresata.com> wrote:

> Just realized you only want to keep the first element. You can do this
> without sorting, by doing something similar to a min or max operation
> using a custom aggregator/UDAF or reduceGroups on Dataset. This is also
> more efficient.
>
> On Nov 3, 2016 7:53 AM, "Rabin Banerjee" <dev.rabin.baner...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I want to do a DataFrame operation to find the rows having the latest
>> timestamp in each group, using the operation below:
>>
>> df.orderBy(desc("transaction_date"))
>>   .groupBy("mobileno")
>>   .agg(first("customername").as("customername"),
>>        first("service_type").as("service_type"),
>>        first("cust_addr").as("cust_addr"))
>>   .select("customername", "service_type", "mobileno", "cust_addr")
>>
>> Spark version: 1.6.x
>>
>> My question is: will Spark guarantee the order during the groupBy if the
>> DataFrame was previously ordered using orderBy, in Spark 1.6.x?
>>
>> I referred to a blog here:
>> https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/
>> which claims it will work, except in Spark 1.5.1 and 1.5.2.
>>
>> I need a bit of elaboration on how Spark handles this internally. Also,
>> is it more efficient than using a window function?
>>
>> Thanks in advance,
>> Rabin Banerjee
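For reference, here is a minimal sketch of the window-function alternative
asked about above. It assumes a DataFrame `df` with the columns from the
question; note that in Spark 1.6, window functions require a HiveContext
rather than a plain SQLContext.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, desc, row_number}

    // Rank rows within each mobileno group by transaction_date, newest first.
    val byMobile = Window.partitionBy("mobileno").orderBy(desc("transaction_date"))

    val latest = df
      .withColumn("rn", row_number().over(byMobile))  // rn = 1 is the newest row per group
      .where(col("rn") === 1)
      .drop("rn")
      .select("customername", "service_type", "mobileno", "cust_addr")

Unlike orderBy-then-groupBy, this does not rely on any ordering guarantee
surviving the groupBy: the ordering is scoped to each partition of the window.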
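And a sketch of Koert's suggestion: reduce each group to its latest row with
a max-like comparison, without any global sort. This uses Spark 2.x Dataset
syntax (groupByKey/reduceGroups); in 1.6 the experimental Dataset API had a
similar groupBy(...).reduce(...) shape. The `Txn` case class is a hypothetical
stand-in for the actual schema, and `spark` is an assumed SparkSession.

    import java.sql.Timestamp

    // Hypothetical case class mirroring the DataFrame schema from the question.
    case class Txn(mobileno: String,
                   transaction_date: Timestamp,
                   customername: String,
                   service_type: String,
                   cust_addr: String)

    import spark.implicits._  // assumes a SparkSession named `spark`

    // Keep, per mobileno, the row with the latest transaction_date.
    val latest = df.as[Txn]
      .groupByKey(_.mobileno)
      .reduceGroups((a, b) => if (a.transaction_date.after(b.transaction_date)) a else b)
      .map { case (_, txn) => txn }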