Not really; a grouped DataFrame only provides SQL-style aggregation functions such as sum and avg (at least in 1.5).
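
For example (rough, untested sketch, assuming df has columns col1 and col2 as in Luis' example below), in 1.5 you are limited to something like:

  import org.apache.spark.sql.functions.{min, avg}

  // groupBy returns a GroupedData; only SQL-style aggregates
  // (agg, sum, avg, min, max, count, mean) are available on it.
  val result = df.groupBy("col1").agg(min("col2"), avg("col2"))

i.e. there is nothing like reduceByKey that takes an arbitrary per-key function.
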
> On 29.08.2016, at 14:56, ayan guha <guha.a...@gmail.com> wrote:
>
> If you are confused because of the names of the two APIs: I think the DF API name
> groupBy came from SQL, but it works similarly to reduceByKey.
>
> On 29 Aug 2016 20:57, "Marius Soutier" <mps....@gmail.com> wrote:
> In DataFrames (and thus in 1.5 in general) this is not possible, correct?
>
>> On 11.08.2016, at 05:42, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> Hi Luis,
>>
>> You might want to consider upgrading to Spark 2.0 - but in Spark 1.6.2 you
>> can do groupBy followed by a reduce on the GroupedDataset
>> (http://spark.apache.org/docs/1.6.2/api/scala/index.html#org.apache.spark.sql.GroupedDataset)
>> - this works on a per-key basis despite the different name. In Spark 2.0
>> you would use groupByKey on the Dataset followed by reduceGroups
>> (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.KeyValueGroupedDataset).
>>
>> Cheers,
>>
>> Holden :)
>>
>> On Wed, Aug 10, 2016 at 5:15 PM, luismattor <luismat...@gmail.com> wrote:
>> Hi everyone,
>>
>> Consider the following code:
>>
>> val result = df.groupBy("col1").agg(min("col2"))
>>
>> I know that rdd.reduceByKey(func) produces the same RDD as
>> rdd.groupByKey().mapValues(value => value.reduce(func)). However, reduceByKey
>> is more efficient, as it avoids shipping each value to the reducer doing the
>> aggregation (it ships partial aggregations instead).
>>
>> I wonder whether the DataFrame API optimizes the code by doing something
>> similar to what RDD.reduceByKey does.
>>
>> I am using Spark 1.6.2.
>>
>> Regards,
>> Luis
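
PS: For reference, a rough (untested) sketch of the Spark 2.0 groupByKey/reduceGroups approach Holden describes above, assuming a case class Rec(col1: String, col2: Int) matching the DataFrame schema and spark.implicits._ in scope for the encoders:

  case class Rec(col1: String, col2: Int)

  import spark.implicits._  // Encoder[Rec] for .as[Rec]

  // groupByKey returns a KeyValueGroupedDataset; reduceGroups reduces the
  // values per key, here keeping the row with the smallest col2.
  val reduced = df.as[Rec]
    .groupByKey(_.col1)
    .reduceGroups((a, b) => if (a.col2 <= b.col2) a else b)  // Dataset[(String, Rec)]
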