I have used the percentile_approx function from Hive (with 0.5 as the percentile), called through sqlContext.sql commands.
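
A minimal sketch of that approach, in case it helps. The sample rows, the view name "t", and the alias "median_col1" are made up for illustration; the column names follow the question. It assumes Spark 2.x with the spark-hive module on the classpath, since on 2.0 I believe percentile_approx resolves only as a Hive UDAF (later releases ship a built-in version). With a 1.x-style context, sqlContext.sql(...) would take the place of spark.sql(...).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object GroupedMedianSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Hive support is available, so the Hive UDAF
    // percentile_approx can be resolved on Spark 2.0.
    val spark = SparkSession.builder()
      .appName("grouped-median-sketch")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the column names from the question.
    val df = Seq(
      ("a", "x", 1.0), ("a", "x", 2.0), ("a", "x", 9.0),
      ("b", "y", 4.0), ("b", "y", 6.0)
    ).toDF("foo", "bar", "column1")

    // Option 1: plain SQL against a temporary view.
    df.createOrReplaceTempView("t")
    val viaSql = spark.sql(
      """SELECT foo, bar, percentile_approx(column1, 0.5) AS median_col1
        |FROM t
        |GROUP BY foo, bar""".stripMargin)

    // Option 2: the same SQL expression inside .agg, via expr().
    val viaAgg = df
      .groupBy("foo", "bar")
      .agg(expr("percentile_approx(column1, 0.5)").as("median_col1"))

    viaSql.show()
    viaAgg.show()

    spark.stop()
  }
}
```

Both variants return an approximate median per (foo, bar) group. percentile_approx also accepts an optional third argument that trades memory for accuracy.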

On Tue, Sep 27, 2016 at 10:52 AM, Peter Figliozzi <pete.figlio...@gmail.com> wrote:

> I'm trying to figure out a nice way to get the median of a DataFrame
> column *once it is grouped*.
>
> It's easy enough now to get the min, max, mean, and other things that are
> part of spark.sql.functions:
>
> df.groupBy("foo", "bar").agg(mean($"column1"))
>
> And it's easy enough to get the median of a column before grouping, using
> approxQuantile.
>
> However approxQuantile is part of DataFrame.stat i.e. a
> DataFrameStatFunctions.
>
> Is there a way to use it inside the .agg?
>
> Or do we need a user defined aggregation function?
>
> Or some other way?
>
> Stack Overflow version of the question here
> <http://stackoverflow.com/questions/39693730/median-of-groups-in-a-dataframe-spark-2-0>.
>
> Thanks,
>
> Pete

--
Best Regards,
Ayan Guha