Re: median of groups

ayan guha Mon, 26 Sep 2016 18:23:28 -0700

I have used the percentile_approx function (with 0.5) from Hive, called
through sqlContext SQL commands.
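
A minimal sketch of that approach, in case it helps. It assumes Spark 2.1 or later, where percentile_approx is available as a built-in SQL function; on 2.0 it comes from Hive, so the session would likely need .enableHiveSupport(). The object name, the sample rows, the view name "t" and the alias median_column1 are made up for illustration; only foo, bar and column1 come from the question. The second method shows the same function used inside .agg through expr.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object GroupedMedian {
  // Hypothetical sample data; only the column names come from the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("a", "x", 1.0), ("a", "x", 2.0), ("a", "x", 9.0),
      ("b", "y", 4.0), ("b", "y", 6.0), ("b", "y", 8.0)
    ).toDF("foo", "bar", "column1")
  }

  // Approach 1: register a temp view and call percentile_approx from SQL.
  def mediansViaSql(spark: SparkSession): DataFrame = {
    sample(spark).createOrReplaceTempView("t") // "t" is an arbitrary view name
    spark.sql(
      """SELECT foo, bar, percentile_approx(column1, 0.5) AS median_column1
        |FROM t
        |GROUP BY foo, bar""".stripMargin)
  }

  // Approach 2: the same SQL function inside .agg, wrapped in expr(...).
  def mediansViaAgg(spark: SparkSession): DataFrame =
    sample(spark)
      .groupBy("foo", "bar")
      .agg(expr("percentile_approx(column1, 0.5)").as("median_column1"))

  def main(args: Array[String]): Unit = {
    // Local session for trying this out; a real job would use its own config.
    val spark = SparkSession.builder()
      .appName("grouped-median")
      .master("local[*]")
      .getOrCreate()
    mediansViaSql(spark).show()
    mediansViaAgg(spark).show()
    spark.stop()
  }
}
```

Note that percentile_approx returns an approximate percentile, so on large groups the result may differ slightly from the exact median.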


On Tue, Sep 27, 2016 at 10:52 AM, Peter Figliozzi <pete.figlio...@gmail.com>
wrote:

> I'm trying to figure out a nice way to get the median of a DataFrame
> column *once it is grouped*.
>
> It's easy enough now to get the min, max, mean, and other things that are
> part of spark.sql.functions:
>
> df.groupBy("foo", "bar").agg(mean($"column1"))
>
> And it's easy enough to get the median of a column before grouping, using
> approxQuantile.
>
> However, approxQuantile is part of DataFrame.stat, i.e. it belongs to
> DataFrameStatFunctions.
>
> Is there a way to use it inside the .agg?
>
> Or do we need a user defined aggregation function?
>
> Or some other way?
> Stack Overflow version of the question here
> <http://stackoverflow.com/questions/39693730/median-of-groups-in-a-dataframe-spark-2-0>
> .
>
> Thanks,
>
> Pete
>
>


-- 
Best Regards,
Ayan Guha
