Hi,

You should take a look at
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

Spark SQL does not directly support the aggregation of multiple distinct
groups. For example, select count(distinct a), count(distinct b) from
tbl_x contains two distinct groups, a & b. The RewriteDistinctAggregates
rule rewrites this into two aggregates: the first aggregate takes care of
deduplication and the second aggregate does the actual aggregation.
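
A minimal sketch of how to see the rewrite in action (the object name and
sample data below are just illustrative, assuming a local SparkSession and
a temp view named tbl_x with columns a and b):

import org.apache.spark.sql.SparkSession

object DistinctAggExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rewrite-distinct-aggregates")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data with two columns so both distinct groups are exercised.
    Seq((1, "x"), (1, "y"), (2, "x")).toDF("a", "b").createOrReplaceTempView("tbl_x")

    val df = spark.sql("SELECT count(DISTINCT a), count(DISTINCT b) FROM tbl_x")

    // The optimized plan should show an Expand node feeding a first Aggregate
    // (deduplication per distinct group) and a second Aggregate (the counts).
    df.explain(true)
    df.show()

    spark.stop()
  }
}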

HTH

On Sun, Nov 13, 2016 at 11:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I might not have been there yet, but since I'm in the code every day
> I might be close...
>
> When you say "aggregate functions", are you asking about typed or untyped
> ones? Just today I reviewed the typed ones and honestly it took me some
> time to figure out what belongs where. Are you creating a new UDAF?
> What have you done already? GitHub perhaps?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sun, Nov 13, 2016 at 12:03 PM, assaf.mendelson
> <assaf.mendel...@rsa.com> wrote:
> > Hi,
> >
> > I am trying to understand how aggregate functions are implemented
> > internally.
> >
> > I see that the expression is wrapped using toAggregateExpression using
> > isDistinct.
> >
> > I can’t figure out where the code that makes the data distinct is
> located. I
> > am trying to figure out how the input data is converted into a distinct
> > version.
> >
> > Thanks,
> >
> >                 Assaf.
> >
> >
>
