Hi,

You should take a look at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
Spark SQL does not directly support aggregation over multiple distinct
groups. For example, "select count(distinct a), count(distinct b) from
tbl_x" contains two distinct groups, a and b. RewriteDistinctAggregates
rewrites such a query into two aggregates: the first aggregate takes care
of deduplication and the second aggregate does the actual aggregation
(see the sketch after the quoted thread below).

HTH

On Sun, Nov 13, 2016 at 11:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> I might not have been there yet, but since I'm with the code every day
> I might be close...
>
> When you say "aggregate functions", do you mean the typed or untyped
> ones? Just today I reviewed the typed ones and honestly it took me some
> time to figure out what belongs where. Are you creating a new UDAF?
> What have you done already? GitHub perhaps?
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sun, Nov 13, 2016 at 12:03 PM, assaf.mendelson
> <assaf.mendel...@rsa.com> wrote:
> > Hi,
> >
> > I am trying to understand how aggregate functions are implemented
> > internally.
> >
> > I see that the expression is wrapped using toAggregateExpression with
> > isDistinct.
> >
> > I can't figure out where the code that makes the data distinct is
> > located. I am trying to figure out how the input data is converted
> > into a distinct version.
> >
> > Thanks,
> >
> > Assaf.
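
If you want to see the rewrite in action, here is a minimal sketch. The
SparkSession setup, the tbl_x temp view and the columns a/b are just
placeholders I made up for illustration, not anything from the Spark
sources; the point is only that inspecting the optimized plan of a query
with two distinct groups shows what RewriteDistinctAggregates produced.

  import org.apache.spark.sql.SparkSession

  // Local session purely for experimentation.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("distinct-agg-rewrite")
    .getOrCreate()

  import spark.implicits._

  // Tiny example table with two columns, a and b.
  val tblX = Seq((1, "x"), (1, "y"), (2, "x")).toDF("a", "b")
  tblX.createOrReplaceTempView("tbl_x")

  // A query with two distinct aggregate groups (a and b).
  val q = spark.sql("SELECT count(DISTINCT a), count(DISTINCT b) FROM tbl_x")

  // The optimized plan should show the result of RewriteDistinctAggregates:
  // an Expand node feeding a first Aggregate (deduplication) and a second
  // Aggregate (the actual counts).
  println(q.queryExecution.optimizedPlan.numberedTreeString)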