Thanks for the pointer. It makes more sense now.

Assaf.

From: Herman van Hövell tot Westerflier-2 [via Apache Spark Developers List] [mailto:ml-node+s1001551n19842...@n3.nabble.com]
Sent: Sunday, November 13, 2016 10:03 PM
To: Mendelson, Assaf
Subject: Re: how does isDistinct work on expressions

Hi,

You should take a look at
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

Spark SQL does not directly support the aggregation of multiple distinct groups. For example,

    select count(distinct a), count(distinct b) from tbl_x

contains the distinct groups a and b. The RewriteDistinctAggregates rule rewrites this into two aggregates: the first aggregate takes care of deduplication and the second aggregate does the actual aggregation.

HTH

On Sun, Nov 13, 2016 at 11:46 AM, Jacek Laskowski <[hidden email]> wrote:

Hi,

I might not have been there yet, but since I'm with the code every day I might be close...

When you say "aggregate functions", are you asking about typed or untyped ones? Just today I reviewed the typed ones, and honestly it took me some time to figure out what belongs where. Are you creating a new UDAF? What have you done already? GitHub perhaps?

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sun, Nov 13, 2016 at 12:03 PM, assaf.mendelson <[hidden email]> wrote:
> Hi,
>
> I am trying to understand how aggregate functions are implemented
> internally.
>
> I see that the expression is wrapped using toAggregateExpression using
> isDistinct.
>
> I can’t figure out where the code that makes the data distinct is located. I
> am trying to figure out how the input data is converted into a distinct
> version.
>
> Thanks,
>
> Assaf.
>
> ________________________________
> View this message in context: how does isDistinct work on expressions
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
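
For readers of the archive, the two-aggregate rewrite described in Herman's reply can be illustrated with a small plain-Python sketch. This is not Spark's code and does not use any Spark API: the function name, the row layout, and the group ids are hypothetical, and the step that tags each value with a group id is an assumption about how the two distinct groups are kept apart before deduplication.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical row layout for tbl_x: (a, b); None stands in for SQL NULL.
Row = Tuple[Optional[int], Optional[int]]


def count_distinct_two_phase(rows: Iterable[Row]) -> Tuple[int, int]:
    """Sketch of count(distinct a), count(distinct b) as two aggregation steps."""
    # Tag every value with the id of the distinct group it belongs to
    # (assumption: gid 1 carries column a, gid 2 carries column b).
    tagged = []
    for a, b in rows:
        tagged.append((1, a))
        tagged.append((2, b))

    # First aggregate: grouping by (gid, value) removes duplicates.
    deduplicated = set(tagged)

    # Second aggregate: a plain count of non-null values per group id.
    counts = {1: 0, 2: 0}
    for gid, value in deduplicated:
        if value is not None:
            counts[gid] += 1
    return counts[1], counts[2]


rows = [(1, 10), (1, 20), (2, 10), (None, 10), (2, 30)]
result = count_distinct_two_phase(rows)
print(result)
```

The point of the sketch is that the second step never needs a "distinct" flag: once the first step has grouped on the values themselves, an ordinary count gives the distinct count.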
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/how-does-isDistinct-work-on-expressions-tp19836p19847.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.