Thanks for the pointer. It makes more sense now.
Assaf.

From: Herman van Hövell tot Westerflier-2 [via Apache Spark Developers List] 
[mailto:ml-node+s1001551n19842...@n3.nabble.com]
Sent: Sunday, November 13, 2016 10:03 PM
To: Mendelson, Assaf
Subject: Re: how does isDistinct work on expressions

Hi,

You should take a look at 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

Spark SQL does not directly support the aggregation of multiple distinct 
groups. For example, select count(distinct a), count(distinct b) from tbl_x 
contains the distinct groups a and b. The RewriteDistinctAggregates rule 
rewrites this into two aggregates: the first aggregate takes care of 
deduplication and the second aggregate does the actual aggregation.
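
To make the two-step idea concrete, here is a rough pure-Python sketch of it. 
This is not Spark's actual code: the function name, the gid tag and the sample 
rows are assumptions made up for illustration, and the real rule works on 
Catalyst logical plans rather than on lists of tuples.

```python
# Simplified illustration of a two-aggregate rewrite for
#   select count(distinct a), count(distinct b) from tbl_x
# Hypothetical helper; not part of Spark.

def count_distinct_two_phase(rows):
    """Return (count(distinct a), count(distinct b)) for rows of (a, b)."""
    # Expand: emit one row per distinct group, tagged with a group id (gid),
    # with the other group's column set to None.
    expanded = []
    for a, b in rows:
        expanded.append((a, None, 1))  # gid 1 -> distinct group a
        expanded.append((None, b, 2))  # gid 2 -> distinct group b

    # First aggregate: group by (a, b, gid). It only removes duplicates.
    deduplicated = set(expanded)

    # Second aggregate: plain (non-distinct) counts per group id.
    # As with SQL count, None (NULL) values are skipped.
    count_a = sum(1 for a, _, gid in deduplicated if gid == 1 and a is not None)
    count_b = sum(1 for _, b, gid in deduplicated if gid == 2 and b is not None)
    return count_a, count_b


# Made-up sample data standing in for tbl_x.
tbl_x = [(1, "x"), (1, "y"), (2, "x"), (2, "x"), (None, "z")]
result = count_distinct_two_phase(tbl_x)
print(result)  # (2, 3)
```

The point of the sketch is that the second step never needs a "distinct" 
itself: once the first step has deduplicated each tagged group, ordinary 
counts give the right answer.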

HTH

On Sun, Nov 13, 2016 at 11:46 AM, Jacek Laskowski <[hidden email]> wrote:
Hi,

I might not have been there yet, but since I'm with the code every day
I might be close...

When you say "aggregate functions", are you asking about typed or untyped
ones? Just today I reviewed the typed ones and honestly it took me some
time to figure out what belongs where. Are you creating a new UDAF?
What have you done already? Is it on GitHub perhaps?

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Sun, Nov 13, 2016 at 12:03 PM, assaf.mendelson <[hidden email]> wrote:
> Hi,
>
> I am trying to understand how aggregate functions are implemented
> internally.
>
> I see that the expression is wrapped using toAggregateExpression, which
> takes isDistinct.
>
> I can’t figure out where the code that makes the data distinct is located. I
> am trying to figure out how the input data is converted into a distinct
> version.
>
> Thanks,
>
>                 Assaf.
>
>
> ________________________________
> View this message in context: how does isDistinct work on expressions
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.

--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/how-does-isDistinct-work-on-expressions-tp19836p19847.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
