I'm not sure exactly what you're asking, but SparkStrategies.scala and Optimizer.scala are a good place to start if you want the details of the join implementation or the optimizations.
-----Original Message-----
From: Andrew Ash [mailto:and...@andrewash.com]
Sent: Friday, January 16, 2015 4:52 AM
To: Reynold Xin
Cc: Alessandro Baretta; dev@spark.apache.org
Subject: Re: Join implementation in SparkSQL

What Reynold is describing is a performance optimization in the implementation, but the semantics of the join (cartesian product plus relational-algebra filter) should be the same and produce the same results.

On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin <r...@databricks.com> wrote:
> It's a bunch of strategies defined here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
>
> In most common use cases (e.g. inner equi-join), filters are pushed
> below the join or into the join. Doing a cartesian product followed by
> a filter is too expensive.
>
> On Thu, Jan 15, 2015 at 7:39 AM, Alessandro Baretta <alexbare...@gmail.com> wrote:
>
>> Hello,
>>
>> Where can I find docs about how joins are implemented in SparkSQL?
>> In particular, I'd like to know whether they are implemented
>> according to their relational algebra definition as filters on top
>> of a cartesian product.
>>
>> Thanks,
>>
>> Alex
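The distinction drawn in the thread above (same join semantics, different execution cost) can be sketched in a few lines. This is plain Python, not Spark code, and the function names are illustrative only: one join follows the relational-algebra definition literally (cartesian product, then filter), the other is the kind of hash-based equi-join an optimizer would pick instead. Both return the same pairs; only the work done differs.

```python
def join_cartesian_filter(left, right, key):
    # Relational-algebra definition of an inner equi-join:
    # full cartesian product, then filter on the join condition.
    # O(|left| * |right|) intermediate work.
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def join_hash(left, right, key):
    # Semantically equivalent optimized plan: build a hash table
    # on one side, probe with the other. Roughly
    # O(|left| + |right| + |output|), no quadratic intermediate.
    table = {}
    for l in left:
        table.setdefault(l[key], []).append(l)
    return [(l, r) for r in right for l in table.get(r[key], [])]

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]

# Same result set either way: only the row with id == 2 matches.
assert sorted(map(str, join_cartesian_filter(left, right, "id"))) == \
       sorted(map(str, join_hash(left, right, "id")))
```

This is the sense in which filters are "pushed into the join": the optimizer replaces the product-then-filter plan with a join operator that applies the equality condition while matching rows, without ever materializing the cross product.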