Ok, currently there's cost-based optimization however Parquet statistics is not implemented...
What's the good way if I want to join a big fact table with several tiny dimension tables in Spark SQL (1.1)? I wish we can allow user hint for the join. Jianshi On Wed, Oct 8, 2014 at 2:18 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote: > Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged > into master? > > I cannot find spark.sql.hints.broadcastTables in latest master, but it's > in the following patch. > > > https://github.com/apache/spark/commit/76ca4341036b95f71763f631049fdae033990ab5 > > > Jianshi > > > On Mon, Sep 29, 2014 at 1:24 AM, Jianshi Huang <jianshi.hu...@gmail.com> > wrote: > >> Yes, looks like it can only be controlled by the >> parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird >> to me. >> >> How am I suppose to know the exact bytes of a table? Let me specify the >> join algorithm is preferred I think. >> >> Jianshi >> >> On Sun, Sep 28, 2014 at 11:57 PM, Ted Yu <yuzhih...@gmail.com> wrote: >> >>> Have you looked at SPARK-1800 ? >>> >>> e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala >>> Cheers >>> >>> On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang <jianshi.hu...@gmail.com> >>> wrote: >>> >>>> I cannot find it in the documentation. And I have a dozen dimension >>>> tables to (left) join... >>>> >>>> >>>> Cheers, >>>> -- >>>> Jianshi Huang >>>> >>>> LinkedIn: jianshi >>>> Twitter: @jshuang >>>> Github & Blog: http://huangjs.github.com/ >>>> >>> >>> >> >> >> -- >> Jianshi Huang >> >> LinkedIn: jianshi >> Twitter: @jshuang >> Github & Blog: http://huangjs.github.com/ >> > > > > -- > Jianshi Huang > > LinkedIn: jianshi > Twitter: @jshuang > Github & Blog: http://huangjs.github.com/ > -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/