Oh, I found an explanation at
http://cmenguy.github.io/blog/2013/10/30/using-hive-with-parquet-format-in-cdh-4-dot-3/
The error here is a bit misleading; what it really means is that the class
parquet.hive.DeprecatedParquetOutputFormat isn’t in the classpath for Hive.
Sure enough, doing an ls /usr
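In case it helps anyone else, here is a minimal sketch of one way to get the missing classes onto the classpath from Spark's HiveContext. The jar path and version below are just guesses for a typical install; point it at wherever the parquet-hive bundle actually lives in your setup.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-hive-classpath"))
    val hiveContext = new HiveContext(sc)

    // The jar location below is an assumption; it should be whichever jar
    // contains parquet.hive.DeprecatedParquetOutputFormat in your installation.
    hiveContext.sql("ADD JAR /usr/lib/hive/lib/parquet-hive-bundle-1.5.0.jar")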
Hi,
Looks like the latest SparkSQL with Hive 0.12 has a bug in Parquet support.
I got the following exception:
org.apache.hadoop.hive.ql.parse.SemanticException: Output Format must
implement HiveOutputFormat, otherwise it should be either
IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
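For reference, a minimal sketch of the kind of DDL where this surfaces for me. The table name and schema are placeholders, and the SerDe/input-format class names are my best guess at the matching old parquet-hive classes; only DeprecatedParquetOutputFormat is the one from the error above.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-table-repro"))
    val hiveContext = new HiveContext(sc)

    // Hive's semantic analysis raises the exception above when the output
    // format class can't be loaded or isn't recognized as a HiveOutputFormat.
    hiveContext.sql("""
      CREATE TABLE IF NOT EXISTS events_parquet (id INT, payload STRING)
      ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
      STORED AS
        INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
        OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    """)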
It works fine, thanks for the help Michael.
Liancheng also told me a trick: use a subquery with LIMIT n. It works in
the latest 1.2.0.
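In case it helps anyone searching the archives, here is a rough sketch of how I understand the trick. All table and column names are made up, and it assumes a HiveContext against 1.2.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("limit-subquery-trick"))
    val hiveContext = new HiveContext(sc)

    // Wrapping the small dimension table in a subquery with LIMIT gives the
    // planner a size estimate for that side, so it can choose a broadcast join.
    val joined = hiveContext.sql("""
      SELECT f.*, d.dim_name
      FROM fact_table f
      JOIN (SELECT dim_id, dim_name FROM dim_table LIMIT 100000) d
        ON f.dim_id = d.dim_id
    """)

    // Print the logical and physical plans; look for a broadcast join on the small side.
    println(joined.queryExecution)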
BTW, looks like the broadcast optimization won't be recognized if I do a
left join instead of an inner join. Is that true? How can I make it work for
left joins?
Che
Thanks for the input. We purposefully made sure that the config option did
not make it into a release as it is not something that we are willing to
support long term. That said, we'll try to make this easier in the future
either through hints or better support for statistics.
In this particular
OK, currently there's cost-based optimization, but Parquet statistics are
not implemented...
What's a good way to join a big fact table with several tiny
dimension tables in Spark SQL (1.1)?
I wish we could allow user hints for the join.
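For concreteness, here is roughly the shape of what I'm trying. Table and column names and the threshold value are just placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("fact-dim-join"))
    val hiveContext = new HiveContext(sc)

    // Raise the auto-broadcast threshold (the value here is arbitrary) so any
    // dimension table whose estimated size falls under it gets broadcast rather
    // than shuffled. This only helps when Spark SQL actually has a size estimate,
    // which is the missing piece for Parquet tables mentioned above.
    hiveContext.setConf("spark.sql.autoBroadcastJoinThreshold", (50 * 1024 * 1024).toString)

    val result = hiveContext.sql("""
      SELECT f.*, a.attr_a, b.attr_b
      FROM fact f
      JOIN dim_a a ON f.a_id = a.id
      JOIN dim_b b ON f.b_id = b.id
    """)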
Jianshi
On Wed, Oct 8, 2014 at 2:18 PM, Jiansh
Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged
into master?
I cannot find spark.sql.hints.broadcastTables in the latest master, but it's in
the following patch:
https://github.com/apache/spark/commit/76ca4341036b95f71763f631049fdae033990ab5
Jianshi
On Mon, Sep 29, 2014