Hello
Has Spark implemented computing statistics for Parquet files? Or is there
any other way I can enable broadcast joins between parquet file RDDs in
Spark Sql?
Thanks
Dima
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-do-broadcast-join-in
Oh, I found an explanation from
http://cmenguy.github.io/blog/2013/10/30/using-hive-with-parquet-format-in-cdh-4-dot-3/
The error here is a bit misleading, what it really means is that the class
parquet.hive.DeprecatedParquetOutputFormat isn’t in the classpath for Hive.
Sure enough, doing a ls /usr
Hi,
Looks like the latest SparkSQL with Hive 0.12 has a bug in Parquet support.
I got the following exception:
org.apache.hadoop.hive.ql.parse.SemanticException: Output Format must
implement HiveOutputFormat, otherwise it should be either
IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
It works fine, thanks for the help, Michael.
Liancheng also told me a trick: using a subquery with LIMIT n. It works in
the latest 1.2.0.
BTW, it looks like the broadcast optimization isn't applied if I do a
left join instead of an inner join. Is that true? How can I make it work for
left joins?
Che
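For illustration only, here is a minimal Python sketch of what a broadcast-style left join does conceptually. It is not Spark's implementation or API; the function name broadcast_left_join and the sample rows are made up for this example. The idea is that in a left join the small right-hand table is the side held in memory, while every row of the big left-hand table is preserved.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Any, Any]  # (join key, value); a simplified stand-in for a table row


def broadcast_left_join(
    big_rows: Iterable[Row], small_rows: Iterable[Row]
) -> Iterator[Tuple[Any, Any, Optional[Any]]]:
    """Left outer join that keeps every row of big_rows (hypothetical helper)."""
    # Build side: load the small table into an in-memory hash map.
    # In a distributed engine this map is what would be shipped to every worker.
    lookup: Dict[Any, List[Any]] = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)

    # Probe side: stream the big table without shuffling it.
    for key, left_value in big_rows:
        matches = lookup.get(key)
        if matches is None:
            # Left join semantics: unmatched rows survive with a null right side.
            yield (key, left_value, None)
        else:
            for right_value in matches:
                yield (key, left_value, right_value)


# Made-up sample data: a "fact" table and a tiny "dimension" table.
facts = [(1, "order-a"), (2, "order-b"), (3, "order-c")]
dims = [(1, "US"), (2, "DE")]
result = list(broadcast_left_join(facts, dims))
```

Note that the roles cannot be swapped: if the preserved (left) side were the one held in memory, each worker would not know which of its rows went unmatched elsewhere.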
Thanks for the input. We purposefully made sure that the config option did
not make it into a release, as it is not something that we are willing to
support long term. That said, we'll try to make this easier in the future,
either through hints or better support for statistics.
In this particular
OK, so currently there is cost-based optimization, but Parquet statistics are
not implemented...
What's a good way to join a big fact table with several tiny
dimension tables in Spark SQL (1.1)?
I wish we could allow user hints for the join.
Jianshi
On Wed, Oct 8, 2014 at 2:18 PM, Jiansh
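To make the fact/dimension scenario above concrete, here is a rough Python sketch of joining one big fact table against several tiny dimension tables in a single pass, with each dimension held as an in-memory lookup. This only illustrates the idea behind a broadcast join; it does not use Spark, and star_join, the column names, and the sample rows are all hypothetical. It also assumes each dimension key is unique.

```python
from typing import Any, Dict, Iterable, Iterator

FactRow = Dict[str, Any]
# Maps a fact-table key column to {key value: dimension attributes}.
DimensionLookups = Dict[str, Dict[Any, Dict[str, Any]]]


def star_join(
    fact_rows: Iterable[FactRow], dimension_lookups: DimensionLookups
) -> Iterator[FactRow]:
    """Inner-join each fact row against every dimension lookup (hypothetical helper)."""
    for fact in fact_rows:
        joined = dict(fact)  # copy so the input row is not mutated
        for key_column, lookup in dimension_lookups.items():
            attrs = lookup.get(fact[key_column])
            if attrs is None:
                break  # inner join semantics: drop rows with a missing dimension
            joined.update(attrs)
        else:
            # Only reached when no dimension lookup failed.
            yield joined


# Made-up sample data.
facts = [
    {"country_id": 1, "product_id": 10, "amount": 5.0},
    {"country_id": 2, "product_id": 99, "amount": 7.5},  # product 99 is unknown
]
dims = {
    "country_id": {1: {"country": "US"}, 2: {"country": "DE"}},
    "product_id": {10: {"product": "widget"}},
}
result = list(star_join(facts, dims))
```

The big table is read once and never repartitioned, which is why this pattern is attractive when all the dimension tables fit in memory.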
Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged
into master?
I cannot find spark.sql.hints.broadcastTables in the latest master, but it's in
the following patch:
https://github.com/apache/spark/commit/76ca4341036b95f71763f631049fdae033990ab5
Jianshi
On Mon, Sep 29, 2014
Yes, it looks like it can only be controlled by the
parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird
to me.
How am I supposed to know the exact size of a table in bytes? I think it would
be better to let me specify the preferred join algorithm.
Jianshi
On Sun, Sep 28, 2014 at 11:57 PM, Ted Yu wrote
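On the question of knowing a table's size in bytes: one rough way to get a number to compare against a byte threshold is to sum the file sizes under the table's directory. The Python sketch below does this for a local directory only. It is an assumption that on-disk size is a useful proxy here (Parquet files are compressed, so the in-memory size can be considerably larger), and the helper names and the threshold values are made up for this example, not Spark defaults.

```python
import os
import tempfile


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all files under path (local filesystem only)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_under_threshold(path: str, threshold_bytes: int) -> bool:
    """True if the on-disk size is at or below a caller-chosen byte threshold."""
    return directory_size_bytes(path) <= threshold_bytes


# Usage with a throwaway directory standing in for a small table.
with tempfile.TemporaryDirectory() as table_dir:
    with open(os.path.join(table_dir, "part-00000"), "wb") as handle:
        handle.write(b"\0" * 1000)  # 1000 bytes of placeholder data
    size = directory_size_bytes(table_dir)
    small_enough = fits_under_threshold(table_dir, threshold_bytes=2000)
    too_big = not fits_under_threshold(table_dir, threshold_bytes=500)
```

For tables on HDFS the same idea would need the cluster's own file-listing tools rather than os.walk.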
Have you looked at SPARK-1800?
e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
Cheers
On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang
wrote:
> I cannot find it in the documentation. And I have a dozen dimension tables
> to (left) join...
>
>
> Cheers,
> --
> Jianshi Huang
I cannot find it in the documentation. And I have a dozen dimension tables
to (left) join...
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/