[
https://issues.apache.org/jira/browse/SPARK-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616376#comment-15616376
]
Franck Tago commented on SPARK-17612:
-------------------------------------
Hi,
I have an issue with a query of the following shape:
Partitioned large Hive table (hive table 1) -- filter --- join
                                                          /
                         Non-partitioned large Hive table
I am joining two large tables. The raw size of each table exceeds the
broadcast join threshold.
The filter selects a single partition, and that partition is smaller than the
broadcast join threshold.
With Spark 2.0 and Spark 2.0.1, I do not see a broadcast join; I see a sort
merge join instead.
This is surprising to me, given that this could be a very common case. Imagine
a user who has a large log table partitioned by date and filters on a specific
date. We should be able to do a broadcast join in that case.
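To make the scenario concrete, here is a minimal sketch of the query shape
described above. The table names `logs` and `users`, the partition column `dt`,
the date value, and the join key `user_id` are hypothetical and only serve as
an illustration. The last statement uses the explicit `broadcast` hint from
`org.apache.spark.sql.functions`; it is shown only as a possible workaround,
not as a confirmed fix for this issue.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical setup: `logs` is a large Hive table partitioned by `dt`,
// `users` is a large non-partitioned Hive table.
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Threshold (in bytes) that the planner compares estimated sizes against.
println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

// Filter down to a single, small partition of the partitioned table.
val onePartition = spark.table("logs").filter("dt = '2016-10-28'")
val users = spark.table("users")

// Scenario from this comment: check whether the physical plan shows
// SortMergeJoin or BroadcastHashJoin.
onePartition.join(users, "user_id").explain()

// Explicit hint (assumption: acceptable as a workaround): ask the planner
// to broadcast the filtered side regardless of its size estimate.
broadcast(onePartition).join(users, Seq("user_id")).explain()
{code}
Comparing the two `explain()` outputs shows whether the planner chooses a
broadcast join on its own or only when the hint is given.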
My question is the following.
I do not think this Spark issue addresses the problem described above, but I
could be wrong. I tried incorporating the change from the Spark 2.0 PR, but I
see the same behavior: no broadcast join.
Question: Is this Spark issue supposed to address the problem that I
described?
- If not, which I think is the case, do you know whether Spark currently has a
fix for it?
I also tried the fix under SPARK-15616, but I hit a runtime failure.
There has to be a solution to this problem somewhere.
> Support `DESCRIBE table PARTITION` SQL syntax
> ---------------------------------------------
>
> Key: SPARK-17612
> URL: https://issues.apache.org/jira/browse/SPARK-17612
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.0.2, 2.1.0
>
>
> This issue implements the `DESC PARTITION` SQL syntax again. It was dropped
> in Spark 2.0.0.
> h4. Spark 2.0.0
> {code}
> scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
> res0: org.apache.spark.sql.DataFrame = []
> scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
> res1: org.apache.spark.sql.DataFrame = []
> scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
> org.apache.spark.sql.catalyst.parser.ParseException:
> Unsupported SQL statement
> == SQL ==
> DESC partitioned_table PARTITION (c='Us', d=1)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:58)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)
> at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:573)
> ... 48 elided
> {code}
> h4. Spark 1.6.2
> {code}
> scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
> res1: org.apache.spark.sql.DataFrame = [result: string]
> scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
> res2: org.apache.spark.sql.DataFrame = [result: string]
> scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
> 16/09/20 12:48:36 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
> +----------------------------------------------------------------+
> |result                                                          |
> +----------------------------------------------------------------+
> |a string                                                        |
> |b int                                                           |
> |c string                                                        |
> |d string                                                        |
> |                                                                |
> |# Partition Information                                         |
> |# col_name data_type comment                                    |
> |                                                                |
> |c string                                                        |
> |d string                                                        |
> +----------------------------------------------------------------+
> {code}