[ https://issues.apache.org/jira/browse/SPARK-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616376#comment-15616376 ]

Franck Tago commented on SPARK-17612:
-------------------------------------

Hi,
I have an issue when performing the following operations:

Partitioned large Hive table (hive table 1) -- filter -- join
                                                        /
                        Non-partitioned large Hive table

In other words, I am joining two large tables. The raw size of each table
exceeds the broadcast join threshold.
The filter selects a single partition, and that partition is smaller than the
broadcast join threshold.

With Spark 2.0 and Spark 2.0.1, I do not get a broadcast join; I get a sort
merge join.
This is really surprising to me, given that this could be a very common case.
Imagine a user who has a large log table partitioned by date and filters on a
specific date. We should be able to do a broadcast join in that case.
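The situation described above comes down to a size comparison. The Scala sketch below is a hypothetical, simplified model, not Spark's planner code: it assumes the planner picks a broadcast join when one size estimate for a join side is at or below the broadcast threshold (Spark's `spark.sql.autoBroadcastJoinThreshold`, 10 MB by default). It shows why an estimate based on the whole table's raw size yields a sort merge join, while an estimate taken after partition pruning would allow a broadcast. The `Table`, `Partition`, `choose`, and `prunedSize` names, the dates, and the sizes are invented for illustration.

```scala
// Hypothetical model of the broadcast-vs-sort-merge size check.
// NOT Spark's planner code; names and sizes are illustrative only.
object JoinStrategySketch {
  sealed trait Strategy
  case object Broadcast extends Strategy
  case object SortMerge extends Strategy

  // A partition is identified by its partition-column values.
  final case class Partition(spec: Map[String, String], sizeInBytes: Long)

  final case class Table(name: String, partitions: Seq[Partition]) {
    // Raw size of the whole table: the sum over all partitions.
    def totalSize: Long = partitions.map(_.sizeInBytes).sum
  }

  // Assumption: broadcast when the estimate is at or below the threshold.
  def choose(estimatedSize: Long, threshold: Long): Strategy =
    if (estimatedSize <= threshold) Broadcast else SortMerge

  // Size after partition pruning: only partitions matching the filter count.
  def prunedSize(table: Table, keep: Map[String, String] => Boolean): Long =
    table.partitions.filter(p => keep(p.spec)).map(_.sizeInBytes).sum

  def main(args: Array[String]): Unit = {
    val threshold = 10L * 1024 * 1024 // 10 MB, the default broadcast threshold
    val logs = Table("logs", Seq(
      Partition(Map("date" -> "2016-10-27"), 2L * 1024 * 1024),
      Partition(Map("date" -> "2016-10-28"), 50L * 1024 * 1024)))
    val oneDay = prunedSize(logs, _.get("date").contains("2016-10-27"))
    println(s"whole-table estimate: ${choose(logs.totalSize, threshold)}")
    println(s"pruned estimate:      ${choose(oneDay, threshold)}")
  }
}
```

Running `main` prints `SortMerge` for the whole-table estimate (52 MB) and `Broadcast` for the pruned estimate (2 MB), which mirrors the gap between the observed and the expected plan.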

My question is the following.

I do not think this Spark issue addresses the problem described above, but I
could be wrong. I tried incorporating the change from the Spark 2.0 PR, but I
see the same behavior, that is, no broadcast join.

Question: Is this Spark issue supposed to address the problem that I
described?

- If not, which I think is the case, do you know whether Spark currently has a
fix for it? I also tried the fix under SPARK-15616, but I hit a runtime failure.

There has to be a solution to this problem somewhere.





> Support `DESCRIBE table PARTITION` SQL syntax
> ---------------------------------------------
>
>                 Key: SPARK-17612
>                 URL: https://issues.apache.org/jira/browse/SPARK-17612
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>             Fix For: 2.0.2, 2.1.0
>
>
> This issue implements the `DESC PARTITION` SQL syntax again. It has been
> missing since Spark 2.0.0.
> h4. Spark 2.0.0
> {code}
> scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
> res0: org.apache.spark.sql.DataFrame = []
> scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
> res1: org.apache.spark.sql.DataFrame = []
> scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
> org.apache.spark.sql.catalyst.parser.ParseException:
> Unsupported SQL statement
> == SQL ==
> DESC partitioned_table PARTITION (c='Us', d=1)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:58)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)
>   at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
>   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:573)
>   ... 48 elided
> {code}
> h4. Spark 1.6.2
> {code}
> scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
> res1: org.apache.spark.sql.DataFrame = [result: string]
> scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
> res2: org.apache.spark.sql.DataFrame = [result: string]
> scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
> 16/09/20 12:48:36 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
> +----------------------------------------------------------------+
> |result                                                          |
> +----------------------------------------------------------------+
> |a                      string                                   |
> |b                      int                                      |
> |c                      string                                   |
> |d                      string                                   |
> |                                                                |
> |# Partition Information                                         |
> |# col_name             data_type               comment          |
> |                                                                |
> |c                      string                                   |
> |d                      string                                   |
> +----------------------------------------------------------------+
> {code}
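The Spark 1.6.2 output shown in the quoted issue has a fixed layout: every column (data and partition columns), a blank line, a `# Partition Information` header, a column-header line, another blank line, and the partition columns again. The Scala mock-up below reproduces only that layout for illustration; it is a hypothetical sketch, not Spark's `DESCRIBE` implementation. `parseSpec`, `describe`, `Column`, the 23-character padding (chosen to match the sample), and the rule that spec keys must be partition columns are all assumptions.

```scala
// Hypothetical mock-up of the non-EXTENDED `DESC ... PARTITION` layout from the
// Spark 1.6.2 sample output; NOT Spark's DESCRIBE implementation.
object DescPartitionSketch {
  final case class Column(name: String, dataType: String)

  // Parses a spec body such as "c='Us', d=1" into ordered (column, value) pairs.
  // Simplification: values containing commas or '=' are not supported.
  def parseSpec(spec: String): Seq[(String, String)] =
    spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { entry =>
      entry.split("=", 2) match {
        case Array(k, v) => k.trim -> v.trim.stripPrefix("'").stripSuffix("'")
        case _ => throw new IllegalArgumentException(s"Malformed spec entry: $entry")
      }
    }

  // Renders one "name  type" row; padding width assumed from the sample output.
  private def row(c: Column): String = "%-23s%s".format(c.name, c.dataType)

  def describe(dataCols: Seq[Column], partCols: Seq[Column],
               spec: Seq[(String, String)]): Seq[String] = {
    // Assumption: every key in the spec must name a partition column.
    val unknown = spec.map(_._1).filterNot(k => partCols.exists(_.name == k))
    require(unknown.isEmpty, s"Not partition columns: ${unknown.mkString(", ")}")
    val header = "%-23s%-24s%s".format("# col_name", "data_type", "comment")
    (dataCols ++ partCols).map(row) ++
      Seq("", "# Partition Information", header, "") ++
      partCols.map(row)
  }

  def main(args: Array[String]): Unit = {
    val lines = describe(
      Seq(Column("a", "string"), Column("b", "int")),
      Seq(Column("c", "string"), Column("d", "string")),
      parseSpec("c='Us', d=1"))
    lines.foreach(println)
  }
}
```

Running `main` prints ten lines in the same order as the rows of the Spark 1.6.2 sample, without the table border. Note that the spec value `1` for `d` is treated as the string `"1"`, since `d` is declared as `STRING`.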



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
