I'm not sure if Spark 1.6 is still maintained. Can you try a Spark 2.x version and see if the problem still exists?
On Sun, Sep 30, 2018 at 4:14 PM 白也诗无敌 <445484...@qq.com> wrote:

> Besides, I have tried the ANALYZE statement. It is of no use, because I need
> the size of the single partition but get the total table size from the Hive
> parameters 'totalSize', 'rawSize', and so on.
>
> Hi, guys:
> I'm using Spark 1.6.2.
> There are two tables, and the small one is a partitioned Parquet table.
> The total size of the small table is 1000M, but each partition is only 1M.
> When I set spark.sql.autoBroadcastJoinThreshold to 50m and join the
> two tables with a single partition, I get the SortMergeJoin physical plan.
> I have tried a few things, and it has something to do with partition
> pruning:
> 1. Check the physical plan: all of the partitions of the small
>    table are added in.
>    It seems like https://issues.apache.org/jira/browse/SPARK-16980
>
> 2. Set spark.sql.hive.convertMetastoreParquet=false.
>    The pruning succeeds, but I still get SortMergeJoin because of this code
>    in HiveMetastoreCatalog.scala:
>
>    @transient override lazy val statistics: Statistics = Statistics(
>      sizeInBytes = {
>        val totalSize = hiveQlTable.getParameters.get(StatsSetupConst.TOTAL_SIZE)
>        val rawDataSize = hiveQlTable.getParameters.get(StatsSetupConst.RAW_DATA_SIZE)
>
>    That is the total size of the table, not of the single partition.
>
> How can I fix this without patches? Or is there a patch for Spark 1.6
> for SPARK-16980?
>
> best regards!
> Jerry
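
To make the numbers in the quoted message concrete, here is a rough sketch of the size check as it is described there. It is a simplified model, not Spark's planner code: the object name `BroadcastDecision` and the method `canBroadcast` are hypothetical, and the only inputs are the figures from the message (a 50m threshold, a 1000M table, a 1M partition).

```scala
// Simplified model of the size check behind spark.sql.autoBroadcastJoinThreshold.
// BroadcastDecision and canBroadcast are hypothetical names, not Spark APIs.
object BroadcastDecision {
  val MiB: Long = 1024L * 1024L

  // A relation is a broadcast candidate only if its estimated size
  // is no larger than a positive threshold.
  def canBroadcast(estimatedSizeInBytes: Long, thresholdInBytes: Long): Boolean =
    thresholdInBytes > 0 && estimatedSizeInBytes <= thresholdInBytes

  def main(args: Array[String]): Unit = {
    val threshold      = 50 * MiB    // spark.sql.autoBroadcastJoinThreshold = 50m
    val tableTotalSize = 1000 * MiB  // table-level 'totalSize' from the metastore
    val partitionSize  = 1 * MiB     // the single partition that is actually read

    // The table-level statistic exceeds the threshold, so no broadcast join.
    println(canBroadcast(tableTotalSize, threshold)) // false
    // The pruned partition alone would fit comfortably.
    println(canBroadcast(partitionSize, threshold))  // true
  }
}
```

If the estimate is indeed taken from the table-level statistic, an explicit hint through `org.apache.spark.sql.functions.broadcast` (available since Spark 1.5) may be worth trying with the DataFrame API, since it marks the DataFrame for broadcast regardless of the size estimate.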