In our Hive warehouse there are many tables with a lot of partitions, e.g.:

scala> hiveContext.sql("use db_external")
scala> val result = hiveContext.sql("show partitions et_fullorders").count
result: Long = 5879

I noticed that this part of the code:
https://github.com/apache/spark/blob/9d006c97371ddf357e0b821d5c6d1535d9b6fe41/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L55-L56
reads the metadata for all partitions at the beginning of the planning phase. I added a logInfo around the "val partitions = ..." line, and it shows:

scala> val result = hiveContext.sql("select * from db_external.et_fullorders limit 5")
14/09/02 16:15:56 INFO ParseDriver: Parsing command: select * from db_external.et_fullorders limit 5
14/09/02 16:15:56 INFO ParseDriver: Parse Completed
14/09/02 16:15:56 INFO HiveContext$$anon$1: getAllPartitionsForPruner started
14/09/02 16:17:35 INFO HiveContext$$anon$1: getAllPartitionsForPruner finished

It took about 2 minutes to fetch all the partition metadata. Is there any way to avoid this operation, e.g. by fetching only the requested partitions?

Thanks

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/hive-client-getAllPartitions-in-lookupRelation-can-take-a-very-long-time-tp8186.html
