In our Hive warehouse there are many tables with a lot of partitions, e.g.:

scala> hiveContext.sql("use db_external")
scala> val result = hiveContext.sql("show partitions et_fullorders").count
result: Long = 5879

I noticed that this part of the code:
https://github.com/apache/spark/blob/9d006c97371ddf357e0b821d5c6d1535d9b6fe41/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L55-L56
reads the metadata for all partitions at the beginning of the planning phase. I added a logInfo around the "val partitions = ..." line, and it shows:

scala> val result = hiveContext.sql("select * from db_external.et_fullorders limit 5")
14/09/02 16:15:56 INFO ParseDriver: Parsing command: select * from db_external.et_fullorders limit 5
14/09/02 16:15:56 INFO ParseDriver: Parse Completed
14/09/02 16:15:56 INFO HiveContext$$anon$1: getAllPartitionsForPruner started
14/09/02 16:17:35 INFO HiveContext$$anon$1: getAllPartitionsForPruner finished

It took about 2 minutes to fetch all the partition metadata. Is there any way to avoid this operation, e.g. by fetching only the requested partitions?

Thanks

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/hive-client-getAllPartitions-in-lookupRelation-can-take-a-very-long-time-tp8186.html
