https://github.com/apache/spark/pull/7421
On Fri, Jul 17, 2015 at 3:26 AM, Xiaoyu Ma <hzmaxia...@corp.netease.com> wrote:
> Hi guys,
> I saw that when the Hive Table object is created, it tries to load all
> existing partitions:
>
>   @transient val hiveQlPartitions: Seq[Partition] = table.getAllPartitions.map { p =>
>     val tPartition = new org.apache.hadoop.hive.metastore.api.Partition
>     tPartition.setDbName(databaseName)
>     tPartition.setTableName(tableName)
>     tPartition.setValues(p.values)
>
> The getAllPartitions call above takes minutes, consumes tons of memory,
> and almost always fails on tables with many partitions. This makes it
> unusable in our use case.
> I think this should use some kind of lazy loading, or be postponed until
> the partition-pruning stage. I'm not sure why we need full partition
> info at this stage.
> Is there an existing ticket to solve this?
>
> 马晓宇 / Xiaoyu Ma
> hzmaxia...@corp.netease.com
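The lazy-loading idea above can be sketched as follows. This is a minimal, self-contained illustration (not the actual Spark code): `HiveTable`, `loadAll`, and `loadCount` are hypothetical names, with `loadCount` standing in for the expensive metastore round trip, to show how a `lazy val` defers the `getAllPartitions`-style call from table construction to first access:

```scala
// Hypothetical sketch of deferring partition loading with a lazy val.
object LazyPartitionsDemo {
  // Counts how many times the (simulated) expensive metastore call runs.
  var loadCount = 0

  case class Partition(values: Seq[String])

  class HiveTable(partitionNames: Seq[String]) {
    // Eager version (the behavior described above) would be:
    //   @transient val hiveQlPartitions: Seq[Partition] = loadAll()
    // which runs the metastore scan as soon as the table object is built.

    // Lazy version: the scan is deferred until something (e.g. the
    // partition-pruning stage) actually reads hiveQlPartitions.
    @transient lazy val hiveQlPartitions: Seq[Partition] = loadAll()

    private def loadAll(): Seq[Partition] = {
      loadCount += 1 // stand-in for the expensive getAllPartitions call
      partitionNames.map(n => Partition(Seq(n)))
    }
  }

  def main(args: Array[String]): Unit = {
    val t = new HiveTable(Seq("ds=2015-07-17", "ds=2015-07-18"))
    println(loadCount)               // nothing loaded at construction
    println(t.hiveQlPartitions.size) // first access triggers the load
    println(loadCount)               // the load ran exactly once
  }
}
```

A `lazy val` is evaluated once on first access and cached, so repeated pruning passes would not rescan the metastore; a fuller fix (as in the linked PR) would also push the partition-pruning predicates into the metastore query itself so only matching partitions are fetched.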