spark.sql.hive.convertMetastoreParquet is true. I can't reproduce the issue
of scanning all partitions now... :P
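For the record, this is roughly how I checked it (spark-shell on 1.5, where
sqlContext is predefined; my_table and the date are placeholders):

    // Confirm the flag, then inspect the plan of a partition-filtered query.
    sqlContext.getConf("spark.sql.hive.convertMetastoreParquet")   // "true"
    sqlContext.sql("SELECT count(*) FROM my_table WHERE day = '2015-11-01'").explain()
    // With the conversion on, the scan should reference only the
    // day=2015-11-01 directory instead of every partition.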
Anyway, I found another email thread "Re: Spark Sql behaves strangely with
tables with a lot of partitions"
I observe the same issue as Jerrick: the Spark driver will call listStatus
for the whole table directory.
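For what it's worth, the listing cost is easy to see by hand (a rough sketch
in spark-shell; the warehouse path is a placeholder):

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)
    // One RPC lists the partition directories under the table root...
    val partitionDirs = fs.listStatus(new Path("/warehouse/my_table"))
    // ...and discovering the data files costs one more listStatus per partition.
    val fileCount = partitionDirs.map(d => fs.listStatus(d.getPath).length).sum
    println(s"${partitionDirs.length} partitions, $fileCount files")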
Is there any chance that "spark.sql.hive.convertMetastoreParquet" is
turned off?
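(It defaults to true, but an explicit set rules out a stray override
somewhere in the session:)

    // Force the Parquet conversion on for this session.
    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")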
Cheng
On 11/4/15 5:15 PM, Rex Xiong wrote:
Thanks Cheng Lian.
I found that in 1.5, if I use Spark to create this table with partition
discovery, partition pruning can be performed, but for my old table
definition in pure Hive, the execution plan will do a Parquet scan across
all partitions, and it runs very slowly.
Looks like the execution plan for the pure-Hive table doesn't apply any
partition pruning.
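In case it helps someone reproduce this, the two setups look roughly like
this (paths and names are made up):

    // Case 1: Spark's own partition discovery -- pruning works here.
    val discovered = sqlContext.read.parquet("/warehouse/my_table")
    discovered.filter("day = '2015-11-01'").explain()

    // Case 2: the pre-existing pure-Hive external table definition.
    sqlContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS my_table_hive (id BIGINT, value STRING)
      PARTITIONED BY (day STRING)
      STORED AS PARQUET
      LOCATION '/warehouse/my_table'
    """)
    sqlContext.sql("SELECT count(*) FROM my_table_hive WHERE day = '2015-11-01'")
      .explain()   // this is the plan that scans every partition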
SPARK-11153 should be irrelevant because you are filtering on a
partition key while SPARK-11153 is about Parquet filter push-down and
doesn't affect partition pruning.
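(As a quick illustration of the difference; my_table and its columns are
placeholders:)

    // Partition pruning: a filter on the partition column decides which
    // directories get read at all. SPARK-11153 does not touch this path.
    sqlContext.sql("SELECT count(*) FROM my_table WHERE day = '2015-11-01'")

    // Parquet filter push-down: a filter on a data column is evaluated by
    // the Parquet reader against row-group statistics. That is the part
    // SPARK-11153 is about.
    sqlContext.sql("SELECT count(*) FROM my_table WHERE value = 'x'")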
Cheng
On 11/3/15 7:14 PM, Rex Xiong wrote:
We found the query performance is very poor due to this issue
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-11153
We usually filter on the partition key, the date, which is a string; in
1.3.1 this works great.
But in 1.5, it needs to do a Parquet scan over all partitions.
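(The query shape is just a date filter on the string partition column, e.g.:)

    // `day` is a string partition column; this is the kind of filter we run daily.
    sqlContext.sql(
      "SELECT * FROM my_table WHERE day >= '2015-10-01' AND day < '2015-10-08'"
    ).explain()
    // 1.3.1 pruned this to the matching directories; 1.5 scans all of them.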
Adding this thread back to the email list; I forgot to reply-all.
On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" wrote:
> Not that I know of.
>
> On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong wrote:
>
>> Good to know that, will have a try.
>> So there is no easy way to achieve it with pure Hive?
What Storage Format?
> On 30 Oct 2015, at 12:05, Rex Xiong wrote:
>
> Hi folks,
>
> I have a Hive external table with partitions.
> Every day, an app will generate a new partition day=yyyy-MM-dd stored as
> Parquet and run an ADD PARTITION Hive command.
> In some cases, we will add additional columns to new partitions and update
> the Hive table schema, then a query across new and old partitions needs the
> merged schema.
>
> We have tried the schema merging feature, but it's too slow; there are
> hundreds of partitions.
>
Which version of Spark?
Hi folks,
I have a Hive external table with partitions.
Every day, an app will generate a new partition day=yyyy-MM-dd stored as
Parquet and run an ADD PARTITION Hive command.
In some cases, we will add additional columns to new partitions and update
the Hive table schema, then a query across new and old partitions needs the
merged schema.
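To make the setup concrete, the daily flow looks roughly like this (a
sketch; table name, path, and date are placeholders):

    // Register the new day's directory with the Hive metastore.
    sqlContext.sql("""
      ALTER TABLE my_table ADD IF NOT EXISTS
      PARTITION (day = '2015-11-05')
      LOCATION '/warehouse/my_table/day=2015-11-05'
    """)

    // When old and new partitions carry different Parquet schemas, reading
    // with schema merging reconciles them -- but Spark then has to look at
    // footers across partitions, which is why it gets slow with hundreds of
    // them, as mentioned above.
    val merged = sqlContext.read.option("mergeSchema", "true")
      .parquet("/warehouse/my_table")
    merged.printSchema()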