Re: Issue of Hive parquet partitioned table schema mismatch

Cheng Lian Wed, 04 Nov 2015 16:50:58 -0800

Is there any chance that " spark.sql.hive.convertMetastoreParquet" isturned off?


Cheng


On 11/4/15 5:15 PM, Rex Xiong wrote:

Thanks Cheng Lian.

I found in 1.5, if I use spark to create this table with partitiondiscovery, the partition pruning can be performed, but for my oldtable definition in pure Hive, the execution plan will do a parquetscan across all partitions, and it runs very slow.

Looks like the execution plan optimization is different.

2015-11-03 23:10 GMT+08:00 Cheng Lian <[email protected]<mailto:[email protected]>>:


    SPARK-11153 should be irrelevant because you are filtering on a
    partition key while SPARK-11153 is about Parquet filter push-down
    and doesn't affect partition pruning.

    Cheng


    On 11/3/15 7:14 PM, Rex Xiong wrote:


    We found the query performance is very poor due to this issue
    https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-11153
    We usually use filter on partition key, the date, it's in string
    type in 1.3.1 and works great.
    But in 1.5, it needs to do parquet scan for all partitions.

    2015年10月31日 下午7:38，"Rex Xiong" <[email protected]
    <mailto:[email protected]>> 写道：

        Add back this thread to email list, forgot to reply all.

        2015年10月31日 下午7:23，"Michael Armbrust"
        <[email protected] <mailto:[email protected]>> 写道：

            Not that I know of.

            On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong
            <[email protected] <mailto:[email protected]>> wrote:

                Good to know that, will have a try.
                So there is no easy way to achieve it in pure hive
                method?

                2015年10月 31日 下午7:17，"Michael Armbrust"
                <[email protected]
                <mailto:[email protected]>> 写道：

                    Yeah, this was rewritten to be faster in Spark
                    1.5.  We use it with 10,000s of partitions.

                    On Sat, Oct 31, 2015 at 7:17 AM, Rex Xiong
                    <[email protected] <mailto:[email protected]>>
                    wrote:

                        1.3.1
                        It is a lot of improvement in 1.5+?

                        2015-10-30 19:23 GMT+08:00 Michael Armbrust
                        <[email protected]
                        <mailto:[email protected]>>:

                                We have tried schema merging feature,
                                but it's too slow, there're hundreds
                                of partitions.

                            Which version of Spark?

Re: Issue of Hive parquet partitioned table schema mismatch

Reply via email to