Seems like this would make sense... we usually make maintenance releases
for bug fixes after a month anyway.


On Wed, Apr 11, 2018 at 12:52 PM, Henry Robinson <he...@apache.org> wrote:

>
>
> On 11 April 2018 at 12:47, Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
>> Spark.
>>
>> To be clear though, this only affects Spark when reading data written by
>> Impala, right? Or does Parquet CPP also produce data like this?
>>
>
> I don't know about parquet-cpp, but yeah, the only implementation I've
> seen writing the half-completed stats is Impala (as you know, that's
> compliant with the spec, just an unusual choice).
>
>
>>
>> On Wed, Apr 11, 2018 at 12:35 PM, Henry Robinson <he...@apache.org>
>> wrote:
>>
>>> Hi all -
>>>
>>> SPARK-23852 (where a query can silently give wrong results thanks to a
>>> predicate pushdown bug in Parquet) is a fairly bad bug. In other projects
>>> I've been involved with, we've released maintenance releases for bugs of
>>> this severity.
>>>
>>> Since Spark 2.4.0 is probably a while away, I wanted to see if there was
>>> any consensus over whether we should consider (at least) a 2.3.1.
>>>
>>> The reason this particular issue is a bit tricky is that the Parquet
>>> community haven't yet produced a maintenance release that fixes the
>>> underlying bug, but they are in the process of releasing a new minor
>>> version, 1.10, which includes a fix. Having spoken to a couple of Parquet
>>> developers, they'd be willing to consider a maintenance release, but would
>>> probably only bother if we (or another affected project) asked them to.
>>>
>>> My guess is that we wouldn't want to upgrade to a new minor version of
>>> Parquet for a Spark maintenance release, so asking for a Parquet
>>> maintenance release makes sense.
>>>
>>> What does everyone think?
>>>
>>> Best,
>>> Henry
>>>
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
