Seems like this would make sense... we usually make maintenance releases for bug fixes after a month anyway.
On Wed, Apr 11, 2018 at 12:52 PM, Henry Robinson <he...@apache.org> wrote:

>
>
> On 11 April 2018 at 12:47, Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
>> Spark.
>>
>> To be clear though, this only affects Spark when reading data written by
>> Impala, right? Or does Parquet CPP also produce data like this?
>>
>
> I don't know about parquet-cpp, but yeah, the only implementation I've
> seen writing the half-completed stats is Impala. (as you know, that's
> compliant with the spec, just an unusual choice).
>
>
>>
>> On Wed, Apr 11, 2018 at 12:35 PM, Henry Robinson <he...@apache.org>
>> wrote:
>>
>>> Hi all -
>>>
>>> SPARK-23852 (where a query can silently give wrong results thanks to a
>>> predicate pushdown bug in Parquet) is a fairly bad bug. In other projects
>>> I've been involved with, we've released maintenance releases for bugs of
>>> this severity.
>>>
>>> Since Spark 2.4.0 is probably a while away, I wanted to see if there was
>>> any consensus over whether we should consider (at least) a 2.3.1.
>>>
>>> The reason this particular issue is a bit tricky is that the Parquet
>>> community haven't yet produced a maintenance release that fixes the
>>> underlying bug, but they are in the process of releasing a new minor
>>> version, 1.10, which includes a fix. Having spoken to a couple of Parquet
>>> developers, they'd be willing to consider a maintenance release, but would
>>> probably only bother if we (or another affected project) asked them to.
>>>
>>> My guess is that we wouldn't want to upgrade to a new minor version of
>>> Parquet for a Spark maintenance release, so asking for a Parquet
>>> maintenance release makes sense.
>>>
>>> What does everyone think?
>>>
>>> Best,
>>> Henry
>>>
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>