Hi all -

SPARK-23852 (where a query can silently return wrong results due to a
predicate pushdown bug in Parquet) is a fairly serious bug. In other projects
I've been involved with, we've made maintenance releases for bugs of
this severity.

Since Spark 2.4.0 is probably a while away, I wanted to see if there is
any consensus on whether we should consider (at least) a 2.3.1 release.

This particular issue is a bit tricky because the Parquet community
hasn't yet produced a maintenance release that fixes the underlying bug.
They are, however, in the process of releasing a new minor version, 1.10,
which includes a fix. I've spoken to a couple of Parquet developers, and
they'd be willing to consider a maintenance release, but would probably
only bother if we (or another affected project) asked them to.

My guess is that we wouldn't want to upgrade to a new minor version of
Parquet for a Spark maintenance release, so asking for a Parquet
maintenance release makes sense.

What does everyone think?

Best,
Henry
