This is about column names containing dots that do not target fields inside
structs? So not a.b as in field b inside struct a, but a field literally
called a.b? I didn't even know that was supported at all. It's something I
would never try, because it sounds like a bad idea to go there...
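
To make the failure in the report quoted below concrete, here is a minimal Python model. It is a hypothetical sketch, not Parquet's or Spark's code. `path_from_dot_string`, `path_from_single_name` and `filter_eq` are made-up names. The model also assumes, as the report implies, that a column path which does not exist in the file simply matches no rows. A flat column literally named "col.dots" is found when the name is kept whole. Splitting the same name on the dot yields the nested path ("col", "dots"), which does not exist, so the equality filter returns nothing.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a column path is a tuple of nesting levels,
# and each row maps a path to its value.
Path = Tuple[str, ...]
Row = Dict[Path, object]


def path_from_dot_string(name: str) -> Path:
    """Split a column name on dots, treating each piece as a nesting level."""
    return tuple(name.split("."))


def path_from_single_name(name: str) -> Path:
    """Treat the whole name, dots included, as one top-level column."""
    return (name,)


def filter_eq(rows: List[Row], path: Path, value: object) -> List[Row]:
    """Keep rows whose value at `path` equals `value`.

    A path absent from a row reads as None (null), so an equality
    predicate on a non-null value never matches it.
    """
    return [row for row in rows if row.get(path) == value]


# A flat schema with one column whose name happens to contain a dot.
rows = [{("col.dots",): 1}, {("col.dots",): 2}]

print(len(filter_eq(rows, path_from_dot_string("col.dots"), 1)))   # 0
print(len(filter_eq(rows, path_from_single_name("col.dots"), 1)))  # 1
```

The only difference between the two calls is how the name is parsed into a path, which is the distinction the report draws between "col.dots" as one column and "struct.field".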

On Fri, Apr 28, 2017 at 12:17 PM, Andrew Ash <and...@andrewash.com> wrote:

> -1 due to regression from 2.1.1
>
> In 2.2.0-rc1 we bumped the Parquet version from 1.8.1 to 1.8.2 in commit
> 26a4cba3ff <https://github.com/apache/spark/commit/26a4cba3ff>.  Parquet
> 1.8.2 includes a backport from 1.9.0: PARQUET-389
> <https://issues.apache.org/jira/browse/PARQUET-389> in commit 2282c22c
> <https://github.com/apache/parquet-mr/commit/2282c22c>
>
> This backport caused a regression in Spark: when filtering on a column
> whose name contains dots, Spark pushes the filter down into Parquet, and
> Parquet handles the predicate incorrectly. Spark passes the String
> "col.dots" as the column name, but Parquet interprets it as
> "struct.field", i.e. a predicate on a field of a struct. The result is
> that the predicate always returns zero rows, causing a data correctness
> issue.
>
> This issue is filed in Spark as SPARK-20364
> <https://issues.apache.org/jira/browse/SPARK-20364> and has a proposed fix
> in PR #17680 <https://github.com/apache/spark/pull/17680>.
>
> I nominate SPARK-20364 <https://issues.apache.org/jira/browse/SPARK-20364> as
> a release blocker due to the data correctness regression.
>
> Thanks!
> Andrew
>
> On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> By the way, the RC looks good. Sigs and license are OK, and tests pass with
>> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>>
>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST
>>> and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc1
>>> <https://github.com/apache/spark/tree/v2.2.0-rc1> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>
>>> The list of JIRA tickets resolved can be found with this filter
>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload, running it on this release candidate, and
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
>>>
>>
>
