+1

On Tue, Oct 30, 2018 at 4:42 AM Wenchen Fan <cloud0...@gmail.com> wrote:

> Thanks for reporting the bug! I'll list it as a known issue for 2.4.0
>
> I'm adding my own +1, since all the known blockers are resolved.
>
> On Tue, Oct 30, 2018 at 2:56 PM Xiao Li <lix...@databricks.com> wrote:
>
>> Yes, this is not a blocker.
>> "spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by
>> default. As DB Tsai said, column pruning of nested schema for Parquet
>> tables is experimental. In this release, we encourage the whole community
>> to try this new feature but it might have bugs like what the JIRA
>> SPARK-25879 reports.
>>
>> We still can fix the issues in the minor release of Spark 2.4, as long as
>> the risk is not high.
>>
>> Thanks,
>>
>> Xiao
>>
>>
>> On Mon, Oct 29, 2018 at 11:49 PM DB Tsai <dbt...@dbtsai.com.invalid> wrote:
>>
>>> +0
>>>
>>> I understand that schema pruning is an experimental feature in Spark
>>> 2.4, and this can help a lot in read performance as people are trying
>>> to keep the hierarchical data in nested format.
>>>
>>> We just found a serious bug: it can cause the Parquet reader to fail if
>>> a nested field and a top-level field are selected simultaneously.
>>> https://issues.apache.org/jira/browse/SPARK-25879
>>>
>>> If we decide not to fix it in 2.4, we should at least document it in
>>> the release notes to let users know.
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 0x5CED8B896A6BDFA0
>>> On Mon, Oct 29, 2018 at 8:42 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> >
>>> > +1
>>> >
>>> > On Tue, Oct 30, 2018 at 11:03 AM, Gengliang Wang <ltn...@gmail.com> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> > On Oct 30, 2018, at 10:41 AM, Sean Owen <sro...@gmail.com> wrote:
>>> >> >
>>> >> > +1
>>> >> >
>>> >> > Same result as in RC4 from me, and the issues I know of that were
>>> >> > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11.
>>> >> >
>>> >> > These items are still targeted to 2.4.0; Xiangrui I assume these
>>> >> > should just be untargeted now, or resolved?
>>> >> > SPARK-25584 Document libsvm data source in doc site
>>> >> > SPARK-25346 Document Spark builtin data sources
>>> >> > SPARK-24464 Unit tests for MLlib's Instrumentation
>>> >> > On Mon, Oct 29, 2018 at 5:22 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>> >> >>
>>> >> >> Please vote on releasing the following candidate as Apache Spark version 2.4.0.
>>> >> >>
>>> >> >> The vote is open until November 1 PST and passes if a majority +1 PMC votes are cast, with
>>> >> >> a minimum of 3 +1 votes.
>>> >> >>
>>> >> >> [ ] +1 Release this package as Apache Spark 2.4.0
>>> >> >> [ ] -1 Do not release this package because ...
>>> >> >>
>>> >> >> To learn more about Apache Spark, please see http://spark.apache.org/
>>> >> >>
>>> >> >> The tag to be voted on is v2.4.0-rc5 (commit 0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
>>> >> >> https://github.com/apache/spark/tree/v2.4.0-rc5
>>> >> >>
>>> >> >> The release files, including signatures, digests, etc. can be found at:
>>> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/
>>> >> >>
>>> >> >> Signatures used for Spark RCs can be found in this file:
>>> >> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >> >>
>>> >> >> The staging repository for this release can be found at:
>>> >> >> https://repository.apache.org/content/repositories/orgapachespark-1291
>>> >> >>
>>> >> >> The documentation corresponding to this release can be found at:
>>> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/
>>> >> >>
>>> >> >> The list of bug fixes going into 2.4.0 can be found at the following URL:
>>> >> >> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>> >> >>
>>> >> >> FAQ
>>> >> >>
>>> >> >> =========================
>>> >> >> How can I help test this release?
>>> >> >> =========================
>>> >> >>
>>> >> >> If you are a Spark user, you can help us test this release by taking
>>> >> >> an existing Spark workload, running it on this release candidate, and
>>> >> >> then reporting any regressions.
>>> >> >>
>>> >> >> If you're working in PySpark, you can set up a virtual env, install
>>> >> >> the current RC, and see if anything important breaks. In Java/Scala,
>>> >> >> you can add the staging repository to your project's resolvers and
>>> >> >> test with the RC (make sure to clean up the artifact cache before and
>>> >> >> after so you don't end up building with an out-of-date RC going forward).
>>> >> >>
>>> >> >> ===========================================
>>> >> >> What should happen to JIRA tickets still targeting 2.4.0?
>>> >> >> ===========================================
>>> >> >>
>>> >> >> The current list of open tickets targeted at 2.4.0 can be found by going to
>>> >> >> https://issues.apache.org/jira/projects/SPARK and searching for
>>> >> >> "Target Version/s" = 2.4.0
>>> >> >>
>>> >> >> Committers should look at those and triage. Extremely important bug
>>> >> >> fixes, documentation, and API tweaks that impact compatibility should
>>> >> >> be worked on immediately. Everything else please retarget to an
>>> >> >> appropriate release.
>>> >> >>
>>> >> >> ==================
>>> >> >> But my bug isn't fixed?
>>> >> >> ==================
>>> >> >>
>>> >> >> In order to make timely releases, we will typically not hold the
>>> >> >> release unless the bug in question is a regression from the previous
>>> >> >> release. That being said, if there is something which is a regression
>>> >> >> that has not been correctly targeted please ping me or a committer to
>>> >> >> help target the issue.
>>> >> >
>>> >> >
>>> >> > ---------------------------------------------------------------------
>>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >> >
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>
>>
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
