Thanks for sharing the blockers, Wenchen. SPARK-31404 has sub-tasks; do I
understand correctly that all of its sub-tasks are therefore blockers for
this release?

Xiao, I sincerely respect the practices the Spark community has
established, so please treat this as my two cents. I would just like to
see how the community can focus on such a huge release: counting only
bugs, improvements, and new features, nearly 2,000 issues have been
resolved in Spark 3.0.0 alone. The volume is quite different from the
usual bugfix and minor releases, which suggests that special care is
needed.


On Fri, Apr 10, 2020 at 1:22 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> The ongoing critical issues I'm aware of are:
> SPARK-31257 <https://issues.apache.org/jira/browse/SPARK-31257>: fix the
> ambiguity between two different CREATE TABLE syntaxes
> SPARK-31404 <https://issues.apache.org/jira/browse/SPARK-31404>: backward
> compatibility issues after switching to Proleptic Gregorian calendar
> SPARK-31399 <https://issues.apache.org/jira/browse/SPARK-31399>: closure
> cleaner is broken in Spark 3.0
> SPARK-28067 <https://issues.apache.org/jira/browse/SPARK-28067>:
> Incorrect results in decimal aggregation with whole-stage codegen enabled
>
> That said, I'm -1 (binding) on RC1.
>
> Please reply to this thread if you know of more critical issues that
> should be fixed before 3.0.
>
> Thanks,
> Wenchen
>
>
> On Fri, Apr 10, 2020 at 10:01 AM Xiao Li <lix...@databricks.com> wrote:
>
>> Only low-risk or high-value bug fixes and documentation changes are
>> allowed to be merged to branch-3.0. I expect all committers to follow
>> the same rules as in previous releases.
>>
>> Xiao
>>
>> On Thu, Apr 9, 2020 at 6:13 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> It looks like around 80 commits have landed on branch-3.0 since we cut
>>> RC1 (I know many of them are to version configs, as well as to add
>>> docs). Shall we announce a blocker-only phase and maintain the list of
>>> blockers to restrict changes on the branch? The current churn makes
>>> everyone hesitant to test RC1 (see how many people have tested RC1 in
>>> this thread), as they would probably need to repeat the same tests
>>> with RC2.
>>>
>>> On Thu, Apr 9, 2020 at 5:50 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> I went through some manual tests of the new Structured Streaming
>>>> features in Spark 3.0.0. (Please let me know if there are more
>>>> features we'd like to test manually.)
>>>>
>>>> * file source cleanup - both "archive" and "delete" work. The query
>>>> fails as expected when the input directory is the output directory of
>>>> a file sink.
>>>> * kafka source/sink - "header" works for both source and sink, "group
>>>> id prefix" and "static group id" work, and start offset by timestamp
>>>> is confirmed to work for the streaming case (a rough sketch of these
>>>> source options is below).
>>>> * event log support for streaming queries - enabled it, confirmed
>>>> that compaction works, that SHS can read compacted event logs, and
>>>> that downloading an event log in SHS works by zipping the event log
>>>> directory. The original functionality with a single event log file
>>>> works as well.
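>>>>
>>>> A minimal Scala sketch of those source options, for reference (option
>>>> names per the 3.0 docs; the topic, paths, and timestamp here are
>>>> hypothetical):
>>>>
>>>> import org.apache.spark.sql.SparkSession
>>>>
>>>> val spark = SparkSession.builder().appName("rc1-smoke-test").getOrCreate()
>>>>
>>>> // file source cleanup: archive (or "delete") completed input files
>>>> val files = spark.readStream
>>>>   .format("text")
>>>>   .option("cleanSource", "archive")
>>>>   .option("sourceArchiveDir", "/tmp/archived")
>>>>   .load("/tmp/input")
>>>>
>>>> // kafka source: headers, group id prefix, start offset by timestamp
>>>> // ("kafka.group.id" can be set instead for a static group id)
>>>> val kafka = spark.readStream
>>>>   .format("kafka")
>>>>   .option("kafka.bootstrap.servers", "localhost:9092")
>>>>   .option("subscribe", "topic1")
>>>>   .option("includeHeaders", "true")
>>>>   .option("groupIdPrefix", "rc1-test")
>>>>   // assuming topic1 has a single partition 0
>>>>   .option("startingOffsetsByTimestamp", """{"topic1": {"0": 1585699200000}}""")
>>>>   .load()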
>>>>
>>>> Looks good, though there are still plenty of commits being pushed to
>>>> branch-3.0 after RC1, which makes me feel it may not be safe to carry
>>>> over the test results from RC1 to RC2.
>>>>
>>>> On Sat, Apr 4, 2020 at 12:49 AM Sean Owen <sro...@apache.org> wrote:
>>>>
>>>>> Aside from the other issues mentioned here, which probably do require
>>>>> another RC, this looks pretty good to me.
>>>>>
>>>>> I built on Ubuntu 19 and ran with Java 11, -Pspark-ganglia-lgpl
>>>>> -Pkinesis-asl -Phadoop-3.2 -Phive-2.3 -Pyarn -Pmesos -Pkubernetes
>>>>> -Phive-thriftserver -Djava.version=11
>>>>>
>>>>> I did see the following test failures, but as usual, I'm not sure
>>>>> whether they're specific to me. Does anyone else see these,
>>>>> particularly the R warnings?
>>>>>
>>>>>
>>>>> PythonUDFSuite:
>>>>> org.apache.spark.sql.execution.python.PythonUDFSuite *** ABORTED ***
>>>>>   java.lang.RuntimeException: Unable to load a Suite class that was
>>>>> discovered in the runpath:
>>>>> org.apache.spark.sql.execution.python.PythonUDFSuite
>>>>>   at
>>>>> org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
>>>>>   at
>>>>> org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
>>>>>   at
>>>>> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>>>>>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>>>>>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>>>>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>>>>>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>>>>>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>>>>>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>>>>>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>>>>>
>>>>>
>>>>> - SPARK-25158: Executor accidentally exit because
>>>>> ScriptTransformationWriterThread throw Exception *** FAILED ***
>>>>>   Expected exception org.apache.spark.SparkException to be thrown, but
>>>>> no exception was thrown (SQLQuerySuite.scala:2384)
>>>>>
>>>>>
>>>>> * checking for missing documentation entries ... WARNING
>>>>> Undocumented code objects:
>>>>>   ‘%<=>%’ ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’
>>>>>   ‘approx_count_distinct’ ‘arrange’ ‘array_contains’ ‘array_distinct’
>>>>> ...
>>>>>  WARNING
>>>>> ‘qpdf’ is needed for checks on size reduction of PDFs
>>>>>
>>>>> On Tue, Mar 31, 2020 at 10:04 PM Reynold Xin <r...@databricks.com>
>>>>> wrote:
>>>>> >
>>>>> > Please vote on releasing the following candidate as Apache Spark
>>>>> version 3.0.0.
>>>>> >
>>>>> > The vote is open until 11:59pm Pacific time Fri Apr 3, and passes
>>>>> > if a majority of +1 PMC votes are cast, with a minimum of 3 +1
>>>>> > votes.
>>>>> >
>>>>> > [ ] +1 Release this package as Apache Spark 3.0.0
>>>>> > [ ] -1 Do not release this package because ...
>>>>> >
>>>>> > To learn more about Apache Spark, please see
>>>>> http://spark.apache.org/
>>>>> >
>>>>> > The tag to be voted on is v3.0.0-rc1 (commit
>>>>> 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
>>>>> > https://github.com/apache/spark/tree/v3.0.0-rc1
>>>>> >
>>>>> > The release files, including signatures, digests, etc. can be found
>>>>> at:
>>>>> > https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/
>>>>> >
>>>>> > Signatures used for Spark RCs can be found in this file:
>>>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>> >
>>>>> > The staging repository for this release can be found at:
>>>>> >
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1341/
>>>>> >
>>>>> > The documentation corresponding to this release can be found at:
>>>>> > https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/
>>>>> >
>>>>> > The list of bug fixes going into 3.0.0 can be found at the following
>>>>> URL:
>>>>> > https://issues.apache.org/jira/projects/SPARK/versions/12339177
>>>>> >
>>>>> > This release is using the release script of the tag v3.0.0-rc1.
>>>>> >
>>>>> >
>>>>> > FAQ
>>>>> >
>>>>> > =========================
>>>>> > How can I help test this release?
>>>>> > =========================
>>>>> > If you are a Spark user, you can help us test this release by
>>>>> > taking an existing Spark workload, running it on this release
>>>>> > candidate, and reporting any regressions.
>>>>> >
>>>>> > If you're working in PySpark, you can set up a virtual env, install
>>>>> > the current RC, and see if anything important breaks. In Java/Scala,
>>>>> > you can add the staging repository to your project's resolvers and
>>>>> > test with the RC (make sure to clean up the artifact cache
>>>>> > before/after so you don't end up building with an out-of-date RC
>>>>> > going forward).
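>>>>> >
>>>>> > For example, for an sbt build, something like the following (a
>>>>> > hypothetical build.sbt sketch; Maven users can add the staging
>>>>> > repository to their pom.xml similarly) should resolve the RC
>>>>> > artifacts:
>>>>> >
>>>>> > // Hypothetical build.sbt additions for testing the RC; the staging
>>>>> > // repository URL is the one listed earlier in this email.
>>>>> > resolvers += "apache-spark-3.0.0-rc1-staging" at
>>>>> >   "https://repository.apache.org/content/repositories/orgapachespark-1341/"
>>>>> > // RC artifacts are published under the final version number.
>>>>> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0"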
>>>>> >
>>>>> > ===========================================
>>>>> > What should happen to JIRA tickets still targeting 3.0.0?
>>>>> > ===========================================
>>>>> > The current list of open tickets targeted at 3.0.0 can be found at:
>>>>> > https://issues.apache.org/jira/projects/SPARK and search for
>>>>> "Target Version/s" = 3.0.0
>>>>> >
>>>>> > Committers should look at those and triage. Extremely important bug
>>>>> > fixes, documentation, and API tweaks that impact compatibility should
>>>>> > be worked on immediately. Everything else please retarget to an
>>>>> > appropriate release.
>>>>> >
>>>>> > ==================
>>>>> > But my bug isn't fixed?
>>>>> > ==================
>>>>> > In order to make timely releases, we will typically not hold the
>>>>> > release unless the bug in question is a regression from the previous
>>>>> > release. That being said, if there is something which is a regression
>>>>> > that has not been correctly targeted please ping me or a committer to
>>>>> > help target the issue.
>>>>> >
>>>>> >
>>>>> > Note: I fully expect this RC to fail.
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>
>>>>>
>>
>
