Re: [VOTE] Spark 2.3.0 (RC2)

Sameer Agarwal Thu, 01 Feb 2018 10:36:32 -0800

[+ Xiao]

SPARK-23290 does sound like a blocker. On the SQL side, I can confirm that
there were non-trivial changes around repartitioning/coalesce and cache
performance in 2.3; we're currently investigating these.


On 1 February 2018 at 10:02, Andrew Ash wrote:

> I'd like to nominate SPARK-23290
> <https://issues.apache.org/jira/browse/SPARK-23290> as a potential
> blocker for the 2.3.0 release. It's a regression from 2.2.0 in that user
> PySpark code that works in 2.2.0 now fails in the 2.3.0 RCs: the dtype of
> date columns in the pandas DataFrame returned by toPandas() changed from
> object to datetime64[ns]. My understanding of the Spark Versioning Policy
> <http://spark.apache.org/versioning-policy.html> is that user code should
> continue to run in future versions of Spark with the same major version
> number.
>
> Thanks!
>
> On Thu, Feb 1, 2018 at 9:50 AM, Tom Graves wrote:
>
>>
>> Testing with Spark 2.3, I see a difference from Spark 2.2 in how SQL
>> coalesce behaves when talking to Hive: Spark 2.3 seems to ignore the
>> coalesce.
>>
>> Query:
>> spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT NULL").coalesce(160000).show()
>>
>> In Spark 2.2 the coalesce works here, but in Spark 2.3 it doesn't. Does
>> anyone know about this issue, or is there some config change I'm missing?
>> Otherwise I'll file a JIRA.
>>
>> Note that I also see a performance difference when reading cached data in
>> Spark 2.3: a small query on 19GB of cached data is about 30% slower, though
>> the absolute times are small (13 seconds on Spark 2.2 vs. 17 seconds on
>> Spark 2.3). Reading straight from Hive (ORC), on the other hand, seems
>> better.
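As a quick check of the "30% worse" figure: the relative slowdown is (17 - 13) / 13, roughly 0.31. The hypothetical helper below only spells out that arithmetic; the two timings are the ones quoted above.

```python
def relative_slowdown(old_seconds: float, new_seconds: float) -> float:
    """Fractional increase in runtime going from old_seconds to new_seconds."""
    if old_seconds <= 0:
        raise ValueError("old_seconds must be positive")
    return (new_seconds - old_seconds) / old_seconds


# Timings reported above: 13 s on Spark 2.2, 17 s on Spark 2.3.
slowdown = relative_slowdown(13, 17)
print(f"{slowdown:.1%} slower")  # prints "30.8% slower"
```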
>>
>> Tom
>>
>>
>>
>> On Thursday, February 1, 2018, 11:23:45 AM CST, Michael Heuer wrote:
>>
>>
>> We found two classes new to Spark 2.3.0 that must be registered in Kryo
>> for our tests to pass on RC2:
>>
>> org.apache.spark.sql.execution.datasources.BasicWriteTaskStats
>> org.apache.spark.sql.execution.datasources.ExecutedWriteSummary
>>
>> https://github.com/bigdatagenomics/adam/pull/1897
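A failure like this typically shows up when a job runs with spark.kryo.registrationRequired enabled (an assumption here; that setting is what turns an unregistered class into an error). One way to register the two new classes is the spark.kryo.classesToRegister setting. The sketch below only builds the configuration pairs; kryo_settings is a hypothetical helper, and a project that already has its own KryoRegistrator would register the classes there instead.

```python
# Classes named above as needing Kryo registration on Spark 2.3.0 RC2.
NEW_WRITE_STATS_CLASSES = [
    "org.apache.spark.sql.execution.datasources.BasicWriteTaskStats",
    "org.apache.spark.sql.execution.datasources.ExecutedWriteSummary",
]


def kryo_settings(classes, registration_required=True):
    """Hypothetical helper: (key, value) pairs that enable Kryo and register `classes`."""
    return [
        ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
        ("spark.kryo.registrationRequired", str(registration_required).lower()),
        # Spark expects a comma-separated list of fully qualified class names.
        ("spark.kryo.classesToRegister", ",".join(classes)),
    ]


for key, value in kryo_settings(NEW_WRITE_STATS_CLASSES):
    print(f"{key}={value}")
```

In PySpark the pairs could be passed to SparkConf().setAll(...) before the SparkContext is created; if the job already sets spark.kryo.classesToRegister, the new names would need to be merged into the existing list.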
>>
>> Perhaps a mention in release notes?
>>
>>    michael
>>
>>
>> On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath wrote:
>>
>> All MLlib QA JIRAs are resolved, and it looks like the SparkR ones are too,
>> so from the ML side that should be everything outstanding.
>>
>>
>> On Thu, 1 Feb 2018 at 06:21 Yin Huai wrote:
>>
>> It seems we are not running the pandas-related tests in the PySpark test
>> suite (see my email "python tests related to pandas are skipped in
>> jenkins"). I think we should fix this and make sure all tests pass before
>> cutting RC3.
>>
>> On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal wrote:
>>
>> Just a quick status update on RC3: SPARK-23274
>> <https://issues.apache.org/jira/browse/SPARK-23274> was resolved
>> yesterday, and tests have been quite healthy throughout this week and the
>> last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202
>> <https://issues.apache.org/jira/browse/SPARK-23202>) is resolved.
>>
>>
>> On 30 January 2018 at 10:12, Andrew Ash wrote:
>>
>> I'd like to nominate SPARK-23274
>> <https://issues.apache.org/jira/browse/SPARK-23274> as a potential
>> blocker for the 2.3.0 release as well, because it is a regression from
>> 2.2.0. The ticket includes a simple repro showing a query that works in
>> prior releases but now fails with an exception in the Catalyst optimizer.
>>
>> On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal wrote:
>>
>> This vote has failed due to the blockers mentioned above. I'll follow up
>> with RC3 as soon as the two remaining (non-QA) blockers are resolved:
>> https://s.apache.org/oXKi
>>
>>
>> On 25 January 2018 at 12:59, Sameer Agarwal wrote:
>>
>>
>> Most tests pass on RC2, except that I'm still seeing the timeout caused by
>> SPARK-23055 <https://issues.apache.org/jira/browse/SPARK-23055>; the tests
>> never finish. I followed the thread a bit further, and it wasn't clear
>> whether it was subsequently re-fixed for 2.3.0 or not. The JIRA says it's
>> resolved for 2.3.0 along with SPARK-22908
>> <https://issues.apache.org/jira/browse/SPARK-22908>, though I am still
>> seeing these tests fail or hang:
>>
>> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
>> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>
>>
>> Sean, while some of these tests were timing out on RC1, we're not aware
>> of any issues in RC2. Both the maven
>> <https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/>
>> and sbt
>> <https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/>
>> historical builds on Jenkins for org.apache.spark.sql.kafka010 look fairly
>> healthy. If you're still seeing timeouts in RC2, can you create a JIRA with
>> any applicable build/env info?
>>
>>
>>
>> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen wrote:
>>
>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>> without warnings in either case.
>>
>> I am encountering errors while running the tests, different ones each
>> time, so am still figuring out whether there is a real problem or just
>> flaky tests.
>>
>> These issues look like blockers, as they inherently have to be completed
>> before the 2.3 release, and they are mostly not done. I suppose I'd vote
>> -1 on behalf of those who say this needs to be done first, though we can
>> keep testing.
>>
>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>> SPARK-23114 Spark R 2.3 QA umbrella
>>
>> Here are the remaining items targeted for 2.3:
>>
>> SPARK-15689 Data source API v2
>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>> SPARK-21646 Add new type coercion rules to compatible with Hive
>> SPARK-22386 Data Source V2 improvements
>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>> SPARK-22735 Add VectorSizeHint to ML features documentation
>> SPARK-22739 Additional Expression Support for Objects
>> SPARK-22809 pyspark is sensitive to imports with dots
>> SPARK-22820 Spark 2.3 SQL API audit
>>
>>
>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin wrote:
>>
>> +0
>>
>> Signatures check out. Code compiles, although I see the errors in [1]
>> when untarring the source archive; perhaps we should add "use GNU tar"
>> to the RM checklist?
>>
>> Also ran our internal tests and they seem happy.
>>
>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> documentation ones). It is not long, but it seems some of those need
>> to be looked at. It would be nice for the committers who are involved
>> in those bugs to take a look.
>>
>> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
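The warnings referenced in [1] are commonly attributed to archives created with the macOS (bsdtar/libarchive) tar: vendor-specific pax header keywords that some GNU tar versions report as unknown, plus AppleDouble "._" metadata files. Assuming that is the cause here, the sketch below uses Python's standard tarfile module to list such entries in a source archive. The function name and the keyword prefixes it checks are illustrative assumptions, not part of any Spark release tooling.

```python
import tarfile

# Assumed sources of the warnings: pax keywords written by libarchive/bsdtar on
# macOS, and AppleDouble "._name" files that carry macOS extended attributes.
VENDOR_PAX_PREFIXES = ("LIBARCHIVE.", "SCHILY.")


def find_macos_tar_artifacts(fileobj):
    """Return (member name, reason) pairs for entries GNU tar may complain about."""
    findings = []
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            vendor_keys = sorted(
                key for key in member.pax_headers if key.startswith(VENDOR_PAX_PREFIXES)
            )
            if vendor_keys:
                findings.append((member.name, "pax keywords: " + ", ".join(vendor_keys)))
            if member.name.rsplit("/", 1)[-1].startswith("._"):
                findings.append((member.name, "AppleDouble metadata file"))
    return findings
```

For example, find_macos_tar_artifacts(open("path/to/source.tgz", "rb")) on a downloaded source archive would list any such entries; an empty result suggests the warnings have some other cause.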
>>
>>
>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC
>> > and passes if a majority of at least 3 PMC +1 votes are cast.
>> >
>> >
>> > [ ] +1 Release this package as Apache Spark 2.3.0
>> >
>> > [ ] -1 Do not release this package because ...
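Read the usual ASF way (an assumption; the announcement above is the authoritative wording), the passing rule means at least three binding +1 votes from PMC members and more binding +1 than -1 votes, with a -1 not acting as a veto on a release. A minimal sketch of that tally, with hypothetical names, where a voter's later vote replaces an earlier one:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True if the voter is on the PMC


def release_vote_passes(votes):
    """votes: iterable of Vote in the order they were cast."""
    latest = {}
    for vote in votes:
        latest[vote.voter] = vote  # a later vote from the same person supersedes earlier ones
    binding = [v.value for v in latest.values() if v.binding]
    plus = binding.count(1)
    minus = binding.count(-1)
    return plus >= 3 and plus > minus


votes = [
    Vote("pmc-a", +1, True),
    Vote("pmc-b", +1, True),
    Vote("user-x", +1, False),  # non-binding votes do not count toward the three
]
print(release_vote_passes(votes))  # False: only two binding +1 votes
```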
>> >
>> >
>> > To learn more about Apache Spark, please see https://spark.apache.org/
>> >
>> > The tag to be voted on is v2.3.0-rc2:
>> > https://github.com/apache/spark/tree/v2.3.0-rc2
>> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>> >
>> > The list of JIRA tickets resolved in this release can be found here:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >
>> > The release files, including signatures, digests, etc., can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
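Checking a downloaded artifact against its published digest can be sketched with Python's hashlib. This assumes the .sha512 file next to each artifact is either in the gpg --print-md layout (file name, a colon, then upper-case hex in space-separated groups, possibly wrapped across lines) or in the sha512sum layout (hex digest, then file name); the function names are hypothetical. Signatures are a separate check, done with GnuPG against the KEYS file.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Hex SHA-512 of a file, read in chunks so a large tarball is never loaded whole."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_published_sha512(text):
    """Extract a lower-case hex digest from the contents of a .sha512 file."""
    if ":" in text:
        # gpg --print-md style: "name.tgz: ABCD1234 EF56..." (groups may wrap)
        return "".join(text.split(":", 1)[1].split()).lower()
    # sha512sum style: "<hex>  name.tgz"
    return text.split()[0].lower()


def digest_matches(path, published_text):
    return sha512_of(path) == parse_published_sha512(published_text)
```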
>> >
>> > Release artifacts are signed with the following key:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1262/
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
>> >
>> >
>> > FAQ
>> >
>> > =======================================
>> > What are the unresolved issues targeted for 2.3.0?
>> > =======================================
>> >
>> > Please see https://s.apache.org/oXKi. At the time of writing, there are
>> > no known release blockers.
>> >
>> > =========================
>> > How can I help test this release?
>> > =========================
>> >
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload, running it on this release candidate, and then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark, you can set up a virtual env, install the
>> > current RC, and see if anything important breaks. In Java/Scala, you can
>> > add the staging repository to your project's resolvers and test with the
>> > RC (make sure to clean up the artifact cache before/after so you don't end
>> > up building with an out-of-date RC going forward).
>> >
>> > ===========================================
>> > What should happen to JIRA tickets still targeting 2.3.0?
>> > ===========================================
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should be
>> > worked on immediately. Everything else, please retarget to 2.3.1 or 2.4.0
>> > as appropriate.
>> >
>> > ===================
>> > Why is my bug not fixed?
>> > ===================
>> >
>> > In order to make timely releases, we will typically not hold the release
>> > unless the bug in question is a regression from 2.2.0. That being said, if
>> > there is something which is a regression from 2.2.0 and has not been
>> > correctly targeted, please ping me or a committer to help target the issue
>> > (you can see the open issues listed as impacting Spark 2.3.0 at
>> > https://s.apache.org/WmoI).
>> >
>> >
>> > Regards,
>> > Sameer
>>
>>
>>
>> --
>> Marcelo
>>
>>
>>
>>
>>
>>
>> --
>> Sameer Agarwal
>> Computer Science | UC Berkeley
>> http://cs.berkeley.edu/~sameerag
>>
>>
>>
>>
>>
>>
>
