FYI -- Thanks to a big community-wide effort over the last few days, we're now down to just one last remaining code blocker again: https://issues.apache.org/jira/browse/SPARK-23309

I'll cut an RC3 as soon as that's resolved.

On 4 February 2018 at 00:02, Xingbo Jiang <jiangxb1...@gmail.com> wrote:

I filed another NPE problem in the WebUI; I believe this is a regression in 2.3: https://issues.apache.org/jira/browse/SPARK-23330

2018-02-01 10:38 GMT-08:00 Tom Graves <tgraves...@yahoo.com.invalid>:

I filed a jira for the coalesce issue: SPARK-23304 Spark SQL coalesce() against hive not working <https://issues.apache.org/jira/browse/SPARK-23304>.

Tom

On Thursday, February 1, 2018, 12:36:02 PM CST, Sameer Agarwal <samee...@apache.org> wrote:

[+ Xiao]

SPARK-23290 does sound like a blocker. On the SQL side, I can confirm that there were non-trivial changes around repartitioning/coalesce and cache performance in 2.3 -- we're currently investigating these.

On 1 February 2018 at 10:02, Andrew Ash <and...@andrewash.com> wrote:

I'd like to nominate SPARK-23290 <https://issues.apache.org/jira/browse/SPARK-23290> as a potential blocker for the 2.3.0 release. It's a regression from 2.2.0 in that user pyspark code that works in 2.2.0 now fails in the 2.3.0 RCs: the return type of date columns changed from object to datetime64[ns]. My understanding of the Spark Versioning Policy <http://spark.apache.org/versioning-policy.html> is that user code should continue to run in future versions of Spark with the same major version number.
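A minimal way to see the change is to round-trip a DateType column through toPandas() and inspect the dtype (a sketch only; the column name and data are illustrative, not from the original report):

    from datetime import date
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # A single-row DataFrame with a DateType column; the dtype change
    # surfaces when the column is converted to pandas.
    df = spark.createDataFrame([(date(2018, 1, 1),)], ["d"])
    pdf = df.toPandas()

    # Spark 2.2.0 reports 'object' here; the 2.3.0 RCs report
    # 'datetime64[ns]', which is the change SPARK-23290 describes.
    print(pdf["d"].dtype)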
Thanks!

On Thu, Feb 1, 2018 at 9:50 AM, Tom Graves <tgraves...@yahoo.com.invalid> wrote:

Testing with spark 2.3, I see a difference in the sql coalesce talking to hive vs spark 2.2. It seems spark 2.3 ignores the coalesce.

Query:

    spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT NULL").coalesce(160000).show()

In spark 2.2 the coalesce works here, but in spark 2.3 it doesn't. Does anyone know about this issue, or are there some weird config changes at play? Otherwise I'll file a jira.
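For what it's worth, one way to check whether the coalesce is being honored is to compare partition counts before and after applying it (a sketch against a placeholder query, not the exact one above; the target of 16 is arbitrary):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    df = spark.sql("SELECT * FROM sometable WHERE dt >= '20170301'")
    coalesced = df.coalesce(16)

    # If coalesce is honored, the second count should be at most 16;
    # the behavior described above suggests 2.3 leaves it unchanged.
    print(df.rdd.getNumPartitions())
    print(coalesced.rdd.getNumPartitions())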
Note I also see a performance difference when reading cached data in spark 2.3: a small query on 19GB of cached data is about 30% worse -- 13 seconds on spark 2.2 vs 17 seconds on spark 2.3. Straight-up reading from hive (orc) seems better though.

Tom

On Thursday, February 1, 2018, 11:23:45 AM CST, Michael Heuer <heue...@gmail.com> wrote:

We found two classes new to Spark 2.3.0 that must be registered in Kryo for our tests to pass on RC2:

org.apache.spark.sql.execution.datasources.BasicWriteTaskStats
org.apache.spark.sql.execution.datasources.ExecutedWriteSummary
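For anyone hitting the same thing, registering them by name with the standard Kryo config keys looks roughly like this (a sketch only, not the exact setup from the PR below):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (
        SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrationRequired", "true")
        # The two classes new in 2.3.0 that needed registration:
        .set("spark.kryo.classesToRegister",
             "org.apache.spark.sql.execution.datasources.BasicWriteTaskStats,"
             "org.apache.spark.sql.execution.datasources.ExecutedWriteSummary"))

    spark = SparkSession.builder.config(conf=conf).getOrCreate()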
https://github.com/bigdatagenomics/adam/pull/1897

Perhaps a mention in release notes?

michael

On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side that should be everything outstanding.

On Thu, 1 Feb 2018 at 06:21 Yin Huai <yh...@databricks.com> wrote:

It seems we are not running tests related to pandas in pyspark tests (see my email "python tests related to pandas are skipped in jenkins"). I think we should fix this test issue and make sure all tests are good before cutting RC3.

On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal <samee...@apache.org> wrote:

Just a quick status update on RC3 -- SPARK-23274 <https://issues.apache.org/jira/browse/SPARK-23274> was resolved yesterday and tests have been quite healthy throughout this week and the last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202 <https://issues.apache.org/jira/browse/SPARK-23202>) is resolved.

On 30 January 2018 at 10:12, Andrew Ash <and...@andrewash.com> wrote:

I'd like to nominate SPARK-23274 <https://issues.apache.org/jira/browse/SPARK-23274> as a potential blocker for the 2.3.0 release as well, due to being a regression from 2.2.0. The ticket includes a simple repro showing a query that works in prior releases but now fails with an exception in the catalyst optimizer.

On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal <sameer.a...@gmail.com> wrote:

This vote has failed due to a number of aforementioned blockers. I'll follow up with RC3 as soon as the 2 remaining (non-QA) blockers are resolved: https://s.apache.org/oXKi

On 25 January 2018 at 12:59, Sameer Agarwal <sameer.a...@gmail.com> wrote:

Most tests pass on RC2, except I'm still seeing the timeout caused by https://issues.apache.org/jira/browse/SPARK-23055; the tests never finish. I followed the thread a bit further and wasn't clear whether it was subsequently re-fixed for 2.3.0 or not. It says it's resolved along with https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0, though I am still seeing these tests fail or hang:

- subscribing topic by name from earliest offsets (failOnDataLoss: false)
- subscribing topic by name from earliest offsets (failOnDataLoss: true)

Sean, while some of these tests were timing out on RC1, we're not aware of any known issues in RC2. Both maven (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/) and sbt (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/) historical builds on jenkins for org.apache.spark.sql.kafka010 look fairly healthy. If you're still seeing timeouts in RC2, can you create a JIRA with any applicable build/env info?

On Tue, Jan 23, 2018 at 9:01 AM Sean Owen <so...@cloudera.com> wrote:

I'm not seeing that same problem on OS X with /usr/bin/tar. I tried unpacking it with 'xvzf' and also unzipping it first, and it untarred without warnings in either case.

I am encountering errors while running the tests, different ones each time, so am still figuring out whether there is a real problem or just flaky tests.

These issues look like blockers, as they are inherently meant to be completed before the 2.3 release, and they are mostly not done. I suppose I'd -1 on behalf of those who say this needs to be done first, though we can keep testing.

SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
SPARK-23114 Spark R 2.3 QA umbrella

Here are the remaining items targeted for 2.3:

SPARK-15689 Data source API v2
SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
SPARK-21646 Add new type coercion rules to compatible with Hive
SPARK-22386 Data Source V2 improvements
SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
SPARK-22735 Add VectorSizeHint to ML features documentation
SPARK-22739 Additional Expression Support for Objects
SPARK-22809 pyspark is sensitive to imports with dots
SPARK-22820 Spark 2.3 SQL API audit

On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin <van...@cloudera.com> wrote:

+0

Signatures check out. Code compiles, although I see the errors in [1] when untarring the source archive; perhaps we should add "use GNU tar" to the RM checklist?

Also ran our internal tests and they seem happy.

My concern is the list of open bugs targeted at 2.3.0 (ignoring the documentation ones). It is not long, but it seems some of those need to be looked at. It would be nice for the committers who are involved in those bugs to take a look.

[1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt

On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal <samee...@apache.org> wrote:

Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast.

[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc2:
https://github.com/apache/spark/tree/v2.3.0-rc2
(489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1262/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html

FAQ

=======================================
What are the unresolved issues targeted for 2.3.0?
=======================================

Please see https://s.apache.org/oXKi. At the time of writing, there are currently no known release blockers.

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks; in Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).

===========================================
What should happen to JIRA tickets still targeting 2.3.0?
===========================================

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as appropriate.

===================
Why is my bug not fixed?
===================

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.2.0. That being said, if there is something which is a regression from 2.2.0 and has not been correctly targeted, please ping me or a committer to help target the issue (you can see the open issues listed as impacting Spark 2.3.0 at https://s.apache.org/WmoI).

Regards,
Sameer

--
Marcelo

--
Sameer Agarwal
Computer Science | UC Berkeley
http://cs.berkeley.edu/~sameerag