+1 for preview release

On Fri, Sep 13, 2019 at 9:58 AM Thomas Graves <tgraves...@gmail.com> wrote:
> +1, I think having a preview release would be great.
>
> Tom
>
> On Fri, Sep 13, 2019 at 4:55 AM Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>
>> +1 as a contributor and as a user. Given the amount of testing required
>> for all the new cool stuff like Java 11 support, major
>> refactorings/deprecations, etc., a preview version would help the
>> community a lot by making adoption smoother in the long term. I would also
>> add Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
>> to the list of issues, assuming things will move forward faster in the
>> next few months.
>>
>> On Fri, Sep 13, 2019 at 11:08 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>
>>> Michael Heuer, that's an interesting issue.
>>>
>>> 1.8.2 to 1.9.0 is almost binary compatible (94%):
>>> http://people.apache.org/~busbey/avro/1.9.0-RC4/1.8.2_to_1.9.0RC4_compat_report.html
>>> Most of the changes remove the Jackson and Netty APIs from Avro's public
>>> API and deprecate the Joda library. I would strongly advise moving to
>>> 1.9.1 since there are some regression issues; for Java the most important is:
>>> https://jira.apache.org/jira/browse/AVRO-2400
>>>
>>> I'd love to dive into the issue that you describe and I'm curious if the
>>> issue is still there with Avro 1.9.1. I'm a bit busy at the moment but
>>> might have some time this weekend to dive into it.
>>>
>>> Cheers, Fokko Driesprong
>>>
>>>
>>> On Fri, Sep 13, 2019 at 02:32, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> +1! Long overdue for a preview release.
>>>>
>>>>
>>>> On Thu, Sep 12, 2019 at 5:26 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>
>>>>> I like the idea from the PoV of giving folks something to start
>>>>> testing against and exploring so they can raise issues with us earlier in
>>>>> the process and we have more time to make calls around this.
>>>>>
>>>>> On Thu, Sep 12, 2019 at 4:15 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>
>>>>> +1 Like the idea as a user and a DSv2 contributor.
>>>>>
>>>>> On Thu, Sep 12, 2019 at 4:10 PM Jungtaek Lim <kabh...@gmail.com> wrote:
>>>>>
>>>>> +1 (as a contributor) from me to have a preview release of Spark 3, as it
>>>>> would help to test the features. When to cut the preview release is
>>>>> questionable, as major work should ideally be done before that. If we
>>>>> intend to introduce new features before the official release, that should
>>>>> work regardless of this, but if we intend to give people an opportunity to
>>>>> test earlier, ideally it should.
>>>>>
>>>>> As one of the contributors in the structured streaming area, I'd like to
>>>>> add some items for Spark 3.0, both "must be done" and "better to have". For
>>>>> "better to have", I picked some new-feature items which committers
>>>>> reviewed for a couple of rounds and then dropped without a soft-reject (no
>>>>> valid reason to stop). For Spark 2.4 users, the only feature added for
>>>>> structured streaming is the Kafka delegation token (assuming we count the
>>>>> revised Kafka consumer pool as an improvement). I hope we provide some
>>>>> gifts for structured streaming users in the Spark 3.0 envelope.
>>>>>
>>>>> > must be done
>>>>> * SPARK-26154 Stream-stream joins - left outer join gives inconsistent output
>>>>> It's a correctness issue that multiple users have hit, reported
>>>>> in Nov. 2018. There's a way to reproduce it consistently, and a
>>>>> patch to fix it was submitted in Jan. 2019.
>>>>>
>>>>> > better to have
>>>>> * SPARK-23539 Add support for Kafka headers in Structured Streaming
>>>>> * SPARK-26848 Introduce new option to Kafka source - specify timestamp to start and end offset
>>>>> * SPARK-20568 Delete files after processing in structured streaming
>>>>>
>>>>> There are some more new feature/improvement items in SS, but given
>>>>> we're talking about ramping down, the above list might be a realistic one.
>>>>>
>>>>>
>>>>> On Thu, Sep 12, 2019 at 9:53 AM Jean Georges Perrin <j...@jgp.net> wrote:
>>>>>
>>>>> As a user/non committer, +1
>>>>>
>>>>> I love the idea of an early 3.0.0 so we can test current dev against
>>>>> it. I know the final 3.x will probably need another round of testing when
>>>>> it gets out, but less for sure... I know I could check out and compile, but
>>>>> having a “packaged” preversion is great if it does not take too much of
>>>>> the team's time...
>>>>>
>>>>> jg
>>>>>
>>>>>
>>>>> On Sep 11, 2019, at 20:40, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>
>>>>> +1 from me too but I would like to know what other people think too.
>>>>>
>>>>> On Thu, Sep 12, 2019 at 9:07 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>> Thank you, Sean.
>>>>>
>>>>> I'm also +1 for the following three.
>>>>>
>>>>> 1. Start to ramp down (by the official branch-3.0 cut)
>>>>> 2. Apache Spark 3.0.0-preview in 2019
>>>>> 3. Apache Spark 3.0.0 in early 2020
>>>>>
>>>>> For JDK11 clean-up, it will meet the timeline and `3.0.0-preview`
>>>>> helps it a lot.
>>>>>
>>>>> After this discussion, can we have some timeline for `Spark 3.0
>>>>> Release Window` in our versioning-policy page?
>>>>>
>>>>> - https://spark.apache.org/versioning-policy.html
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Wed, Sep 11, 2019 at 11:54 AM Michael Heuer <heue...@gmail.com> wrote:
>>>>>
>>>>> I would love to see Spark + Hadoop + Parquet + Avro compatibility
>>>>> problems resolved, e.g.
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-25588
>>>>> https://issues.apache.org/jira/browse/SPARK-27781
>>>>>
>>>>> Note that Avro is now at 1.9.1, binary-incompatible with 1.8.x. As
>>>>> far as I know, Parquet has not cut a release based on this new version.
>>>>>
>>>>> Then out of curiosity, are the new Spark Graph APIs targeting 3.0?
>>>>>
>>>>> https://github.com/apache/spark/pull/24851
>>>>> https://github.com/apache/spark/pull/24297
>>>>>
>>>>> michael
>>>>>
>>>>>
>>>>> On Sep 11, 2019, at 1:37 PM, Sean Owen <sro...@apache.org> wrote:
>>>>>
>>>>> I'm curious what current feelings are about ramping down towards a
>>>>> Spark 3 release. It feels close to ready. There is no fixed date,
>>>>> though in the past we had informally tossed around "back end of 2019".
>>>>> For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect
>>>>> Spark 2 to last longer, so to speak, but feels like Spark 3 is coming
>>>>> due.
>>>>>
>>>>> What are the few major items that must get done for Spark 3, in your
>>>>> opinion? Below are all of the open JIRAs for 3.0 (which everyone
>>>>> should feel free to update with things that aren't really needed for
>>>>> Spark 3; I already triaged some).
>>>>>
>>>>> For me, it's:
>>>>> - DSv2?
>>>>> - Finishing touches on the Hive, JDK 11 update
>>>>>
>>>>> What about considering a preview release earlier, as happened for
>>>>> Spark 2, to get feedback much earlier than the RC cycle? Could that
>>>>> even happen ... about now?
>>>>>
>>>>> I'm also wondering what a realistic estimate of Spark 3 release is. My
>>>>> guess is quite early 2020, from here.
>>>>>
>>>>> SPARK-29014 DataSourceV2: Clean up current, default, and session catalog uses
>>>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>>>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>>>>> SPARK-28588 Build a SQL reference doc
>>>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>>>> SPARK-28684 Hive module support JDK 11
>>>>> SPARK-28548 explain() shows wrong result for persisted DataFrames after some operations
>>>>> SPARK-28372 Document Spark WEB UI
>>>>> SPARK-28476 Support ALTER DATABASE SET LOCATION
>>>>> SPARK-28264 Revisiting Python / pandas UDF
>>>>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>>>>> SPARK-28155 do not leak SaveMode to file source v2
>>>>> SPARK-28103 Cannot infer filters from union table with empty local relation table properly
>>>>> SPARK-28024 Incorrect numeric values when out of range
>>>>> SPARK-27936 Support local dependency uploading from --py-files
>>>>> SPARK-27884 Deprecate Python 2 support in Spark 3.0
>>>>> SPARK-27763 Port test cases from PostgreSQL to Spark SQL
>>>>> SPARK-27780 Shuffle server & client should be versioned to enable smoother upgrade
>>>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
>>>>> SPARK-27471 Reorganize public v2 catalog API
>>>>> SPARK-27520 Introduce a global config system to replace hadoopConfiguration
>>>>> SPARK-24625 put all the backward compatible behavior change configs under spark.sql.legacy.*
>>>>> SPARK-24640 size(null) returns null
>>>>> SPARK-24702 Unable to cast to calendar interval in spark sql.
>>>>> SPARK-24838 Support uncorrelated IN/EXISTS subqueries for more operators
>>>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>>>> SPARK-25017 Add test suite for ContextBarrierState
>>>>> SPARK-25083 remove the type erasure hack in data source scan
>>>>> SPARK-25383 Image data source supports sample pushdown
>>>>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
>>>>> SPARK-27296 User Defined Aggregating Functions (UDAFs) have a major efficiency problem
>>>>> SPARK-25128 multiple simultaneous job submissions against k8s backend cause driver pods to hang
>>>>> SPARK-26731 remove EOLed spark jobs from jenkins
>>>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>>>> SPARK-21559 Remove Mesos fine-grained mode
>>>>> SPARK-24942 Improve cluster resource management with jobs containing barrier stage
>>>>> SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
>>>>> SPARK-26022 PySpark Comparison with Pandas
>>>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>>>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>>>> SPARK-26425 Add more constraint checks in file streaming source to avoid checkpoint corruption
>>>>> SPARK-25843 Redesign rangeBetween API
>>>>> SPARK-25841 Redesign window function rangeBetween API
>>>>> SPARK-25752 Add trait to easily whitelist logical operators that produce named output from CleanupAliases
>>>>> SPARK-23210 Introduce the concept of default value to schema
>>>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window aggregate
>>>>> SPARK-25531 new write APIs for data source v2
>>>>> SPARK-25547 Pluggable jdbc connection factory
>>>>> SPARK-20845 Support specification of column names in INSERT INTO
>>>>> SPARK-24417 Build and Run Spark on JDK11
>>>>> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
>>>>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>>>>> SPARK-25074 Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
>>>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>>>> SPARK-25186 Stabilize Data Source V2 API
>>>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier execution mode
>>>>> SPARK-25390 data source V2 API refactoring
>>>>> SPARK-7768 Make user-defined type (UDT) API public
>>>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
>>>>> SPARK-15691 Refactor and improve Hive support
>>>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>>>> SPARK-16217 Support SELECT INTO statement
>>>>> SPARK-16452 basic INFORMATION_SCHEMA support
>>>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>>>> SPARK-18245 Improving support for bucketed table
>>>>> SPARK-19842 Informational Referential Integrity Constraints Support in Spark
>>>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
>>>>> SPARK-22632 Fix the behavior of timestamp values for R's DataFrame to respect session timezone
>>>>> SPARK-22386 Data Source V2 improvements
>>>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>>>
>>>>> --
>>>>> Name : Jungtaek Lim
>>>>> Blog : http://medium.com/@heartsavior
>>>>> Twitter : http://twitter.com/heartsavior
>>>>> LinkedIn : http://www.linkedin.com/in/heartsavior
>>>>>
>>>>> --
>>>>> John Zhuge
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau