+1 from me too, but I would like to know what other people think as well.

On Thu, Sep 12, 2019 at 9:07 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> Thank you, Sean.
>
> I'm also +1 for the following three.
>
> 1. Start to ramp down (by the official branch-3.0 cut)
> 2. Apache Spark 3.0.0-preview in 2019
> 3. Apache Spark 3.0.0 in early 2020
>
> For JDK11 clean-up, it will meet the timeline and `3.0.0-preview` helps it
> a lot.
>
> After this discussion, can we have some timeline for `Spark 3.0 Release
> Window` in our versioning-policy page?
>
> - https://spark.apache.org/versioning-policy.html
>
> Bests,
> Dongjoon.
>
>
> On Wed, Sep 11, 2019 at 11:54 AM Michael Heuer <heue...@gmail.com> wrote:
>
>> I would love to see Spark + Hadoop + Parquet + Avro compatibility
>> problems resolved, e.g.
>>
>> https://issues.apache.org/jira/browse/SPARK-25588
>> https://issues.apache.org/jira/browse/SPARK-27781
>>
>> Note that Avro is now at 1.9.1, binary-incompatible with 1.8.x. As far
>> as I know, Parquet has not cut a release based on this new version.
>>
>> Then out of curiosity, are the new Spark Graph APIs targeting 3.0?
>>
>> https://github.com/apache/spark/pull/24851
>> https://github.com/apache/spark/pull/24297
>>
>> michael
>>
>>
>> On Sep 11, 2019, at 1:37 PM, Sean Owen <sro...@apache.org> wrote:
>>
>> I'm curious what current feelings are about ramping down towards a
>> Spark 3 release. It feels close to ready. There is no fixed date,
>> though in the past we had informally tossed around "back end of 2019".
>> For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect
>> Spark 2 to last longer, so to speak, but feels like Spark 3 is coming
>> due.
>>
>> What are the few major items that must get done for Spark 3, in your
>> opinion? Below are all of the open JIRAs for 3.0 (which everyone
>> should feel free to update with things that aren't really needed for
>> Spark 3; I already triaged some).
>>
>> For me, it's:
>> - DSv2?
>> - Finishing touches on the Hive, JDK 11 update
>>
>> What about considering a preview release earlier, as happened for
>> Spark 2, to get feedback much earlier than the RC cycle? Could that
>> even happen ... about now?
>>
>> I'm also wondering what a realistic estimate of Spark 3 release is. My
>> guess is quite early 2020, from here.
>>
>>
>>
>> SPARK-29014 DataSourceV2: Clean up current, default, and session catalog uses
>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>> SPARK-28588 Build a SQL reference doc
>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>> SPARK-28684 Hive module support JDK 11
>> SPARK-28548 explain() shows wrong result for persisted DataFrames after some operations
>> SPARK-28372 Document Spark WEB UI
>> SPARK-28476 Support ALTER DATABASE SET LOCATION
>> SPARK-28264 Revisiting Python / pandas UDF
>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>> SPARK-28155 do not leak SaveMode to file source v2
>> SPARK-28103 Cannot infer filters from union table with empty local relation table properly
>> SPARK-28024 Incorrect numeric values when out of range
>> SPARK-27936 Support local dependency uploading from --py-files
>> SPARK-27884 Deprecate Python 2 support in Spark 3.0
>> SPARK-27763 Port test cases from PostgreSQL to Spark SQL
>> SPARK-27780 Shuffle server & client should be versioned to enable smoother upgrade
>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
>> SPARK-27471 Reorganize public v2 catalog API
>> SPARK-27520 Introduce a global config system to replace hadoopConfiguration
>> SPARK-24625 put all the backward compatible behavior change configs under spark.sql.legacy.*
>> SPARK-24640 size(null) returns null
>> SPARK-24702 Unable to cast to calendar interval in spark sql.
>> SPARK-24838 Support uncorrelated IN/EXISTS subqueries for more operators
>> SPARK-24941 Add RDDBarrier.coalesce() function
>> SPARK-25017 Add test suite for ContextBarrierState
>> SPARK-25083 remove the type erasure hack in data source scan
>> SPARK-25383 Image data source supports sample pushdown
>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
>> SPARK-27296 User Defined Aggregating Functions (UDAFs) have a major efficiency problem
>> SPARK-25128 multiple simultaneous job submissions against k8s backend cause driver pods to hang
>> SPARK-26731 remove EOLed spark jobs from jenkins
>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>> SPARK-21559 Remove Mesos fine-grained mode
>> SPARK-24942 Improve cluster resource management with jobs containing barrier stage
>> SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
>> SPARK-26022 PySpark Comparison with Pandas
>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>> SPARK-26425 Add more constraint checks in file streaming source to avoid checkpoint corruption
>> SPARK-25843 Redesign rangeBetween API
>> SPARK-25841 Redesign window function rangeBetween API
>> SPARK-25752 Add trait to easily whitelist logical operators that produce named output from CleanupAliases
>> SPARK-23210 Introduce the concept of default value to schema
>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window aggregate
>> SPARK-25531 new write APIs for data source v2
>> SPARK-25547 Pluggable jdbc connection factory
>> SPARK-20845 Support specification of column names in INSERT INTO
>> SPARK-24417 Build and Run Spark on JDK11
>> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>> SPARK-25074 Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>> SPARK-25186 Stabilize Data Source V2 API
>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier execution mode
>> SPARK-25390 data source V2 API refactoring
>> SPARK-7768 Make user-defined type (UDT) API public
>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
>> SPARK-15691 Refactor and improve Hive support
>> SPARK-15694 Implement ScriptTransformation in sql/core
>> SPARK-16217 Support SELECT INTO statement
>> SPARK-16452 basic INFORMATION_SCHEMA support
>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>> SPARK-18245 Improving support for bucketed table
>> SPARK-19842 Informational Referential Integrity Constraints Support in Spark
>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
>> SPARK-22632 Fix the behavior of timestamp values for R's DataFrame to respect session timezone
>> SPARK-22386 Data Source V2 improvements
>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN