Cogroup Pandas UDF missing:
SPARK-27463 <https://issues.apache.org/jira/browse/SPARK-27463> Support Dataframe Cogroup via Pandas UDFs

Vectorized R execution:
SPARK-26759 <https://issues.apache.org/jira/browse/SPARK-26759> Arrow optimization in SparkR's interoperability

On Tue, Oct 8, 2019 at 7:50 AM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Thanks for bringing the nice summary of Spark 3.0 improvements!
>
> I'd like to add some items from structured streaming side,
>
> SPARK-28199 <https://issues.apache.org/jira/browse/SPARK-28199> Move
> Trigger implementations to Triggers.scala and avoid exposing these to the
> end users (removal of deprecated)
> SPARK-23539 <https://issues.apache.org/jira/browse/SPARK-23539> Add
> support for Kafka headers in Structured Streaming
> SPARK-25501 <https://issues.apache.org/jira/browse/SPARK-25501> Add kafka
> delegation token support (there were follow-up issues to add
> functionalities like support multi clusters, etc.)
> SPARK-26848 <https://issues.apache.org/jira/browse/SPARK-26848> Introduce
> new option to Kafka source: offset by timestamp (starting/ending)
> SPARK-28074 <https://issues.apache.org/jira/browse/SPARK-28074> Log warn
> message on possible correctness issue for multiple stateful operations in
> single query
>
> and core side,
>
> SPARK-23155 <https://issues.apache.org/jira/browse/SPARK-23155> New
> feature: apply custom log URL pattern for executor log URLs in SHS
> (follow-up issue expanded the functionality to Spark UI as well)
>
> FYI if we count on current work in progress, there's ongoing umbrella
> issue regarding rolling event log & snapshot (SPARK-28594
> <https://issues.apache.org/jira/browse/SPARK-28594>) which we struggle to
> get things done in Spark 3.0.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Tue, Oct 8, 2019 at 7:02 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>
>> Hi all,
>>
>> I went over all the finished JIRA tickets targeted to Spark 3.0.0, here
>> I'm listing all the notable features and major changes that are ready to
>> test/deliver, please don't hesitate to add more to the list:
>>
>> SPARK-11215 <https://issues.apache.org/jira/browse/SPARK-11215> Multiple
>> columns support added to various Transformers: StringIndexer
>>
>> SPARK-11150 <https://issues.apache.org/jira/browse/SPARK-11150>
>> Implement Dynamic Partition Pruning
>>
>> SPARK-13677 <https://issues.apache.org/jira/browse/SPARK-13677> Support
>> Tree-Based Feature Transformation
>>
>> SPARK-16692 <https://issues.apache.org/jira/browse/SPARK-16692> Add
>> MultilabelClassificationEvaluator
>>
>> SPARK-19591 <https://issues.apache.org/jira/browse/SPARK-19591> Add
>> sample weights to decision trees
>>
>> SPARK-19712 <https://issues.apache.org/jira/browse/SPARK-19712> Pushing
>> Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.
>>
>> SPARK-19827 <https://issues.apache.org/jira/browse/SPARK-19827> R API
>> for Power Iteration Clustering
>>
>> SPARK-20286 <https://issues.apache.org/jira/browse/SPARK-20286> Improve
>> logic for timing out executors in dynamic allocation
>>
>> SPARK-20636 <https://issues.apache.org/jira/browse/SPARK-20636>
>> Eliminate unnecessary shuffle with adjacent Window expressions
>>
>> SPARK-22148 <https://issues.apache.org/jira/browse/SPARK-22148> Acquire
>> new executors to avoid hang because of blacklisting
>>
>> SPARK-22796 <https://issues.apache.org/jira/browse/SPARK-22796> Multiple
>> columns support added to various Transformers: PySpark QuantileDiscretizer
>>
>> SPARK-23128 <https://issues.apache.org/jira/browse/SPARK-23128> A new
>> approach to do adaptive execution in Spark SQL
>>
>> SPARK-23674 <https://issues.apache.org/jira/browse/SPARK-23674> Add
>> Spark ML Listener for Tracking ML Pipeline Status
>>
>> SPARK-23710 <https://issues.apache.org/jira/browse/SPARK-23710> Upgrade
>> the built-in Hive to 2.3.5 for hadoop-3.2
>>
>> SPARK-24333 <https://issues.apache.org/jira/browse/SPARK-24333> Add fit
>> with validation set to Gradient Boosted Trees: Python API
>>
>> SPARK-24417 <https://issues.apache.org/jira/browse/SPARK-24417> Build
>> and Run Spark on JDK11
>>
>> SPARK-24615 <https://issues.apache.org/jira/browse/SPARK-24615>
>> Accelerator-aware task scheduling for Spark
>>
>> SPARK-24920 <https://issues.apache.org/jira/browse/SPARK-24920> Allow
>> sharing Netty's memory pool allocators
>>
>> SPARK-25250 <https://issues.apache.org/jira/browse/SPARK-25250> Fix race
>> condition with tasks running when new attempt for same stage is created
>> leads to other task in the next attempt running on the same partition id
>> retry multiple times
>>
>> SPARK-25341 <https://issues.apache.org/jira/browse/SPARK-25341> Support
>> rolling back a shuffle map stage and re-generate the shuffle files
>>
>> SPARK-25348 <https://issues.apache.org/jira/browse/SPARK-25348> Data
>> source for binary files
>>
>> SPARK-25603 <https://issues.apache.org/jira/browse/SPARK-25603>
>> Generalize Nested Column Pruning
>>
>> SPARK-26132 <https://issues.apache.org/jira/browse/SPARK-26132> Remove
>> support for Scala 2.11 in Spark 3.0.0
>>
>> SPARK-26215 <https://issues.apache.org/jira/browse/SPARK-26215> define
>> reserved keywords after SQL standard
>>
>> SPARK-26412 <https://issues.apache.org/jira/browse/SPARK-26412> Allow
>> Pandas UDF to take an iterator of pd.DataFrames
>>
>> SPARK-26785 <https://issues.apache.org/jira/browse/SPARK-26785> data
>> source v2 API refactor: streaming write
>>
>> SPARK-26956 <https://issues.apache.org/jira/browse/SPARK-26956> remove
>> streaming output mode from data source v2 APIs
>>
>> SPARK-27064 <https://issues.apache.org/jira/browse/SPARK-27064> create
>> StreamingWrite at the beginning of streaming execution
>>
>> SPARK-27119 <https://issues.apache.org/jira/browse/SPARK-27119> Do not
>> infer schema when reading Hive serde table with native data source
>>
>> SPARK-27225 <https://issues.apache.org/jira/browse/SPARK-27225>
>> Implement join strategy hints
>>
>> SPARK-27240 <https://issues.apache.org/jira/browse/SPARK-27240> Use
>> pandas DataFrame for struct type argument in Scalar Pandas UDF
>>
>> SPARK-27338 <https://issues.apache.org/jira/browse/SPARK-27338> Fix
>> deadlock between TaskMemoryManager and
>> UnsafeExternalSorter$SpillableIterator
>>
>> SPARK-27396 <https://issues.apache.org/jira/browse/SPARK-27396> Public
>> APIs for extended Columnar Processing Support
>>
>> SPARK-27589 <https://issues.apache.org/jira/browse/SPARK-27589>
>> Re-implement file sources with data source V2 API
>>
>> SPARK-27677 <https://issues.apache.org/jira/browse/SPARK-27677>
>> Disk-persisted RDD blocks served by shuffle service, and ignored for
>> Dynamic Allocation
>>
>> SPARK-27699 <https://issues.apache.org/jira/browse/SPARK-27699>
>> Partially push down disjunctive predicated in Parquet/ORC
>>
>> SPARK-27763 <https://issues.apache.org/jira/browse/SPARK-27763> Port
>> test cases from PostgreSQL to Spark SQL (ongoing)
>>
>> SPARK-27884 <https://issues.apache.org/jira/browse/SPARK-27884>
>> Deprecate Python 2 support
>>
>> SPARK-27921 <https://issues.apache.org/jira/browse/SPARK-27921> Convert
>> applicable *.sql tests into UDF integrated test base
>>
>> SPARK-27963 <https://issues.apache.org/jira/browse/SPARK-27963> Allow
>> dynamic allocation without an external shuffle service
>>
>> SPARK-28177 <https://issues.apache.org/jira/browse/SPARK-28177> Adjust
>> post shuffle partition number in adaptive execution
>>
>> SPARK-28372 <https://issues.apache.org/jira/browse/SPARK-28372> Document
>> Spark WEB UI
>>
>> SPARK-28399 <https://issues.apache.org/jira/browse/SPARK-28399>
>> RobustScaler feature transformer
>>
>> SPARK-28426 <https://issues.apache.org/jira/browse/SPARK-28426> Metadata
>> Handling in Thrift Server
>>
>> SPARK-28588 <https://issues.apache.org/jira/browse/SPARK-28588> Build a
>> SQL reference doc (ongoing)
>>
>> SPARK-28608 <https://issues.apache.org/jira/browse/SPARK-28608> Improve
>> test coverage of ThriftServer
>>
>> SPARK-28753 <https://issues.apache.org/jira/browse/SPARK-28753>
>> Dynamically reuse subqueries in AQE
>>
>> SPARK-28855 <https://issues.apache.org/jira/browse/SPARK-28855> Remove
>> outdated Experimental, Evolving annotations
>> SPARK-25908 <https://issues.apache.org/jira/browse/SPARK-25908>
>> SPARK-28980 <https://issues.apache.org/jira/browse/SPARK-28980> Remove
>> deprecated items since <= 2.2.0
>>
>> Cheers,
>>
>> Xingbo
>>
>