Re: [VOTE] Spark 2.2.2 (RC2)

2018-07-01 Thread Holden Karau
Leaving documents aside (I think we should maybe have a thread on how we want to handle doc changes to existing releases on dev@), I'm +1. PySpark venv checks out. On Sun, Jul 1, 2018 at 9:40 PM, Hyukjin Kwon wrote: > Let me leave a note about https://issues.apache.org/jira/browse/SPARK-24530.

Re: Time for 2.3.2?

2018-07-01 Thread Saisai Shao
I will start preparing the release. Thanks. On Sat, Jun 30, 2018 at 10:31 AM, John Zhuge wrote: > +1 Looking forward to the critical fixes in 2.3.2. > > On Thu, Jun 28, 2018 at 9:37 AM Ryan Blue > wrote: > >> +1 >> >> On Thu, Jun 28, 2018 at 9:34 AM Xiao Li wrote: >> >>> +1. Thanks, Saisai! >>> >>> The impac

Re: [VOTE] Spark 2.2.2 (RC2)

2018-07-01 Thread Hyukjin Kwon
Let me leave a note about https://issues.apache.org/jira/browse/SPARK-24530. The Python documentation should be built against Python 3's Sphinx for now as a workaround. An issue was found (SPARK-24530), and I am now trying to update the documentation, release process, and probably the Makefile

t

2018-07-01 Thread 450533090

Re: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-01 Thread Sean Owen
This could be a good optimization. But can it be done without changing any APIs or slowing anything else down? If so, this could be worth a pull request. On Sun, Jul 1, 2018 at 9:21 PM Vincent Wang wrote: > > Hi there, > > I'm using GBTClassifier to do some classification jobs and find the performance

Fwd: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-01 Thread Vincent Wang
Hi there, I'm using *GBTClassifier* to do some classification jobs and find the performance of the scoring stage is not quite satisfying. The trained model has about 160 trees and the input feature vector is sparse and its size is about 20+. After some digging, I found the model will repeatedly and rand
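The concern in this thread can be illustrated with a minimal sketch. It assumes that random access into a sparse vector costs one binary search over the stored indices, while a dense array lookup is a plain read. ToySparseVector and SparseLookupDemo are hypothetical names made up for illustration; they are not Spark's SparseVector or GBTClassifier, and the "scoring" loop is only a stand-in for repeated feature lookups during tree traversal.

```java
import java.util.Arrays;

/** Toy sparse vector for illustration only; NOT Spark's SparseVector. */
final class ToySparseVector {
    final int size;
    final int[] indices;   // sorted positions of the non-zero entries
    final double[] values; // values[k] is the entry at position indices[k]

    ToySparseVector(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    /** Random access: one binary search over the stored indices per call. */
    double get(int i) {
        int k = Arrays.binarySearch(indices, i);
        return k >= 0 ? values[k] : 0.0;
    }

    /** One linear pass; afterwards each lookup is a plain array read. */
    double[] toDense() {
        double[] dense = new double[size];
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] = values[k];
        }
        return dense;
    }
}

final class SparseLookupDemo {
    /** Hypothetical stand-in for tree traversal: reads one feature per split. */
    static double scoreSparse(ToySparseVector v, int[] splitFeatures) {
        double sum = 0.0;
        for (int f : splitFeatures) {
            sum += v.get(f); // binary search on every access
        }
        return sum;
    }

    /** Same traversal over a dense copy made once up front. */
    static double scoreDense(double[] dense, int[] splitFeatures) {
        double sum = 0.0;
        for (int f : splitFeatures) {
            sum += dense[f]; // constant-time read
        }
        return sum;
    }

    public static void main(String[] args) {
        ToySparseVector v = new ToySparseVector(
            20, new int[] {1, 7, 13}, new double[] {0.5, 2.0, -1.0});
        int[] splits = {7, 3, 13, 7, 1};
        System.out.println(scoreSparse(v, splits));
        System.out.println(scoreDense(v.toDense(), splits));
    }
}
```

Both methods return the same score; the difference is only in per-lookup cost. Whether a one-time dense copy pays off depends on the vector size and on how many lookups each row triggers, which would need measuring on the real model.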

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread kant kodali
I am not affiliated with Flink or Spark, but I do think some of the thoughts here make sense. On Sun, Jul 1, 2018 at 4:12 PM, Sean Owen wrote: > It's true, that is

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Sean Owen
It's true, that is one of the issues to be solved by the 2.12-compatible build, because it otherwise introduces an overload ambiguity for Java 8 lambdas. But for that reason I think the current transform() method would start working with lambdas. That would only help 2.12 builds; maybe that's an OK

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Reynold Xin
This wouldn’t be a problem with Scala 2.12 right? On Sun, Jul 1, 2018 at 12:23 PM Sean Owen wrote: > I see, transform() doesn't have the same overload that other methods do in > order to support Java 8 lambdas as you'd expect. One option is to introduce > something like MapFunction for transform

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Sean Owen
I see, transform() doesn't have the same overload that other methods do in order to support Java 8 lambdas as you'd expect. One option is to introduce something like MapFunction for transform and introduce an overload. I think transform() isn't used much at all, which may be why it wasn't Java-fied. B
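A minimal sketch of the overload idea, written without any Spark dependency. TransformFunction and ToyDataset are hypothetical names invented here; they are not part of Spark's API and not the code in the linked branch. The point is only that an interface with a single abstract method can be the target of a Java 8 lambda or method reference, which is what a Java-specific transform overload would rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical MapFunction-style interface: one abstract method, so lambdas fit. */
@FunctionalInterface
interface TransformFunction<T, U> {
    U call(T input) throws Exception;
}

/** Toy stand-in for a dataset; NOT Spark's Dataset. */
final class ToyDataset<T> {
    private final List<T> rows;

    ToyDataset(List<T> rows) {
        this.rows = new ArrayList<>(rows);
    }

    List<T> rows() {
        return new ArrayList<>(rows);
    }

    /** Keeps only the rows matching the predicate. */
    ToyDataset<T> filterRows(Predicate<T> keep) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (keep.test(row)) {
                kept.add(row);
            }
        }
        return new ToyDataset<>(kept);
    }

    /** Java-friendly transform: applies a dataset-to-dataset function to this. */
    <U> ToyDataset<U> transform(TransformFunction<ToyDataset<T>, ToyDataset<U>> f) {
        try {
            return f.call(this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

final class TransformDemo {
    /** A reusable pipeline step, passed to transform as a method reference. */
    static ToyDataset<Integer> keepEven(ToyDataset<Integer> ds) {
        return ds.filterRows(x -> x % 2 == 0);
    }

    public static void main(String[] args) {
        ToyDataset<Integer> ds = new ToyDataset<>(Arrays.asList(1, 2, 3, 4));
        ToyDataset<Integer> evens = ds.transform(TransformDemo::keepEven);
        ToyDataset<Integer> out = evens.transform(d -> d.filterRows(x -> x > 2));
        System.out.println(out.rows());
    }
}
```

With such an overload, reusable steps chain as method references or lambdas instead of requiring a hand-written implementation of a Scala function type.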

[ANNOUNCE] Apache Spark 2.1.3

2018-07-01 Thread Holden Karau
We are happy to announce the availability of Spark 2.1.3! Apache Spark 2.1.3 is a maintenance release, based on the branch-2.1 maintenance branch of Spark. We strongly recommend all 2.1.x users to upgrade to this stable release. The release notes are available at http://spark.apache.org/releases/s

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Ismael Carnales
No, because Function1 from Scala is not a functional interface. You can see a simple example of what I'm trying to accomplish in the unit test here: https://github.com/void/spark/blob/java-transform/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java#L73 On Sun, Jul 1, 2018 a

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Sean Owen
Don't Java 8 lambdas let you do this pretty immediately? Can you give an example here of what you want to do and how you are trying to do it? On Sun, Jul 1, 2018, 12:42 PM Ismael Carnales wrote: > Hi, > it would be nice to have an easier way to use the Dataset transform > method from Java than

Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Ismael Carnales
Hi, it would be nice to have an easier way to use the Dataset transform method from Java than implementing a Function1 from Scala. I've made a simple implementation here: https://github.com/void/spark/tree/java-transform Should I open a JIRA? Ismael Carnales

Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError)

2018-07-01 Thread Sean Owen
Somewhere, you have mismatched versions of Spark on your classpath. On Sun, Jul 1, 2018, 9:01 AM Peter Liu wrote: > Hello there, > > I didn't get any response/help from the user list for the following > question and thought people on the dev list might be able to help?: > > I upgraded to spark 2
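One generic way to look for mismatched versions on a classpath is to ask the JVM where a given class was loaded from. The sketch below uses only standard JDK calls; WhereLoaded is a hypothetical helper name, and which class names to pass (for example, classes from the Spark and Kafka integration jars) is left to the reader as an assumption about their setup.

```java
import java.security.CodeSource;

final class WhereLoaded {
    /** Returns the jar or directory a class came from, or a note if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader typically have no code source.
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments; defaults are placeholders.
        String[] names = args.length > 0
            ? args
            : new String[] {"java.lang.String", WhereLoaded.class.getName()};
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

Running this with the same classpath as the failing job, and passing one class from each suspect artifact, shows which jar each class actually resolves to; jars carrying different version numbers for the same project would point to the mismatch.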

Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError)

2018-07-01 Thread Peter Liu
Hello there, I didn't get any response/help from the user list for the following question and thought people on the dev list might be able to help: I upgraded to spark 2.3.1 from spark 2.2.1, ran my streaming workload and got an error (java.lang.AbstractMethodError) I had never seen before; see the