Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Joseph Bradley
There have been some comments about using Pipelines outside of ML, but I have not yet seen a real need for it. If a user does want to use Pipelines for non-ML tasks, they can still use Transformers + PipelineModels. Will that work?
On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski wrote:
> Hi,

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Koert Kuipers
i asked around a little, and the general trend at our clients seems to be that they plan to upgrade the clusters to java 8 within the year. so with that in mind i wish this was a little later (i would have preferred a java-8-only spark at the end of year). but since a major spark version only come

Re: SPARK-13843 and future of streaming backends

2016-03-25 Thread David Nalley
> As far as group / artifact name compatibility, at least in the case of
> Kafka we need different artifact names anyway, and people are going to
> have to make changes to their build files for spark 2.0 anyway. As
> far as keeping the actual classes in org.apache.spark to not break
> code despi

[spark.ml] Why is private class ColumnPruner?

2016-03-25 Thread Jacek Laskowski
Hi, I came across `private class ColumnPruner` with "TODO(ekl) make this a public transformer" in the scaladoc, cf. https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala#L317. Why is this private, and is there a JIRA for the TODO(ekl)? Regards, J

Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Jacek Laskowski
Hi, After a few weeks with spark.ml, I have come to the conclusion that the Transformer concept from the Pipeline API (spark.ml/MLlib) should be part of DataFrame (SQL), where it fits better. Are there any plans to migrate the Transformer API (ML) to DataFrame (SQL)? Regards, Jacek Laskowski https://medium.

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Andrew Ray
+1 on removing Java 7 and Scala 2.10 support. It looks to be entirely possible to support Java 8 containers in a YARN cluster otherwise running Java 7 (example code for alt JAVA_HOME https://issues.apache.org/jira/secure/attachment/12671739/YARN-1964.patch) so really there should be no big problem

Re: Does SparkSql has official jdbc/odbc driver ?

2016-03-25 Thread Daniel Darabos
I haven't tried this, but I thought you can run the Thriftserver in Spark and then connect with the HiveServer2 JDBC driver: http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
On Fri, Mar 25, 2016 at 7:57 AM, Reynold Xin wrote:
> No - it is too painf