Spark SQL API changes and stabilization

2015-01-14 Thread Reynold Xin
Hi Spark devs, Given the growing number of developers building on Spark SQL, we would like to stabilize the API in 1.3 so that users and developers can build on it with confidence. This also gives us a chance to improve the API. In particular, we are proposing the following major changes. Thi

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread RJ Nowling
Congratulations, Chris! I created a JIRA for "dimensional" RDDs that might be relevant: https://issues.apache.org/jira/browse/SPARK-4727 Jeremy Freeman pointed me to his lab's work for neuroscience, which has some related functionality: http://thefreemanlab.com/thunder/ On Wed, Jan 14, 2015 a

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Aniket
Hi Chris, This is super cool. I was wondering whether this will be an open source project so that people can contribute to or reuse it. Thanks, Aniket On Thu Jan 15 2015 at 07:39:29 Mattmann, Chris A (3980) [via Apache Spark Developers List] wrote: > Hi Spark Devs, > > Just wanted to FYI that I was funde

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Matei Zaharia
Yeah, very cool! You may also want to check out https://issues.apache.org/jira/browse/SPARK-5097 as something to build upon for these operations. Matei > On Jan 14, 2015, at 6:18 PM, Reynold Xin wrote: > > Chris, > > This is really cool. Congratulations and thanks for sharing the news. > >

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Reynold Xin
Chris, This is really cool. Congratulations and thanks for sharing the news. On Wed, Jan 14, 2015 at 6:08 PM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Spark Devs, > > Just wanted to FYI that I was funded on a 2 year NASA proposal > to build out the concept of a scie

SciSpark: NASA AIST14 proposal

2015-01-14 Thread Mattmann, Chris A (3980)
Hi Spark Devs, Just wanted to FYI that I was funded on a 2 year NASA proposal to build out the concept of a scientific RDD (created by space/time, and other operations) for use in some neat climate-related NASA use cases. http://esto.nasa.gov/files/solicitations/AIST_14/ROSES2014_AIST_A41_awards.

Re: DBSCAN for MLlib

2015-01-14 Thread Xiangrui Meng
Please find my comments on the JIRA page. -Xiangrui On Tue, Jan 13, 2015 at 1:49 PM, Muhammad Ali A'råby wrote: > I have to say, I have created a Jira task for it: > [SPARK-5226] Add DBSCAN Clustering Algorithm to MLlib - ASF JIRA

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Marcelo Vanzin
On Wed, Jan 14, 2015 at 1:40 PM, RJ Nowling wrote: > What is the difference between pom and jar packaging? If you do an install on a "pom" packaging module, it will only install the module's pom file in the target repository. -- Marcelo

Re: Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Thanks, Marcelo! I'll look into "install" vs "install-file". What is the difference between pom and jar packaging? One of the challenges is that I have to satisfy Fedora / Red Hat packaging guidelines, which makes life a little more interesting. :) (e.g., RPMs should resolve against other RPMs

Re: K-Means And Class Tags

2015-01-14 Thread Joseph Bradley
After asking around: retag() is private[spark] in Scala, but Java ignores the "private[X]" qualifier, making retag() (unintentionally) public in Java. Currently, your solution of retagging from Java is the best hack I can think of. It may take a bit of engineering to create a proper fix for the long term.

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Marcelo Vanzin
Hi RJ, I think I remember noticing in the past that some Guava metadata ends up overwriting maven-generated metadata in the assembly's manifest. That's probably something we should fix if that still affects the build. That being said, this is probably happening because you're using "install-file"

Re: Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Hi Sean, I confirmed that if I take the Spark 1.2.0 release (a428c446), undo the guava PR [1], and use -Dmaven.install.skip=false with the workflow above, the problem is fixed. RJ [1] https://github.com/apache/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2#diff-6382f8428b13fa6082fa688178

Re: Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Thanks, Sean. Yes, Spark is incorrectly copying the spark assembly jar to com/google/guava in the maven repository. This is for the 1.2.0 release, just to clarify. I reverted the patches that shade Guava and removed the parts disabling the install plugin and it seemed to fix the issue. It seems

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Sean Owen
Guava is shaded, although one class is left in its original package. This shouldn't have anything to do with Spark's package or namespace though. What are you saying is in com/google/guava? You can un-skip the install plugin with -Dmaven.install.skip=false On Wed, Jan 14, 2015 at 7:26 PM, RJ Nowl

Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Hi all, I'm trying to upgrade some Spark RPMs from 1.1.0 to 1.2.0. As part of the RPM process, we build Spark with Maven. With Spark 1.2.0, though, the artifacts are placed in com/google/guava and there is no org/apache/spark. I saw that the pom.xml files had been modified to prevent the instal

Spark-perf terasort WIP branch

2015-01-14 Thread Ewan Higgs
Hi all, I'm trying to build the Spark-perf WIP code but there are some errors to do with Hadoop APIs. I presume this is because there is some Hadoop version set and it's referring to that. But I can't seem to find it. The errors are as follows: [info] Compiling 15 Scala sources and 2 Java sou