Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-10-24 Thread Ashutosh
Hi, We are ready with the initial code. Where can I submit it for review? I want to get it reviewed before testing it at scale. Also, I see that most of the algorithms take data as RDD[LabeledPoint]. How should we take input for this, since there are no labels? Can anybody help me out with thes

Re: scalastyle annoys me a little bit

2014-10-24 Thread Koert Kuipers
thanks ted. apologies for complaining about maven here again, but this is the first time i seriously use it for development, and i am completely unfamiliar with it. a few more issues: "mvn clean package -DskipTests" takes about 30 mins for me. that's painful since it's needed for the tests. does a

Re: scalastyle annoys me a little bit

2014-10-24 Thread Koert Kuipers
oh i found some stuff about tests and how to continue them, gonna try that now (-fae switch). should have googled before asking... On Fri, Oct 24, 2014 at 3:59 PM, Koert Kuipers wrote: > thanks ted. > > apologies for complaining about maven here again, but this is the first > time i seriously us

Re: scalastyle annoys me a little bit

2014-10-24 Thread Marcelo Vanzin
On Fri, Oct 24, 2014 at 12:59 PM, Koert Kuipers wrote: > "mvn clean package -DskipTests" takes about 30 mins for me. thats painful > since its needed for the tests. does anyone know any tricks to speed it up? > (besides getting a better laptop). does zinc help? I noticed this too, and I also noti

Re: scalastyle annoys me a little bit

2014-10-24 Thread Sean Owen
On Fri, Oct 24, 2014 at 8:59 PM, Koert Kuipers wrote: > "mvn clean package -DskipTests" takes about 30 mins for me. thats painful > since its needed for the tests. does anyone know any tricks to speed it up? > (besides getting a better laptop). does zinc help? Zinc helps by about 50-100%. Worthwh

Re: PR for Hierarchical Clustering Needs Review

2014-10-24 Thread RJ Nowling
Thanks, Xiangrui! Might be worth waiting until after the feature freeze to review since it's a large patch. On Thu, Oct 23, 2014 at 3:26 PM, Xiangrui Meng wrote: > Hi RJ, > > We are close to the v1.2 feature freeze deadline, so I'm busy with the > pipeline feature and couple bugs. I will ask ot

Re: scalastyle annoys me a little bit

2014-10-24 Thread Stephen Boesch
Sean Owen beat me to (strongly) recommending running zinc server. Using the -pl option is great too - but be careful to only use it when your work is restricted to the modules in the (comma separated) list you provide to -pl. Also before using -pl you should do a mvn compile package install on

your weekly git timeout update! TL;DR: i'm now almost certain we're not hitting rate limits.

2014-10-24 Thread shane knapp
so, things look like they've stabilized significantly over the past 10 days, and without any changes on our end:
$ /root/tools/get_timeouts.sh 10
timeouts by date:
2014-10-14 -- 2
2014-10-16 -- 1
2014-10-19 -- 1
2014-10-20 -- 2
2014-10-23 -- 5
timeouts by project:
5 NewSparkPullRequestBuild
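For anyone who wants to total the per-date counts, here is a small sketch in Python. It assumes the `date -- count` layout shown in the output above; the function name `total_timeouts` is hypothetical, and this is not meant to reflect how get_timeouts.sh itself works.

```python
def total_timeouts(lines):
    """Sum the counts from lines shaped like '2014-10-14 -- 2'."""
    total = 0
    for line in lines:
        # partition() returns an empty separator when ' -- ' is absent,
        # so header lines such as 'timeouts by date:' are skipped.
        _date, sep, count = line.partition(" -- ")
        if sep:
            total += int(count)
    return total


# Per-date counts copied from the message above.
BY_DATE = [
    "timeouts by date:",
    "2014-10-14 -- 2",
    "2014-10-16 -- 1",
    "2014-10-19 -- 1",
    "2014-10-20 -- 2",
    "2014-10-23 -- 5",
]

if __name__ == "__main__":
    print(total_timeouts(BY_DATE))
```

Only the by-date section is used here, since the by-project section is cut off in the archive.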

Moving PR Builder to mvn

2014-10-24 Thread Hari Shreedharan
Over the last few months, it seems like we have selected Maven to be the “official” build system for Spark. I realize that removing the sbt build may not be easy, but it might be a good idea to start looking into that. We had issues over the past few days where mvn builds were fine, while sbt

Re: Moving PR Builder to mvn

2014-10-24 Thread Patrick Wendell
Overall I think this would be a good idea. The main blocker is just that I think the Maven build is much slower right now than the SBT build. However, if we were able to e.g. parallelize the test build on Jenkins that might make up for it. I'd actually like to have a trigger where we could tests p

Re: scalastyle annoys me a little bit

2014-10-24 Thread Koert Kuipers
thanks everyone, very helpful On Fri, Oct 24, 2014 at 4:22 PM, Stephen Boesch wrote: > Sean Owen beat me to (strongly) recommending running zinc server. Using > the -pl option is great too - but be careful to only use it when your work > is restricted to the modules in the (comma separated) lis

Re: your weekly git timeout update! TL;DR: i'm now almost certain we're not hitting rate limits.

2014-10-24 Thread Patrick Wendell
Thanks for the update Shane. As a point of process, for things like this where we're debugging specific issues - can we use JIRA instead of notifying everyone on the spark-dev list? I'd prefer if ops/infra announcements on the dev list are restricted to things that are widely applicable to develo

Re: Moving PR Builder to mvn

2014-10-24 Thread Hari Shreedharan
I have zinc server running on my Mac, and I see maven compilation to be much better than before I had it running. Is the sbt build still faster? (Sorry, it's been a long time since I did a build with sbt.) Thanks, Hari On Fri, Oct 24, 2014 at 1:46 PM, Patrick Wendell wrote: > Overall I think this would b

Re: Moving PR Builder to mvn

2014-10-24 Thread Patrick Wendell
Does Zinc still help if you are just running a single totally fresh build? For the pull request builder we purge all state from previous builds. - Patrick On Fri, Oct 24, 2014 at 1:55 PM, Hari Shreedharan wrote: > I have zinc server running on my mac, and I see maven compilation to be much > bet

Re: Moving PR Builder to mvn

2014-10-24 Thread Hari Shreedharan
+1. From what I can see, it definitely does - though I must say I rarely do full end-to-end builds. Maybe worth running as an experiment? Thanks, Hari On Fri, Oct 24, 2014 at 2:34 PM, Stephen Boesch wrote: > Zinc absolutely helps - feels like makes builds more than twice as fast - > b

Re: Moving PR Builder to mvn

2014-10-24 Thread Stephen Boesch
Zinc absolutely helps - it feels like it makes builds more than twice as fast - both on Mac and Linux. It helps both on fresh and existing builds. 2014-10-24 14:06 GMT-07:00 Patrick Wendell: > Does Zinc still help if you are just running a single totally fresh > build? For the pull request builder

Re: Parquet schema migrations

2014-10-24 Thread Gary Malouf
Hi Michael, Does this affect people who use Hive for their metadata store as well? I'm wondering if the issue is as bad as I think it is - namely that if you build up a year's worth of data, adding a field forces you to have to migrate that entire year's data. Gary On Wed, Oct 8, 2014 at 5:08 P

Re: Moving PR Builder to mvn

2014-10-24 Thread Sean Owen
Here's a crude benchmark on a Linux box (GCE n1-standard-4). zinc gets the assembly build in range of SBT's time.
mvn -DskipTests clean package
  15:27
  (start zinc) 8:18
  (rebuild) 7:08
./sbt/sbt -DskipTests clean assembly
  5:10
  (start zinc) 5:11
  (rebuild) 5:06
The dependencies were already download

Re: Moving PR Builder to mvn

2014-10-24 Thread Mark Hamstra
Yours are in the same ballpark as mine, where maven builds with zinc take about 1.4x the time to build with SBT. On Fri, Oct 24, 2014 at 4:24 PM, Sean Owen wrote: > Here's a crude benchmark on a Linux box (GCE n1-standard-4). zinc gets > the assembly build in range of SBT's time. > > mvn -Dsk
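The timings quoted in this thread can be compared with a little arithmetic. The following is only a rough sketch in Python: the stage labels and helper names are made up for illustration, and the numbers are copied from Sean Owen's single crude benchmark, so the ratios are indicative at best.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' timing such as '15:27' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the benchmark message. The keys are illustrative
# labels for the three rows (first run, run after starting zinc, rebuild).
MVN = {"first": "15:27", "after_starting_zinc": "8:18", "rebuild": "7:08"}
SBT = {"first": "5:10", "after_starting_zinc": "5:11", "rebuild": "5:06"}


def ratios(mvn, sbt):
    """Return mvn time divided by sbt time for each stage."""
    return {stage: to_seconds(mvn[stage]) / to_seconds(sbt[stage]) for stage in mvn}


if __name__ == "__main__":
    for stage, ratio in ratios(MVN, SBT).items():
        print(f"{stage}: mvn takes {ratio:.2f}x the sbt time")
```

On these numbers the rebuild row works out to roughly 1.4x, in line with the figure Mark Hamstra mentions, while the first mvn run without zinc is about three times the sbt time.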

serialVersionUID incompatible error in class BlockManagerId

2014-10-24 Thread Qiuzhuang Lian
Hi, I updated git today and when connecting to the spark cluster, I got the serialVersionUID incompatible error in class BlockManagerId. Here is the log. Shouldn't we give BlockManagerId a constant serialVersionUID to avoid this? Thanks, Qiuzhuang scala> val rdd = sc.parallelize(1 to 10001

Re: serialVersionUID incompatible error in class BlockManagerId

2014-10-24 Thread Josh Rosen
Are all processes (Master, Worker, Executors, Driver) running the same Spark build? This error implies that you’re seeing protocol / binary incompatibilities between your Spark driver and cluster. Spark is API-compatible across the 1.x series, but we don’t make binary link-level compatibility

Re: serialVersionUID incompatible error in class BlockManagerId

2014-10-24 Thread Qiuzhuang Lian
I updated git trunk and built on the two Linux machines. I think they should have the same version. I am going to do a force clean build and then retry. Thanks. On Sat, Oct 25, 2014 at 9:23 AM, Josh Rosen wrote: > Are all processes (Master, Worker, Executors, Driver) running the same > Spark bu

Re: serialVersionUID incompatible error in class BlockManagerId

2014-10-24 Thread Nan Zhu
In my experience, there are more issues than just BlockManager when you try to run a Spark application whose build version differs from your cluster's… I once tried to make a JDBC server built from branch-jdbc-1.0 run against a branch-1.0 cluster… no workaround exists… just had to repla

Re: serialVersionUID incompatible error in class BlockManagerId

2014-10-24 Thread Qiuzhuang Lian
After a clean rebuild, it works now. Thanks, Qiuzhuang On Sat, Oct 25, 2014 at 9:42 AM, Nan Zhu wrote: > According to my experience, there are more issues rather than > BlockManager when you try to run spark application whose build version is > different with your cluster…. > > I once tri