Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-19 Thread Matei Zaharia
+1 Tested on Mac OS X, checked that bugs with too many small files being spilled are fixed. Matei > On Nov 19, 2014, at 7:44 PM, Krishna Sankar wrote: > > +1 > 1. Compiled OSX 10.10 (Yosemite) mvn -Pyarn -Phadoop-2.4 > -Dhadoop.version=2.4.0 -DskipTests clean package 10:49 min > 2. Tested pys

Re: Too many open files error

2014-11-19 Thread Sandy Ryza
Qiuzhuang, This is a known issue: ExternalAppendOnlyMap can do tons of tiny spills in certain situations. SPARK-4452 aims to deal with this issue, but we haven't finalized a solution yet. Dinesh's solution should help as a workaround, but you'll likely experience suboptimal performance when tr

Re: Too many open files error

2014-11-19 Thread Dinesh J. Weerakkody
Hi Qiuzhuang, This is a Linux-related issue. Please go through this [1] and change the limits. Hope this will solve your problem. [1] https://rtcamp.com/tutorials/linux/increase-open-files-limit/ On Thu, Nov 20, 2014 at 9:45 AM, Qiuzhuang Lian wrote: > Hi All, > > While doing some ETL, I run
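Before changing system-wide limits as suggested above, it can help to see what limit a process actually runs with. The following is a minimal sketch using Python's standard `resource` module (Unix only). The helper names `open_file_limits` and `raise_soft_limit` are made up for this illustration and are not part of Spark. The sketch only affects the Python process that runs it and its children; Spark executors are separate JVM processes, so the limit still has to be raised for the user or shell that launches them.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft limit to `target`, never above the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if target <= soft:
        return soft  # the request would not raise anything
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the OS refused the request; keep the old limit
    return target


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

If the hard limit itself is too low for a job that spills many small files, it has to be raised through the operating system's own configuration, which is what the linked tutorial covers.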

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-19 Thread Ashutosh
Done. Thanks. Added you as a collaborator so that you can add code to it. Thanks, Ashutosh From: slcclimber [via Apache Spark Developers List] Sent: Thursday, November 20, 2014 7:49 AM To: Ashutosh Trivedi (MT2013030) Subject: Re: [MLlib] Contributing Algorit

Too many open files error

2014-11-19 Thread Qiuzhuang Lian
Hi All, While doing some ETL, I ran into a 'Too many open files' error, as shown in the following logs. Thanks, Qiuzhuang 14/11/20 20:12:02 INFO collection.ExternalAppendOnlyMap: Thread 63 spilling in-memory map of 100.8 KB to disk (953 times so far) 14/11/20 20:12:02 ERROR storage.DiskBlockObjectWriter: Unc

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-19 Thread Krishna Sankar
+1 1. Compiled OSX 10.10 (Yosemite) mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package 10:49 min 2. Tested pyspark, MLlib 2.1. statistics OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMeans OK 2.5. rdd operations OK 2.6. recommendation OK 2.7.

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-19 Thread slcclimber
You could also use rdd.zipWithIndex() to create indexes. Anant

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-19 Thread Xiangrui Meng
+1. Checked version numbers and doc. Tested a few ML examples with Java 6 and verified some recently merged bug fixes. -Xiangrui On Wed, Nov 19, 2014 at 2:51 PM, Andrew Or wrote: > I will start with a +1 > > 2014-11-19 14:51 GMT-08:00 Andrew Or : > >> Please vote on releasing the following candid

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-19 Thread Andrew Or
I will start with a +1 2014-11-19 14:51 GMT-08:00 Andrew Or : > Please vote on releasing the following candidate as Apache Spark version > 1.1.1. > > This release fixes a number of bugs in Spark 1.1.0. Some of the notable > ones are > - [SPARK-3426] Sort-based shuffle compression settings are in

Build break

2014-11-19 Thread Patrick Wendell
Hey All, Just a heads up. I merged this patch last night which caused the Spark build to break: https://github.com/apache/spark/commit/397d3aae5bde96b01b4968dde048b6898bb6c914 The patch itself was fine and previously had passed on Jenkins. The issue was that other intermediate changes merged sin

Re: Intro to using IntelliJ to debug SPARK-1.1 Apps with mvn/sbt (for beginners)

2014-11-19 Thread Chester At Work
gen-idea should work. I use it all the time. But use the approach that works for you. On Nov 18, 2014, at 11:12 PM, "Yiming (John) Zhang" wrote: > Hi Chester, thank you for your reply. But I tried this approach and it > failed. It seems that there is more difficulty usin

Help needed to publish SizeEstimator as separate library

2014-11-19 Thread madhu phatak
Hi, As I was going through the Spark source code, SizeEstimator caught my eye. It's a very useful tool for estimating object sizes on the JVM, which helps in use cases like memory-bounded caches. It w