ALS on EC2

2014-07-14 Thread Srikrishna S
Using properties file: null
Main class: RecommendationALS
Arguments: _train.csv _validation.csv _test.csv
System properties:
  SPARK_SUBMIT -> true
  spark.app.name -> RecommendationALS
  spark.jars -> file:/root/projects/spark-recommendation-benchmark/benchmark_mf/target/scala-2.10/recommendation-bench
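The RecommendationALS source itself is not shown in this thread; as a point of reference, a minimal MLlib ALS driver of the kind being submitted might look like the following sketch (class name, input format, and parameters are assumptions, not the poster's actual code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object RecommendationALSSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecommendationALS"))

    // Assumes each line of the training file is "user,item,rating".
    val ratings = sc.textFile(args(0)).map { line =>
      val Array(user, item, rating) = line.split(',')
      Rating(user.toInt, item.toInt, rating.toDouble)
    }

    // rank = 10 latent factors, 10 iterations, lambda = 0.01 (all illustrative).
    val model = ALS.train(ratings, 10, 10, 0.01)

    // Predicted rating of a single (user, item) pair.
    println(model.predict(1, 2))
  }
}
```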

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
> ... number of partitions, which should match the number of cores
> 2) driver memory (you can see it from the executor tab of the Spark
> WebUI and set it with "--driver-memory 10g")
> 3) the version of Spark you were running
>
> Best,
> Xiangrui
>
> On Mon, Jul 14, 2014 at 1
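The first item on Xiangrui's checklist can be handled from the application side by repartitioning the input to match the total core count. A minimal sketch (the `sc` variable, file path, and core count are assumptions for illustration):

```scala
import org.apache.spark.mllib.util.MLUtils

// e.g. 8 executors x 2 cores = 16 partitions; adjust to your cluster.
val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.libsvm")
  .repartition(16)
  .cache()
```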

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
That is exactly the same error that I got. I am still having no success.

Regards,
Krishna

On Mon, Jul 14, 2014 at 11:50 AM, crater wrote:
> Hi Krishna,
>
> Thanks for your help. Are you able to get your 29M data running yet? I fix
> the previous problem by setting larger spark.akka.frameSize, bu

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
If you use Scala, you can do:

    val conf = new SparkConf()
      .setMaster("yarn-client")
      .setAppName("Logistic regression SGD fixed")
      .set("spark.akka.frameSize", "100")
      .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100")
    var sc = new SparkContext(conf)

Re: Akka Client disconnected

2014-07-12 Thread Srikrishna S
I am using the master that I compiled 2 days ago. Can you point me to the JIRA?

On Sat, Jul 12, 2014 at 9:13 AM, DB Tsai wrote:
> Are you using 1.0 or current master? A bug related to this is fixed in
> master.
>
> On Jul 12, 2014 8:50 AM, "Srikrishna S" wrote:

Akka Client disconnected

2014-07-12 Thread Srikrishna S
I am running logistic regression with SGD on a problem with about 19M parameters (the kdda dataset from the LIBSVM data repository). I consistently see that the nodes in my cluster get disconnected and soon the whole job grinds to a halt.

14/07/12 03:05:16 ERROR cluster.YarnClientClusterScheduler: Los

Job getting killed

2014-07-11 Thread Srikrishna S
I am trying to run Logistic Regression on the url dataset (from LIBSVM), using the exact same code as the example, on a 5-node YARN cluster. I get a pretty cryptic error that says "Killed". Nothing more.

Settings:
  --master yarn-client
  --verbose
  --driver-memory 24G
  --executor-memory 24G

Re: Spark Installation

2014-07-08 Thread Srikrishna S
At 11:53, Krishna Sankar wrote:
>> Couldn't find any reference of CDH in pom.xml - profiles or the
>> hadoop.version. Am also wondering how the CDH-compatible artifact was
>> compiled.
>> Cheers
>>
>> On Mon, Jul 7, 2014 at 8:07 PM, Srikri

Spark Installation

2014-07-07 Thread Srikrishna S
Hi All, Does anyone know what the command-line arguments to mvn are to generate the pre-built binary for Spark on Hadoop 2 (CDH5)? I would like to pull in a recent bug fix from spark-master and rebuild the binaries in exactly the same way as the ones provided on the website. I have tried th

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
... looking at the application UI (http://localhost:4040)
>>> to see 1) whether all the data fits in memory in the Storage tab (maybe it
>>> somehow becomes larger, though it seems unlikely that it would exceed 20 GB)
>>> and 2) how many parallel ta

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
... UI (http://localhost:4040)
>> to see 1) whether all the data fits in memory in the Storage tab (maybe it
>> somehow becomes larger, though it seems unlikely that it would exceed 20 GB)
>> and 2) how many parallel tasks run in each iteration

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
... iterations it runs — by default I
> think it runs 100, which may be more than you need.
>
> Matei
>
> On Jun 4, 2014, at 5:47 PM, Srikrishna S wrote:
> > Hi All,
> >
> > I am new to Spark and I am trying to run LogisticRegression (with SGD)
> > using MLLib on a beef
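Matei's suggestion about the iteration count can be applied through the optimizer of the 1.0-era MLlib API. A hedged sketch (the `trainingData` RDD of `LabeledPoint` and the chosen values are assumptions):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(20) // down from the default of 100
  .setStepSize(1.0)

// trainingData: RDD[LabeledPoint], loaded elsewhere.
val model = lr.run(trainingData)
```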

Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
Hi All, I am new to Spark and I am trying to run LogisticRegression (with SGD) using MLLib on a beefy single machine with about 128 GB RAM. The dataset has about 80M rows with only 4 features, so it barely occupies 2 GB on disk. I am running the code using all 8 cores with 20G memory using spark-su
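For reference, a minimal end-to-end run of the kind described above might look like this sketch against the 1.0-era MLlib API (the `sc` variable and input path are assumptions, not the poster's actual setup):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

// Load the LIBSVM-format training data; path is hypothetical.
val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.libsvm").cache()

// Train with the default 100 SGD iterations.
val model = LogisticRegressionWithSGD.train(data, 100)

println(s"Learned weights: ${model.weights}")
```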