Re: Error when testing with large sparse svm

2014-07-16 Thread crater
I don't really know how to create a JIRA :( Specifically, the code I commented out is:

//val prediction = model.predict(test.map(_.features))
//val predictionAndLabel = prediction.zip(test.map(_.label))
//val prediction = model.predict(training.map(_.features))
//val predictionAndL

Re: Error when testing with large sparse svm

2014-07-15 Thread crater
I don't really have "my code"; I was just running the example program in examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala. What I did was simply try this example on 13M sparse data, and I got the error I posted. Today I managed to run it after I commented out th

Re: Error when testing with large sparse svm

2014-07-15 Thread crater
I made a bit of progress. I think the problem is with the "BinaryClassificationMetrics": as long as I comment out all the prediction-related metrics, I can run the SVM example with my data. So the problem should be there, I guess. -- View this message in context: http://apache-spark-user-list.1001
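For readers of the archive: the prediction-and-metrics step that the example performs looks roughly like the following. This is a sketch, not the exact file contents; `model` is assumed to be a trained SVMModel and `test` an RDD[LabeledPoint] of held-out data.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Score the test set, then pair each score with its true label.
val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))

// BinaryClassificationMetrics takes an RDD of (score, label) pairs.
val metrics = new BinaryClassificationMetrics(predictionAndLabel)
println(s"Test areaUnderROC = ${metrics.areaUnderROC()}")
```

Commenting this block out removes the `predict`/`zip` stages, which is consistent with the failure being triggered in the evaluation path rather than in training.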

Re: Error when testing with large sparse svm

2014-07-14 Thread crater
(1) What is the "number of partitions"? Is it the number of workers per node? (2) I already set the driver memory pretty big, 25g. (3) I am running Spark 1.0.1 in a standalone cluster with 9 nodes; one of them works as the master, the others are workers. -- View this message in context: http://apache-s
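On question (1): a partition is a chunk of an RDD processed by one task, not "workers per node". A quick illustration (the path and partition counts here are made-up examples):

```scala
// Set a minimum partition count when loading the data...
val lines = sc.textFile("hdfs://.../data.libsvm", minPartitions = 64)

// ...or change it later on an existing RDD.
val moreParts = lines.repartition(128)
println(moreParts.partitions.length)
```

More partitions mean smaller tasks, which can help when individual partitions are too large to fit in executor memory.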

Re: Error when testing with large sparse svm

2014-07-14 Thread crater
Hi Krishna, Thanks for your help. Have you been able to get your 29M data running yet? I fixed the previous problem by setting a larger spark.akka.frameSize, but now I get some other errors, shown below. Did you get these errors before? 14/07/14 11:32:20 ERROR TaskSchedulerImpl: Lost executor 1 on node7: remote

Re: Error when testing with large sparse svm

2014-07-14 Thread crater
Hi Xiangrui, Where can I set "spark.akka.frameSize"? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-testing-with-large-sparse-svm-tp9592p9616.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
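For anyone finding this thread later: in Spark 1.0.x, spark.akka.frameSize (in MB) can be set on the SparkConf before the SparkContext is created, or in conf/spark-defaults.conf. A minimal sketch; the value 100 is just an example, not a recommendation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("BinaryClassification")
  .set("spark.akka.frameSize", "100") // MB; default is much smaller
val sc = new SparkContext(conf)
```

The frame size matters here because the driver ships the full weight vector to executors each iteration, and a 13M-dimensional dense weight vector exceeds the default frame size.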

Error when testing with large sparse svm

2014-07-14 Thread crater
Hi, I encounter an error when testing the SVM example on very large sparse data. The dataset I ran on was a toy dataset with only ten examples, but the sparse vectors are 13-million-dimensional with a few thousand non-zero entries each. The error is shown below. I am wondering: is this a bug, or am I missing something?

Re: Putting block rdd failed when running example svm on large data

2014-07-12 Thread crater
Hi Xiangrui, Thanks for the information. Also, is it possible to figure out the execution time per iteration for SVM? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Putting-block-rdd-failed-when-running-example-svm-on-large-data-tp9515p9535.html Sent from

Putting block rdd failed when running example svm on large data

2014-07-12 Thread crater
Hi, I am trying to run the example BinaryClassification (org.apache.spark.examples.mllib.BinaryClassification) on a 202G file. I am constantly getting messages like the one below; is this normal, or am I missing something? 14/07/12 09:49:04 WARN BlockManager: Block rdd_4_196 could not be dropped f