Re: Logistic Regression MLLib Slow

2014-06-04 Thread DB Tsai
Hi Krishna, It should work, and we use it in production with great success. However, the constructor of LogisticRegressionModel is private[mllib], so you have to write your own code and place it in a package under org.apache.spark.mllib rather than using the Scala console. Sincerely, DB Tsai

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
Does L-BFGS work with Spark 1.0? (see code sample below). Eventually, I would like to have L-BFGS working, but I was facing an issue where 10 passes over the data were taking forever. I ran Spark in standalone mode and the performance is much better! Regards, Krishna

Re: Logistic Regression MLLib Slow

2014-06-04 Thread DB Tsai
Hi Krishna, Also, the default optimizer with SGD converges really slowly. If you are willing to write Scala code, there is a full working example for training Logistic Regression with L-BFGS (a quasi-Newton method) in Scala. It converges way faster than SGD. See http://spark.apache.org/docs/lates

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
I will try both and get back to you soon! Thanks for all your help! Regards, Krishna On Wed, Jun 4, 2014 at 7:56 PM, Xiangrui Meng wrote: > Hi Krishna, > > Specifying executor memory in local mode has no effect, because all of > the threads run inside the same JVM. You can either try > --driv

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Xiangrui Meng
Hi Krishna, Specifying executor memory in local mode has no effect, because all of the threads run inside the same JVM. You can either try --driver-memory 60g or start a standalone server. Best, Xiangrui On Wed, Jun 4, 2014 at 7:28 PM, Xiangrui Meng wrote: > 80M by 4 should be about 2.5GB uncom

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Xiangrui Meng
80M by 4 should be about 2.5GB uncompressed. 10 iterations shouldn't take that long, even on a single executor. Besides what Matei suggested, could you also verify the executor memory in http://localhost:4040 in the Executors tab. It is very likely the executors do not have enough memory. In that c
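
The "about 2.5GB" estimate above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch under stated assumptions: each feature is assumed to be stored as an 8-byte double, and the label column plus any per-record JVM or Python object overhead are ignored, so the real in-memory footprint may well be larger. The helper name `dense_size_bytes` is made up for illustration and is not part of Spark.

```python
# Rough size of a dense 80M x 4 feature matrix.
# Assumption: every value is an 8-byte double; labels and object
# overhead are NOT counted, so treat the result as a lower bound.
ROWS = 80_000_000
FEATURES = 4
BYTES_PER_VALUE = 8  # assumed width of one stored feature value


def dense_size_bytes(rows, features, bytes_per_value=BYTES_PER_VALUE):
    """Return the raw payload size in bytes of a dense rows x features matrix."""
    return rows * features * bytes_per_value


size = dense_size_bytes(ROWS, FEATURES)
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")  # 2.56 GB (2.38 GiB)
```

If the Storage tab in the application UI reports a cached size far above this figure, that would point to serialization or object overhead rather than the raw data volume.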

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Matei Zaharia
Ah, is the file gzipped by any chance? We can’t decompress gzipped files in parallel so they get processed by a single task. It may also be worth looking at the application UI (http://localhost:4040) to see 1) whether all the data fits in memory in the Storage tab (maybe it somehow becomes larg

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
I am using the MLLib one (LogisticRegressionWithSGD) with PySpark. I am running only 10 iterations. The MLLib version of logistic regression doesn't seem to use all the cores on my machine. Regards, Krishna On Wed, Jun 4, 2014 at 6:47 PM, Matei Zaharia wrote: > Are you using the logistic

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Matei Zaharia
Are you using the logistic_regression.py in examples/src/main/python or examples/src/main/python/mllib? The first one is an example of writing logistic regression by hand and won’t be as efficient as the MLlib one. I suggest trying the MLlib one. You may also want to check how many iterations i

Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
Hi All, I am new to Spark and I am trying to run LogisticRegression (with SGD) using MLLib on a beefy single machine with about 128GB RAM. The dataset has about 80M rows with only 4 features, so it barely occupies 2GB on disk. I am running the code using all 8 cores with 20G memory using spark-su