Hi Krishna,
It should work, and we use it in production with great success.
However, the constructor of LogisticRegressionModel is private[mllib],
so you have to put your code in a file whose package is under
org.apache.spark.mllib instead of using the Scala console.
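A minimal sketch of that workaround (the file name, object name, and weight values here are hypothetical; the API shown is the Spark 1.0-era MLlib one):

```scala
// Hypothetical file: MyApp.scala
// Declaring the package as org.apache.spark.mllib gives this code access
// to the private[mllib] constructor of LogisticRegressionModel.
package org.apache.spark.mllib

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

object MyApp {
  def main(args: Array[String]): Unit = {
    // In practice, weights and intercept would come from an optimizer
    // such as L-BFGS; these values are placeholders.
    val weights = Vectors.dense(0.1, -0.2, 0.3, 0.05)
    val model = new LogisticRegressionModel(weights, 0.0)
    println(model.predict(Vectors.dense(1.0, 1.0, 1.0, 1.0)))
  }
}
```

Compile this as part of a project (e.g. with sbt) rather than pasting it into the Scala console, since the console cannot place code inside the org.apache.spark.mllib package.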
Sincerely,
DB Tsai
---
Does L-BFGS work with Spark 1.0? (see code sample below).
Eventually, I would like to have L-BFGS working, but I was facing an issue
where 10 passes over the data were taking forever. I ran Spark in standalone
mode and the performance is much better!
Regards,
Krishna
--
Hi Krishna,
Also, the default optimizer with SGD converges really slowly. If you are
willing to write Scala code, there is a full working example of
training logistic regression with L-BFGS (a quasi-Newton method) in
Scala. It converges much faster than SGD.
See
http://spark.apache.org/docs/lates
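For reference, a hedged sketch of what such a training run looked like around Spark 1.0, using the low-level LBFGS optimizer from mllib.optimization (the data path and parameter values here are illustrative, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

object LBFGSExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LBFGSExample"))

    // Load labeled points; path is a placeholder.
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
    val numFeatures = data.first().features.size

    // LBFGS.runLBFGS expects (label, features) pairs.
    val training = data.map(p => (p.label, p.features)).cache()

    val initialWeights = Vectors.dense(new Array[Double](numFeatures))
    val (weights, lossHistory) = LBFGS.runLBFGS(
      training,
      new LogisticGradient(),
      new SquaredL2Updater(),
      10,     // numCorrections
      1e-4,   // convergenceTol
      20,     // maxNumIterations
      0.1,    // regParam
      initialWeights)

    println(s"Loss per iteration: ${lossHistory.mkString(", ")}")
    sc.stop()
  }
}
```

The resulting weights vector can then be wrapped in a model as described above; the loss history is handy for confirming that L-BFGS really is converging in far fewer passes than SGD.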
I will try both and get back to you soon!
Thanks for all your help!
Regards,
Krishna
On Wed, Jun 4, 2014 at 7:56 PM, Xiangrui Meng wrote:
Hi Krishna,
Specifying executor memory in local mode has no effect, because all of
the threads run inside the same JVM. You can either try
--driver-memory 60g or start a standalone server.
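As a concrete illustration of the two options (the script names match Spark 1.0's layout; host names, paths, and memory sizes are placeholders):

```shell
# Option 1: local mode -- everything runs in one JVM, so size the driver heap.
spark-submit --master "local[*]" --driver-memory 60g my_app.py

# Option 2: standalone mode -- executor memory actually takes effect.
./sbin/start-master.sh
./sbin/start-slave.sh spark://master-host:7077
spark-submit --master spark://master-host:7077 --executor-memory 60g my_app.py
```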
Best,
Xiangrui
On Wed, Jun 4, 2014 at 7:28 PM, Xiangrui Meng wrote:
80M by 4 should be about 2.5 GB uncompressed. 10 iterations shouldn't
take that long, even on a single executor. Besides what Matei
suggested, could you also verify the executor memory at
http://localhost:4040 in the Executors tab? It is very likely the
executors do not have enough memory. In that c
Ah, is the file gzipped by any chance? We can’t decompress gzipped files in
parallel so they get processed by a single task.
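One common workaround, sketched here (not from the thread): read the gzipped file once, then repartition so the later iterative passes run in parallel. The path and partition count are placeholders, and `sc` is assumed to be an existing SparkContext:

```scala
// Gzip is not splittable, so sc.textFile yields a single partition;
// repartition redistributes the decompressed records across the cluster.
val raw = sc.textFile("data/input.gz")      // 1 partition
val parallel = raw.repartition(32).cache()  // 32 partitions, held in memory
println(parallel.partitions.length)
```

Caching after the repartition matters here: without it, every iteration would re-read and re-decompress the gzip file through that single task.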
It may also be worth looking at the application UI (http://localhost:4040) to
see 1) whether all the data fits in memory in the Storage tab (maybe it somehow
becomes larg
I am using the MLlib one (LogisticRegressionWithSGD) with PySpark. I am
running only 10 iterations.
The MLlib version of logistic regression doesn't seem to use all the cores
on my machine.
Regards,
Krishna
On Wed, Jun 4, 2014 at 6:47 PM, Matei Zaharia
wrote:
Are you using the logistic_regression.py in examples/src/main/python or
examples/src/main/python/mllib? The first one is an example of writing logistic
regression by hand and won’t be as efficient as the MLlib one. I suggest trying
the MLlib one.
You may also want to check how many iterations i