Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-25 Thread Tom Vacek
I don't know about Spark's implementation, but with LBFGS, there is a line search step. Since computing the line search takes roughly the same work as one iteration, an efficient implementation will take a full step and simultaneously compute the gradient for the next step and check if the update

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-25 Thread David Hall
LBFGS will not take a step that sends the objective value up. It might try a step that is "too big" and reject it, so if you're just logging everything that gets tried by LBFGS, you could see that. The "iterations" method of the minimizer should never return an increasing objective value. If you're
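
The distinction above between steps that are tried and steps that are accepted can be sketched in a few lines. This is only an illustration, not Spark's or Breeze's code: it uses plain gradient descent with a backtracking (Armijo) line search instead of LBFGS, and the one-dimensional objective and the names `objective`, `gradient`, `backtracking_step`, and `minimize` are all hypothetical.

```python
import math


def objective(w):
    # Hypothetical 1-D convex objective: a logistic-style loss plus an L2 term.
    return math.log(1.0 + math.exp(-3.0 * w)) + 0.5 * w * w


def gradient(w):
    # Derivative of the objective above.
    return -3.0 / (1.0 + math.exp(3.0 * w)) + w


def backtracking_step(w, step=8.0, shrink=0.5, c1=1e-4, max_tries=50):
    """Try progressively smaller steps until one decreases the objective enough."""
    f0 = objective(w)
    g = gradient(w)
    direction = -g  # steepest descent; LBFGS would use a quasi-Newton direction
    tried = []
    for _ in range(max_tries):
        candidate = w + step * direction
        f_new = objective(candidate)
        tried.append(f_new)  # every attempt is logged, including rejected ones
        # Armijo sufficient-decrease condition: accept only if the objective drops.
        if f_new <= f0 + c1 * step * g * direction:
            return candidate, f_new, tried
        step *= shrink  # the step was "too big": reject it and try a smaller one
    return w, f0, tried  # no acceptable step found; stay where we are


def minimize(w0, iterations=10):
    w = w0
    accepted = [objective(w)]  # objective after each accepted step
    all_tried = []             # objective at every candidate, accepted or not
    for _ in range(iterations):
        w, f, tried = backtracking_step(w)
        accepted.append(f)
        all_tried.extend(tried)
    return w, accepted, all_tried


if __name__ == "__main__":
    w, accepted, all_tried = minimize(0.0)
    print("accepted objective values:", accepted)
    print("largest value ever tried: ", max(all_tried))
```

In this sketch the `accepted` sequence never increases, while `all_tried` contains rejected candidates whose objective is higher than the starting point, which is the pattern you would see when logging every evaluation.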

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-25 Thread DB Tsai
Another interesting benchmark. *News20 dataset - 0.14M rows, 1,355,191 features, 0.034% non-zero elements.* LBFGS converges in 70 seconds, while GD does not seem to be progressing. Dense feature vectors would be too big to fit in memory, so I only ran the sparse benchmark. I saw the sometimes th
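
A rough back-of-the-envelope check of why the dense representation does not fit, using the figures quoted above. The assumptions are mine, not from the post: "0.14M" is read as 140,000 rows, values are 8-byte doubles, sparse indices are 4-byte integers, and all object and JVM overhead is ignored. The function names are hypothetical.

```python
def dense_bytes(rows, features, value_bytes=8):
    # Every entry is stored, zero or not.
    return rows * features * value_bytes


def sparse_bytes(rows, features, density, value_bytes=8, index_bytes=4):
    # Only non-zero entries are stored, each as an (index, value) pair.
    nonzeros = rows * features * density
    return nonzeros * (value_bytes + index_bytes)


if __name__ == "__main__":
    rows = 140_000          # assumed reading of "0.14M"
    features = 1_355_191
    density = 0.034 / 100   # 0.034% non-zero elements

    print("dense  (TB): %.2f" % (dense_bytes(rows, features) / 1e12))
    print("sparse (GB): %.2f" % (sparse_bytes(rows, features, density) / 1e9))
```

Under these assumptions the dense matrix needs on the order of 1.5 TB, while the sparse one needs under 1 GB.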

Re: thoughts on spark_ec2.py?

2014-04-25 Thread Andrew Or
Hi Art, First of all, thanks a lot for your PRs. We are currently in the middle of the Spark 1.0 release, so most of us are swamped with the more core features. To answer your questions: 1. Neither. We welcome changes from developers for all components of Spark, including the EC2 scripts. Once

thoughts on spark_ec2.py?

2014-04-25 Thread Art Peel
I've been setting up a Spark cluster on EC2 using the provided ec2/spark_ec2.py script and am very happy I didn't have to write it from scratch. Thanks for providing it. There have been some issues, though, and I have had to make some additions. So far, they are all additions of command-line option

Re: Problem creating objects through reflection

2014-04-25 Thread Piotr Kołaczkowski
Yeah, this is related. From https://groups.google.com/forum/#!msg/spark-users/bwAmbUgxWrA/HwP4Nv4adfEJ: "This is a limitation that will hopefully go away in Scala 2.10 or 2.10.1, when we'll use macros to remove the need to do this. (Or more generally if we get some changes in the Scala interprete