I don't know about Spark's implementation, but with LBFGS there is a line
search step. Since the line search takes roughly the same work as one
iteration, an efficient implementation will take a full step, simultaneously
compute the gradient for the next step, and check whether the
update
LBFGS will not take a step that sends the objective value up. It might try
a step that is "too big" and reject it, so if you're logging every step
LBFGS tries, you could see such rejected steps. The "iterations" method of
the minimizer should never return an increasing objective value.
If you're
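The behavior described above (trying a too-big step, rejecting it, and never accepting an increase in the objective) can be sketched with a plain backtracking (Armijo) line search. This is a generic illustration, not Spark's or Breeze's actual LBFGS code:

```python
import numpy as np

def backtracking_minimize(f, grad, x0, step0=1.0, shrink=0.5, c=1e-4, iters=50):
    """Gradient descent with a backtracking (Armijo) line search.

    Steps that are "too big" (they fail the sufficient-decrease test) are
    tried and rejected; accepted iterates therefore have monotonically
    non-increasing objective values.
    """
    x = x0.astype(float)
    history, rejected = [f(x)], 0
    for _ in range(iters):
        g = grad(x)
        t = step0
        # Shrink the step until the Armijo sufficient-decrease test passes.
        while f(x - t * g) > f(x) - c * t * np.dot(g, g):
            rejected += 1          # a "too big" step was tried and rejected
            t *= shrink
        x = x - t * g
        history.append(f(x))
    return x, history, rejected

# A simple ill-conditioned quadratic: f(x) = x0^2 + 10*x1^2.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])

x, history, rejected = backtracking_minimize(f, grad, np.array([1.0, 1.0]))
print(rejected > 0)                                       # True: some steps were rejected
print(all(a >= b for a, b in zip(history, history[1:])))  # True: objective never increases
```

If you only watched the rejected trial points, you would see the objective go up; if you only watch the accepted iterates (the analogue of what the "iterations" method returns), it never does.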
Another interesting benchmark.
*News20 dataset - 0.14M rows, 1,355,191 features, 0.034% non-zero elements.*
LBFGS converges in 70 seconds, while GD does not appear to make any progress.
A dense feature vector would be too big to fit in memory, so we only ran the
sparse benchmark.
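The memory claim is easy to check with back-of-the-envelope arithmetic. The per-element sizes below are assumptions: 8-byte doubles for dense values, and roughly 12 bytes per stored non-zero (a 4-byte index plus an 8-byte value):

```python
rows, features, density = 140_000, 1_355_191, 0.00034  # 0.14M rows, 0.034% non-zero

# Dense: every feature of every row stored as an 8-byte double.
dense_bytes = rows * features * 8
print(f"dense:  {dense_bytes / 1e12:.2f} TB")   # ~1.5 TB -- far beyond cluster RAM

# Sparse: only non-zeros stored, assuming ~12 bytes each (4-byte index + 8-byte value).
nnz_per_row = features * density                # ~461 non-zeros per row
sparse_bytes = rows * nnz_per_row * 12
print(f"sparse: {sparse_bytes / 1e9:.2f} GB")   # under 1 GB
```

So the dense representation is roughly three orders of magnitude larger, which is why only the sparse benchmark is feasible.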
I saw that sometimes th
Hi Art,
First of all, thanks a lot for your PRs. We are currently in the middle of
the Spark 1.0 release, so most of us are swamped with the more core
features. To answer your questions:
1. Neither. We welcome changes from developers for all components of Spark,
including the EC2 scripts. Once
I've been setting up a Spark cluster on EC2 using the provided
ec2/spark_ec2.py script and am very happy I didn't have to write it from
scratch. Thanks for providing it.
There have been some issues, though, and I have had to make some additions.
So far, they are all additions of command-line option
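For context, spark_ec2.py builds its command-line interface with Python's optparse module, so an addition of this kind amounts to one `add_option` call plus code that reads the resulting value. The sketch below is purely illustrative; the option name and default are made up, not taken from the actual script:

```python
from optparse import OptionParser

# A sketch in the style of spark_ec2.py's option handling; the
# --ebs-vol-size option name and its default below are hypothetical.
parser = OptionParser(usage="spark_ec2 [options] <action> <cluster_name>")
parser.add_option("--ebs-vol-size", type="int", default=0,
                  help="Size (in GB) of an extra EBS volume to attach (default: 0)")

# parse_args accepts an explicit argv list, which makes this easy to test.
opts, args = parser.parse_args(["--ebs-vol-size", "100", "launch", "mycluster"])
print(opts.ebs_vol_size, args)   # 100 ['launch', 'mycluster']
```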
Yeah, this is related.
From
https://groups.google.com/forum/#!msg/spark-users/bwAmbUgxWrA/HwP4Nv4adfEJ:
"This is a limitation that will hopefully go away in Scala 2.10 or 2.10.1,
when we'll use macros to remove the need to do this. (Or more generally if
we get some changes in the Scala interprete