The model is learned using an iterative convex optimization algorithm.
"numIterations," "stepSize," and "miniBatchFraction" are parameters of that
algorithm; you can
see details here:
http://spark.apache.org/docs/latest/mllib-linear-methods.html#implementation-developer
http://spark.apache.org/docs/latest/mllib-optimization.html

I would set miniBatchFraction to 1.0 (i.e., use the full dataset on each
iteration) and not mess with it.
For LogisticRegressionWithSGD, to know whether you have the other two
parameters set correctly, you should try running with more iterations.
If running with more iterations changes your result significantly, then:
 - If the result is blowing up (really big model weights), then you need to
decrease stepSize.
 - If the result is not blowing up but keeps changing, then you need to
increase numIterations.
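That check can be sketched in plain Python. This is not Spark code: the toy
data, the thresholds, and the `diagnose` helper are made up for illustration,
and it uses full-batch gradient descent (miniBatchFraction = 1.0) rather than
real SGD, but the two rules above are applied exactly as stated.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Toy 1-D logistic regression data (non-separable, so a finite optimum exists).
DATA = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]

def train(num_iterations, step_size, w0=0.0):
    """Full-batch gradient descent on the mean logistic loss."""
    w = w0
    for _ in range(num_iterations):
        grad = sum((sigmoid(w * x) - y) * x for x, y in DATA) / len(DATA)
        w -= step_size * grad
    return w

def diagnose(num_iterations, step_size):
    """Rerun with 2x the iterations and apply the two rules above."""
    w1 = train(num_iterations, step_size)
    w2 = train(2 * num_iterations, step_size)
    if abs(w2) > 1e3:         # result is blowing up (huge weights)
        return "decrease stepSize"
    if abs(w2 - w1) > 1e-3:   # not blowing up, but still changing
        return "increase numIterations"
    return "parameters look OK"
```

For example, `diagnose(3, 0.5)` reports that more iterations are needed, while
a wildly large step size trips the blow-up rule instead. The same doubling
experiment works against a real Spark job, just with much longer runtimes.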

You should not need to set initialWeights, but it can help if you already
have an estimate computed.
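To see why a precomputed starting point can help, here is a minimal sketch on
a toy quadratic objective (again not Spark code; the function and numbers are
hypothetical): with the same budget of iterations, the warm start lands far
closer to the optimum.

```python
# Gradient descent on the toy objective f(w) = (w - 3)^2, whose minimum is w* = 3.
def gd(num_iterations, step_size, w0=0.0):
    w = w0
    for _ in range(num_iterations):
        w -= step_size * 2.0 * (w - 3.0)  # f'(w) = 2 * (w - 3)
    return w

cold = gd(5, 0.1)          # default start at zero (like omitting initialWeights)
warm = gd(5, 0.1, w0=2.9)  # start from a rough prior estimate
# After the same 5 iterations, warm is much closer to w* = 3 than cold is.
```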

If you have access to a build of the current Spark master (or can wait for
1.4), then the org.apache.spark.ml.classification.LogisticRegression
implementation has been compared with R and should get very similar results.

Good luck!
Joseph

On Wed, May 27, 2015 at 8:22 AM, SparknewUser <melanie.galloi...@gmail.com>
wrote:

> I'm new to Spark and I'm getting bad performance with classification
> methods
> on Spark MLlib (worse than R in terms of AUC).
> I am trying to put my own parameters rather than the default parameters.
> Here is the method I want to use:
> train(RDD<LabeledPoint> input,
>       int numIterations,
>       double stepSize,
>       double miniBatchFraction,
>       Vector initialWeights)
> How to choose "numIterations" and "stepSize"?
> What does miniBatchFraction mean?
> Is initialWeights necessary to get a good model? If so, how should they be chosen?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-the-best-performance-with-LogisticRegressionWithSGD-tp23053.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
