I'm new to Spark and I'm getting poor performance with classification methods in Spark MLlib (worse AUC than R). I am trying to set my own parameters rather than use the defaults. Here is the method I want to use:
train(RDD<LabeledPoint> input, int numIterations, double stepSize, double miniBatchFraction, Vector initialWeights)

(Javadoc: https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/rdd/RDD.html, https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/mllib/regression/LabeledPoint.html, https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/mllib/linalg/Vector.html)

How do I choose "numIterations" and "stepSize"? What does "miniBatchFraction" mean? Is "initialWeights" necessary to get a good model? If so, how should I choose it?

Regards,
Mélanie Gallois
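For context, here is my current understanding of what these parameters control, written as a plain-Python sketch of mini-batch SGD for logistic regression. This is only an illustration of the parameter roles, not Spark's actual implementation; the function name and the toy data are my own:

```python
import math
import random

def train_sgd(data, num_iterations=100, step_size=1.0,
              mini_batch_fraction=1.0, initial_weights=None):
    """Mini-batch SGD sketch mirroring MLlib's parameters:
    num_iterations      - number of gradient steps taken
    step_size           - learning rate scaling each step
    mini_batch_fraction - fraction of the data sampled per step
    initial_weights     - starting weight vector (zeros if omitted)
    data is a list of (label, features) with label in {0, 1}."""
    dim = len(data[0][1])
    w = list(initial_weights) if initial_weights is not None else [0.0] * dim
    batch = max(1, int(len(data) * mini_batch_fraction))
    for _ in range(num_iterations):
        sample = random.sample(data, batch)       # the "mini batch"
        grad = [0.0] * dim
        for label, x in sample:
            margin = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-margin))   # predicted probability
            for j in range(dim):
                grad[j] += (p - label) * x[j]     # logistic-loss gradient
        for j in range(dim):
            w[j] -= step_size * grad[j] / batch   # one step of size step_size
    return w

# Toy usage on a small separable data set:
random.seed(0)
data = [(1, [1.0, 2.0]), (1, [2.0, 1.5]), (0, [-1.0, -1.0]), (0, [-2.0, -0.5])]
weights = train_sgd(data, num_iterations=200, step_size=0.5, mini_batch_fraction=1.0)
```

If this picture is right, then miniBatchFraction trades gradient accuracy per step against cost per step, and initialWeights only sets the starting point of the search, but I would like confirmation for MLlib specifically.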