I'm new to Spark and I'm getting bad performance with classification
methods on Spark MLlib (worse than R in terms of AUC).
I am trying to set my own parameters rather than use the defaults.
Here is the method I want to use:

train(RDD<LabeledPoint> input,
      int numIterations,
      double stepSize,
      double miniBatchFraction,
      Vector initialWeights)

(from the Spark 1.0.2 MLlib Java API:
https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/mllib/regression/LabeledPoint.html)

How should I choose "numIterations" and "stepSize"?
What does miniBatchFraction mean?
Is initialWeights necessary for a good model? If so, how should I choose it?
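For reference, here is a minimal plain-Python sketch of what these three parameters control in minibatch SGD. This is an illustration of the general technique, not Spark's actual implementation; the stepSize / sqrt(iteration) decay mirrors what MLlib's GradientDescent documents, and everything else (the sampling, the logistic loss) is simplified for clarity:

```python
import math
import random

def minibatch_sgd(data, num_iterations, step_size, mini_batch_fraction,
                  initial_weights):
    """Toy minibatch SGD for logistic regression (illustrative only).

    data: list of (label, features) pairs, label in {0, 1}.
    Mirrors the roles of the Spark parameters: each iteration samples
    roughly miniBatchFraction of the data, computes the gradient on that
    sample, and steps by stepSize / sqrt(iteration).
    """
    w = list(initial_weights)
    for t in range(1, num_iterations + 1):
        # miniBatchFraction = 1.0 means full-batch gradient descent;
        # smaller values make each iteration cheaper but noisier.
        batch = [p for p in data if random.random() < mini_batch_fraction] or data
        grad = [0.0] * len(w)
        for y, x in batch:
            margin = sum(wi * xi for wi, xi in zip(w, x))
            pred = 1.0 / (1.0 + math.exp(-margin))   # sigmoid
            for j, xj in enumerate(x):
                grad[j] += (pred - y) * xj
        # Decaying step: larger stepSize converges faster but can diverge.
        scale = step_size / math.sqrt(t) / len(batch)
        w = [wi - scale * gj for wi, gj in zip(w, grad)]
    return w
```

With initialWeights set to zeros (the usual default), more iterations and a well-tuned stepSize matter far more than the starting point for a convex loss like this one.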


Regards,

Mélanie Gallois
