I'm new to Spark and I'm getting poor performance with classification methods in Spark MLlib (worse AUC than R). I am trying to set my own parameters rather than use the defaults. Here is the method I want to use:
train(RDD<LabeledPoint> input, int numIterations, double stepSize, double miniBatchFraction, Vector initialWeights)

(Javadoc: https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/rdd/RDD.html, https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/mllib/regression/LabeledPoint.html, https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/mllib/linalg/Vector.html)

How do I choose "numIterations" and "stepSize"? What does "miniBatchFraction" mean? Is "initialWeights" necessary to get a good model? If so, how should I choose it?

Regards,
Mélanie Gallois
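For context, here is my current understanding of what these parameters control, written as a plain-Python sketch of mini-batch SGD for logistic regression. This is only an illustration of the parameter roles, not Spark's actual implementation; the function name and the toy data are my own:

```python
import math
import random

def train_sgd(data, num_iterations=100, step_size=1.0,
              mini_batch_fraction=1.0, initial_weights=None):
    """Mini-batch SGD sketch mirroring MLlib's parameters:
    num_iterations      - number of gradient steps taken
    step_size           - learning rate scaling each step
    mini_batch_fraction - fraction of the data sampled per step
    initial_weights     - starting weight vector (zeros if omitted)
    data is a list of (label, features) with label in {0, 1}."""
    dim = len(data[0][1])
    w = list(initial_weights) if initial_weights is not None else [0.0] * dim
    batch = max(1, int(len(data) * mini_batch_fraction))
    for _ in range(num_iterations):
        sample = random.sample(data, batch)       # the "mini batch"
        grad = [0.0] * dim
        for label, x in sample:
            margin = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-margin))   # predicted probability
            for j in range(dim):
                grad[j] += (p - label) * x[j]     # logistic-loss gradient
        for j in range(dim):
            w[j] -= step_size * grad[j] / batch   # one step of size step_size
    return w

# Toy usage on a small separable data set:
random.seed(0)
data = [(1, [1.0, 2.0]), (1, [2.0, 1.5]), (0, [-1.0, -1.0]), (0, [-2.0, -0.5])]
weights = train_sgd(data, num_iterations=200, step_size=0.5, mini_batch_fraction=1.0)
```

If this picture is right, then miniBatchFraction trades gradient accuracy per step against cost per step, and initialWeights only sets the starting point of the search, but I would like confirmation for MLlib specifically.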