I am new to MLlib and to Spark (I use Scala). I'm trying to understand how LogisticRegressionWithLBFGS and LogisticRegressionWithSGD work. I usually use R for logistic regression, but now I am doing it in Spark so that I can analyze big data.
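For context, here is roughly what I am doing today (a minimal sketch; the LIBSVM path is a placeholder for my own data, and `data` stands in for my RDD[LabeledPoint]):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.util.MLUtils

    val sc = new SparkContext(new SparkConf().setAppName("LogRegExample"))

    // Placeholder path; in reality I load my own dataset
    val data = MLUtils.loadLibSVMFile(sc, "data/my_data.txt")
    val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 11L)

    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(training)

    // All the model gives back is a weight vector and an intercept
    println(s"weights = ${model.weights}, intercept = ${model.intercept}")

    // Clear the threshold so predict() returns raw scores, then compute AUC
    model.clearThreshold()
    val scoreAndLabels = test.map { point =>
      (model.predict(point.features), point.label)
    }
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    println(s"AUC = ${metrics.areaUnderROC()}")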
The model only returns weights and an intercept. My problem is that I have no information about which variables are significant and which ones I should drop to improve the model; I only have the confusion matrix and the AUC to evaluate its performance.

- Is there any way to get information about the variables I put into my model?
- How can I try different variable combinations? Do I have to modify the original dataset (e.g. delete one or several columns), as in the sketch below?
- How are the weights calculated? Is there a correlation calculation against the variable of interest?
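To make the second question concrete: right now the only approach I can see is to rebuild the feature vectors by hand before retraining. A sketch of what I mean (assuming dense feature vectors, and reusing the hypothetical `data` RDD from above):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Drop the feature at index `idx` from every example, then retrain
    // on the result. Assumes dense feature vectors.
    def dropFeature(data: RDD[LabeledPoint], idx: Int): RDD[LabeledPoint] =
      data.map { p =>
        val values = p.features.toArray
        val reduced = values.take(idx) ++ values.drop(idx + 1)
        LabeledPoint(p.label, Vectors.dense(reduced))
      }

Is there a better, built-in way to do this kind of variable selection in MLlib?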