I am new to MLlib and to Spark (I use Scala). I'm trying to understand how LogisticRegressionWithLBFGS and LogisticRegressionWithSGD work. I usually use R for logistic regression, but now I am doing it in Spark so that I can analyze big data.
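For context, here is roughly what I am doing today (a minimal sketch; the LIBSVM path is a placeholder for my own data, and `data` stands in for my RDD[LabeledPoint]):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.util.MLUtils

    val sc = new SparkContext(new SparkConf().setAppName("LogRegExample"))

    // Placeholder path; in reality I load my own dataset
    val data = MLUtils.loadLibSVMFile(sc, "data/my_data.txt")
    val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 11L)

    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(training)

    // All the model gives back is a weight vector and an intercept
    println(s"weights = ${model.weights}, intercept = ${model.intercept}")

    // Clear the threshold so predict() returns raw scores, then compute AUC
    model.clearThreshold()
    val scoreAndLabels = test.map { point =>
      (model.predict(point.features), point.label)
    }
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    println(s"AUC = ${metrics.areaUnderROC()}")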
The model only returns weights and an intercept. My problem is that I have no information about which variables are significant and which ones I should drop to improve the model; I only have the confusion matrix and the AUC to evaluate its performance.

- Is there any way to get information about the variables I put into my model?
- How can I try different variable combinations? Do I have to modify the original dataset (e.g. delete one or several columns), as in the sketch below?
- How are the weights calculated? Is there a correlation calculation against the variable of interest?
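To make the second question concrete: right now the only approach I can see is to rebuild the feature vectors by hand before retraining. A sketch of what I mean (assuming dense feature vectors, and reusing the hypothetical `data` RDD from above):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Drop the feature at index `idx` from every example, then retrain
    // on the result. Assumes dense feature vectors.
    def dropFeature(data: RDD[LabeledPoint], idx: Int): RDD[LabeledPoint] =
      data.map { p =>
        val values = p.features.toArray
        val reduced = values.take(idx) ++ values.drop(idx + 1)
        LabeledPoint(p.label, Vectors.dense(reduced))
      }

Is there a better, built-in way to do this kind of variable selection in MLlib?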