Spark MLlib provides a cross-validation toolkit for selecting hyperparameters. I think you'll find the documentation quite helpful:

http://spark.apache.org/docs/latest/ml-tuning.html#example-model-selection-via-cross-validation

There is actually a Python example for logistic regression there.
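To make that concrete, here is a minimal sketch of what the tuning code could look like in your case. It assumes the same data_train_df from your snippet, with the usual "features"/"label" columns, and it scores candidates with areaUnderROC (the BinaryClassificationEvaluator default) rather than your log loss, so treat it as a starting point rather than a drop-in:

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(maxIter=500, elasticNetParam=0.5, standardization=True)

# Candidate regParam values to compare; widen or narrow the grid as needed.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.001, 0.01, 0.02, 0.1, 1.0])
        .build())

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),  # areaUnderROC by default
                    numFolds=3)

cv_model = cv.fit(data_train_df)

# avgMetrics lines up with the param grid, so you can see which regParam won.
for params, metric in zip(grid, cv_model.avgMetrics):
    print("regParam=%s -> avg AUC=%s" % (params[lr.regParam], metric))

best_model = cv_model.bestModel  # refit on the full training set with the winning regParam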
If you still have questions after reading it, please post back again. Hope that helps.

On Thu, Oct 13, 2016 at 12:58 PM, aditya1702 <adityavya...@gmail.com> wrote:

> Ok, so I tried setting the regParam and tried lowering it. How do I evaluate
> which regParam is best? Do I have to do it by trial and error? I am
> currently calculating the log loss for the model. Is that a good way to find
> the best regParam value? Here is my code:
>
> from math import exp, log
> #from pyspark.sql.functions import log
>
> epsilon = 1e-16
>
> def sigmoid_log_loss(w, x):
>     ans = float(1 / (1 + exp(-(w.dot(x.features)))))
>     if ans == 0:
>         ans = ans + epsilon
>     if ans == 1:
>         ans = ans - epsilon
>     log_loss = -((x.label) * log(ans) + (1 - x.label) * log(1 - ans))
>     return ((ans, x.label), log_loss)
>
> -------------------------------------------------------
>
> reg = 0.02
> from pyspark.ml.classification import LogisticRegression
> lr = LogisticRegression(regParam=reg, maxIter=500, standardization=True,
>                         elasticNetParam=0.5)
> model = lr.fit(data_train_df)
>
> w = model.coefficients
> intercept = model.intercept
> data_predicted_df = data_val_df.map(lambda x: sigmoid_log_loss(w, x))
> log_loss = data_predicted_df.map(lambda x: x[1]).mean()
> print log_loss
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Regularized-Logistic-regression-tp19432p19444.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
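One small note on the snippet above: you compute model.intercept but it never enters the prediction, so the probabilities (and therefore the log loss) are off whenever the learned intercept is nonzero (fitIntercept defaults to True). A possible fix, reusing your names and keeping the rest of the pipeline the same:

from math import exp, log

epsilon = 1e-16

def sigmoid_log_loss(w, b, x):
    # b is the intercept (model.intercept in your code)
    ans = 1.0 / (1.0 + exp(-(w.dot(x.features) + b)))
    ans = min(max(ans, epsilon), 1.0 - epsilon)  # clamp away from exactly 0 or 1
    log_loss = -((x.label) * log(ans) + (1 - x.label) * log(1 - ans))
    return ((ans, x.label), log_loss)

and then call it as data_val_df.map(lambda x: sigmoid_log_loss(w, intercept, x)). With that in place, looping over a handful of regParam values and comparing the validation log loss is a reasonable way to pick one; CrossValidator just automates the same idea with k folds.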