1) AFAIK, the Spark Python API does not provide an interface for setting regType and regParam. If you want to train your own LR model with proper regularization parameters, I strongly recommend using the Scala API. For reference, see the following code in spark-1.0.0/python/pyspark/mllib/classification.py:

    class LogisticRegressionWithSGD(object):
        @classmethod
        def train(cls, data, iterations=100, step=1.0,
                  miniBatchFraction=1.0, initialWeights=None):
            """Train a logistic regression model on the given data."""
            sc = data.context
            # Note: no regType or regParam argument is forwarded to the JVM side.
            train_func = lambda d, i: sc._jvm.PythonMLLibAPI().trainLogisticRegressionModelWithSGD(
                d._jrdd, iterations, step, miniBatchFraction, i)
            return _regression_train_wrapper(sc, train_func, LogisticRegressionModel,
                                             data, initialWeights)
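
To make it concrete what a regParam-style setting controls, here is a minimal pure-Python sketch of logistic regression with an L2 penalty. It is not MLlib's implementation: the names train_l2_logistic and reg_param are hypothetical, and it uses full-batch gradient descent rather than mini-batch SGD to keep the example short.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logistic(points, iterations=100, step=1.0, reg_param=0.1):
    """Hypothetical helper, not part of Spark.

    points: list of (label, features) pairs with label in {0.0, 1.0}.
    Minimizes mean log-loss + (reg_param / 2) * ||w||^2.
    """
    dim = len(points[0][1])
    n = float(len(points))
    weights = [0.0] * dim
    for _ in range(iterations):
        grad = [0.0] * dim
        for label, features in points:
            margin = sum(w * x for w, x in zip(weights, features))
            err = sigmoid(margin) - label
            for j, x in enumerate(features):
                grad[j] += err * x
        # Average log-loss gradient plus the gradient of the L2 penalty.
        weights = [w - step * (g / n + reg_param * w)
                   for w, g in zip(weights, grad)]
    return weights


# Toy data; the first feature is a constant 1.0 acting as a bias term.
data = [(1.0, [1.0, 2.0]), (0.0, [1.0, -2.0]),
        (1.0, [1.0, 1.5]), (0.0, [1.0, -1.0])]
plain = train_l2_logistic(data, step=0.5, reg_param=0.0)
shrunk = train_l2_logistic(data, step=0.5, reg_param=1.0)
```

On this separable toy set the unregularized weights keep growing with more iterations, while the penalized run settles at a smaller norm, which is the effect you would be tuning through regParam in the Scala API.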

2) The actual probability is already computed, but it is not returned when a threshold is set. If you want to output the score, just customize the following function in class LogisticRegressionModel. As the code shows, the raw score is returned only in the None case, i.e. when no threshold is set:

    override protected def predictPoint(dataMatrix: Vector, weightMatrix: Vector,
        intercept: Double) = {
      val margin = weightMatrix.toBreeze.dot(dataMatrix.toBreeze) + intercept
      // Logistic function of the margin, i.e. P(Y=1 | X).
      val score = 1.0 / (1.0 + math.exp(-margin))
      threshold match {
        case Some(t) => if (score < t) 0.0 else 1.0
        case None => score
      }
    }

2014-07-16 2:12 GMT+08:00 fjeg <francisco.gime...@gmail.com>:

> Hi All,
>
> I am trying to perform regularized logistic regression with mllib in
> python.
> I have seen that this is possible in the following scala example:
>
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala
>
> But I do not see any way to set the regType and regParam when training
> logistic regression through python.
>
> Additionally, I would like to output the activations -- i.e. P(Y=1 | X).
> Currently, LogisticRegressionModel.predict() just thresholds at 0.5 and
> does not return the actual probability. Do I just have to do this by hand
> by grabbing the weights from the trained model, or is there a built in
> way to do this?
>
> Best,
> Francisco Gimenez
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Regularized-logistic-regression-in-python-tp9780.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
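
P.S. On the "by hand" option for the second question: if you would rather stay in Python, the same score as in predictPoint can be computed from the trained weights and intercept. A minimal sketch follows. predict_probability is a hypothetical helper, not a Spark function, and I am assuming you have already pulled the weights and the intercept out of the trained model as plain Python numbers.

```python
import math


def predict_probability(weights, intercept, features):
    """Hypothetical helper: returns P(Y=1 | X) for one feature vector.

    weights and features are equal-length sequences of floats.
    """
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Same formula as 1.0 / (1.0 + exp(-margin)), written so that large
    # negative margins do not overflow math.exp.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


# Example values are made up for illustration only.
p = predict_probability([1.0, -1.0], 0.5, [2.0, 1.0])
label = 1.0 if p >= 0.5 else 0.0  # thresholding at 0.5, as predict() does
```

To score a whole RDD you would map this function over the feature vectors, passing in the weights and intercept from the trained model.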