Re: MLLib - Regularized logistic regression in python

Yanbo Liang Tue, 15 Jul 2014 21:12:07 -0700

1) AFAIK the Spark Python API does not provide an interface for setting
regType and regParam.
If you want to train your own LR model with specific regularization
parameters, I strongly recommend using the Scala API.
For reference, see the following code in
spark-1.0.0/python/pyspark/mllib/classification.py; note that train()
takes no regularization arguments:
class LogisticRegressionWithSGD(object):
    @classmethod
    def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
              initialWeights=None):
        """Train a logistic regression model on the given data."""
        sc = data.context
        train_func = lambda d, i: sc._jvm.PythonMLLibAPI().trainLogisticRegressionModelWithSGD(
            d._jrdd, iterations, step, miniBatchFraction, i)
        return _regression_train_wrapper(sc, train_func, LogisticRegressionModel, data,
                                         initialWeights)
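
To make concrete what a regParam setting would control, here is a minimal pure-Python sketch of L2-regularized logistic regression trained by full-batch gradient descent. It is not MLlib's implementation: the function name train_l2_logistic, the reg_param argument, and the omission of an intercept are all assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function, branching on the sign of z to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logistic(points, iterations=100, step=1.0, reg_param=0.0):
    """Minimize mean logistic loss plus (reg_param / 2) * ||w||^2.

    points: list of (label, features) pairs with label in {0.0, 1.0}.
    Returns the learned weights as a list (hypothetical helper, no intercept).
    """
    dim = len(points[0][1])
    weights = [0.0] * dim
    n = float(len(points))
    for _ in range(iterations):
        grad = [0.0] * dim
        for label, features in points:
            margin = sum(w * x for w, x in zip(weights, features))
            error = sigmoid(margin) - label
            for j, x in enumerate(features):
                grad[j] += error * x / n
        # The L2 penalty adds reg_param * w to the gradient of the loss.
        weights = [w - step * (g + reg_param * w)
                   for w, g in zip(weights, grad)]
    return weights


# Toy, linearly separable data: (label, features).
data = [(1.0, [1.0, 2.0]), (0.0, [-1.0, -1.5]),
        (1.0, [2.0, 1.0]), (0.0, [-2.0, -0.5])]
w_plain = train_l2_logistic(data, reg_param=0.0)
w_reg = train_l2_logistic(data, reg_param=1.0)
```

On this toy data the regularized weights stay smaller than the unregularized ones, which is the shrinkage effect a regParam-style setting is meant to provide.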


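2) The actual probability is computed internally, but it is not returned.
If you want the score as output, customize the following method of the Scala
class LogisticRegressionModel. As the code shows, the raw score is returned
only when threshold is None; otherwise the result is 0.0 or 1.0.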

override protected def predictPoint(dataMatrix: Vector, weightMatrix: Vector,
    intercept: Double) = {
  val margin = weightMatrix.toBreeze.dot(dataMatrix.toBreeze) + intercept
  val score = 1.0 / (1.0 + math.exp(-margin))
  threshold match {
    case Some(t) => if (score < t) 0.0 else 1.0
    case None => score
  }
}
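
On the Python side, the same score can be computed by hand from the trained weights, as the original question suggests. The following is a minimal sketch using only the standard library. The helper name predict_probability is hypothetical, and it assumes the trained model exposes its coefficients and intercept (for example as model.weights and model.intercept), which you should verify against your PySpark version.

```python
import math


def predict_probability(weights, intercept, features):
    """Return P(Y=1 | X) = 1 / (1 + exp(-(w . x + b))) without thresholding."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Branch on the sign so math.exp never overflows for large |margin|.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


# Example with made-up weights; in practice these would come from the model.
p = predict_probability([0.5, -0.25], 0.1, [2.0, 4.0])
```

This mirrors the margin and score lines of predictPoint above, skipping the threshold step. For an RDD of feature vectors, the function could be applied inside a map once the weights and intercept have been copied into plain Python values.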


2014-07-16 2:12 GMT+08:00 fjeg <francisco.gime...@gmail.com>:

> Hi All,
>
> I am trying to perform regularized logistic regression with mllib in python.
> I have seen that this is possible in the following scala example:
>
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala
>
> But I do not see any way to set the regType and regParam when training
> logistic regression through python.
>
> Additionally, I would like to output the activations -- i.e. P(Y=1 | X).
> Currently, LogisticRegressionModel.predict() just thresholds at 0.5 and does
> not return the actual probability. Do I just have to do this by hand by
> grabbing the weights from the trained model, or is there a built in way to
> do this?
>
> Best,
> Francisco Gimenez
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Regularized-logistic-regression-in-python-tp9780.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
