Hi everyone,

The Python LogisticRegressionWithSGD does not appear to estimate an
intercept.  When I run the following, the returned weights and intercept
are both 0.0:

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithSGD

def main():
    sc = SparkContext(appName="NoIntercept")

    train = sc.parallelize([LabeledPoint(0, [0]), LabeledPoint(1, [0]),
                            LabeledPoint(1, [0])])

    model = LogisticRegressionWithSGD.train(train, iterations=500, step=0.1)
    print("Final weights: " + str(model.weights))
    print("Final intercept: " + str(model.intercept))

if __name__ == "__main__":
    main()


Of course, one can fit an intercept with the simple expedient of adding a
column of ones, but that's kind of annoying.  Moreover, it looks like the
Scala version has an intercept option.
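
To make the workaround concrete, here is a rough sketch in plain Python (no Spark involved) of what I mean by adding a column of ones. The helper names add_ones_column and fit_logistic are made up for illustration and are not part of MLlib; the optimizer is ordinary batch gradient descent on the mean log loss, which I am only assuming is close enough to what the SGD trainer does to make the point.

```python
import math

def add_ones_column(rows):
    """Append a constant 1.0 feature to every (label, features) pair."""
    return [(label, list(features) + [1.0]) for label, features in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, iterations=500, step=1.0):
    """Batch gradient descent on the mean log loss, with no separate intercept."""
    n_features = len(rows[0][1])
    weights = [0.0] * n_features
    for _ in range(iterations):
        grad = [0.0] * n_features
        for label, features in rows:
            margin = sum(w * x for w, x in zip(weights, features))
            error = sigmoid(margin) - label
            for j, x in enumerate(features):
                grad[j] += error * x / len(rows)
        weights = [w - step * g for w, g in zip(weights, grad)]
    return weights

# Same toy data as above: one all-zero feature, labels 0, 1, 1.
data = [(0, [0.0]), (1, [0.0]), (1, [0.0])]
weights = fit_logistic(add_ones_column(data))
print("Weights with ones column: " + str(weights))
```

The last weight plays the role of the intercept. Since two of the three labels are 1, it should settle near log(2), while the weight on the all-zero feature stays at 0. In PySpark the analogous change would be to build each LabeledPoint with a trailing 1.0 in its feature list.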

Am I missing something? Should I just add the column of ones? If I
submitted a PR doing that, is that the sort of thing you guys would accept?

Thanks! :-)

Naftali
