Hi everyone,

The Python LogisticRegressionWithSGD does not appear to estimate an intercept. When I run the following, the returned weights and intercept are both 0.0:

    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    def main():
        sc = SparkContext(appName="NoIntercept")

        # Three points with an identical (zero) feature, so only an
        # intercept could separate the labels.
        train = sc.parallelize([LabeledPoint(0, [0]),
                                LabeledPoint(1, [0]),
                                LabeledPoint(1, [0])])

        model = LogisticRegressionWithSGD.train(train, iterations=500, step=0.1)

        print("Final weights: " + str(model.weights))
        print("Final intercept: " + str(model.intercept))

    if __name__ == "__main__":
        main()

Of course, one can fit an intercept with the simple expedient of adding a column of ones, but that's kind of annoying. Moreover, it looks like the Scala version has an intercept option.

Am I missing something? Should I just add the column of ones? If I submitted a PR doing that, is that the sort of thing you guys would accept?

Thanks! :-)

Naftali
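
P.S. For reference, the column-of-ones workaround I have in mind would look roughly like the sketch below. It is plain Python so it runs without Spark; the helper name with_bias_column is just something I made up for illustration, not part of any PySpark API.

```python
def with_bias_column(features):
    """Return a copy of the feature sequence with a constant 1.0 prepended."""
    return [1.0] + [float(x) for x in features]


# The toy data from the example above, as (label, features) pairs.
rows = [(0, [0]), (1, [0]), (1, [0])]

# Augment every row with the constant column.
augmented = [(label, with_bias_column(features)) for label, features in rows]

# With PySpark available, each pair would be wrapped as
# LabeledPoint(label, features) before being passed to train(); the weight
# learned for the constant column then plays the role of the intercept.
print(augmented)
```

The fitted model would then report one extra weight (the first one) standing in for the intercept, while model.intercept itself would presumably stay at 0.0.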