It looks like you are training each model i (for label i) using only the data with label i. You need to use all of your data to train each model, so that model i can learn to distinguish label i from the other labels (roughly speaking).
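If you do want one model per label, here is a minimal one-vs-rest sketch of that "use all of your data" idea with the MLlib RDD API. It assumes your data is an `RDD[LabeledPoint]`, and `OneVsRestSketch`, `trainOneVsRest` and `predict` are hypothetical names used only for illustration. Each binary model sees every point, relabeled 1.0 for label i and 0.0 for everything else. `clearThreshold()` on the resulting `LogisticRegressionModel` makes `predict` return the positive-class probability instead of a hard 0/1 decision, which also touches on the threshold question. The final answer is the label whose model reports the highest probability.

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object: a one-vs-rest sketch, not an MLlib API.
object OneVsRestSketch {
  // Train one binary model per distinct label, each on ALL of the data:
  // points with label l become positives (1.0), every other point a negative (0.0).
  def trainOneVsRest(data: RDD[LabeledPoint]): Map[Double, LogisticRegressionModel] = {
    data.cache() // the data is scanned once per label, so keep it in memory
    val labels = data.map(_.label).distinct().collect()
    labels.map { l =>
      val binary = data.map(p => LabeledPoint(if (p.label == l) 1.0 else 0.0, p.features))
      val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(binary)
      // With the default threshold (0.5) predict() returns 0.0 or 1.0;
      // after clearThreshold() it returns the probability of the positive class.
      model.clearThreshold()
      l -> model
    }.toMap
  }

  // Score the query with every model and pick the label with the highest probability.
  def predict(models: Map[Double, LogisticRegressionModel], features: Vector): Double =
    models.maxBy { case (_, m) => m.predict(features) }._1
}
```

This trains as many models as there are labels, so with 101 labels it is 101 passes of LBFGS over the full data set; the single multiclass model suggested below is usually the simpler choice.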
However, what you're doing is multiclass (not multilabel) classification, which LogisticRegressionWithLBFGS already supports. Can you not just use LogisticRegressionWithLBFGS directly?

On Wed, May 27, 2015 at 8:53 AM, peterg wrote:
> Hi all
>
> I believe I have created a multi-label classifier using LogisticRegression
> but there is one snag. No matter what features I use to get the prediction,
> it will always return the label. I feel like I need to set a threshold but
> can't seem to figure out how to do that. I attached the code below. It's
> super simple. Hopefully someone can point me in the correct direction:
>
> val labels = labeledPoints.map(l => l.label).take(1000).distinct // stupid hack
> val groupedRDDs = labels.map { l => labeledPoints.filter(m => m.label == l) }.map(l => l.cache()) // should use groupBy
> val models = groupedRDDs.map(rdd => new LogisticRegressionWithLBFGS().setNumClasses(101).run(rdd))
> val results = models.map(m => m.predict(Vectors.dense(query.features)))
>
> Thanks
>
> Peter
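For the direct multiclass route suggested at the top, a rough sketch under the same assumption (an `RDD[LabeledPoint]`) is to train a single model on all of the data. MLlib's multinomial logistic regression expects the labels to be 0.0, 1.0, ..., numClasses - 1, and for a model with more than two classes `predict` returns the most likely class directly, so no threshold is involved. `MulticlassSketch`, `numClassesOf` and `train` are hypothetical names for illustration.

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object: one multinomial model instead of one model per label.
object MulticlassSketch {
  // Assumes labels are 0.0 .. k-1, so the number of classes is the largest label + 1.
  def numClassesOf(data: RDD[LabeledPoint]): Int = data.map(_.label).max().toInt + 1

  // Train ONE model on the full data set; predict() on the result returns a class label.
  def train(data: RDD[LabeledPoint], numClasses: Int): LogisticRegressionModel =
    new LogisticRegressionWithLBFGS().setNumClasses(numClasses).run(data)
}
```

With the quoted code, that would replace the per-label loop with something like `val model = MulticlassSketch.train(labeledPoints, 101)` (assuming the labels run from 0 to 100, as `setNumClasses(101)` suggests) followed by a single `model.predict(Vectors.dense(query.features))`.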
