Thanks for the info. On 19 October 2014 20:46, Sean Owen <so...@cloudera.com> wrote:
> Ah right. It is important to use clearThreshold() in that example in order to generate margins, because the AUC metric needs the classifications to be ranked by some relative strength, rather than just 0/1. These outputs are not probabilities, and that is not what SVMs give you in general. There are techniques for estimating probabilities from SVM output, but these aren't present here.
>
> If you just want 0/1, you do not want to call clearThreshold().
>
> Linear regression is not a classifier, so probabilities don't enter into it. Logistic regression, however, does give you a probability if you compute the logistic function of the input directly.
>
> On Sun, Oct 19, 2014 at 3:00 PM, Nick Pomfret <nick-nab...@snowmonkey.co.uk> wrote:
> > Thanks.
> >
> > The example I used is here: https://spark.apache.org/docs/latest/mllib-linear-methods.html -- see SVMClassifier.
> >
> > So there's no way to get a probability-based output? What about from linear regression, or logistic regression?
> >
> > On 19 October 2014 19:52, Sean Owen <so...@cloudera.com> wrote:
> >> The problem is that you called clearThreshold(). The result becomes the SVM margin, not a 0/1 class prediction. There is no probability output.
> >>
> >> There was a very similar question last week. Is there an example out there suggesting clearThreshold()? I also wonder if it is good to overload the meaning of the output indirectly this way.
> >>
> >> On Oct 19, 2014 6:53 PM, "npomfret" <nick-nab...@snowmonkey.co.uk> wrote:
> >>> Hi, I'm new to Spark and just trying to make sense of the SVMWithSGD example. I ran my dataset through it and built a model. When I call predict() on the testing data (after clearThreshold()) I was expecting to get answers in the range of 0 to 1. But they aren't; all predictions seem to be negative numbers between -0 and -2. I guess my question is: what do these predictions mean? How are they of use? The outcome I need is a probability rather than a binary. Here's my Java code:
> >>>
> >>>     SparkConf conf = new SparkConf()
> >>>             .setAppName("name")
> >>>             .set("spark.cores.max", "1");
> >>>     JavaSparkContext sc = new JavaSparkContext(conf);
> >>>     JavaRDD points = sc.textFile(path).map(new ParsePoint()).cache();
> >>>     JavaRDD training = points.sample(false, 0.8, 0L).cache();
> >>>     JavaRDD testing = points.subtract(training);
> >>>     SVMModel model = SVMWithSGD.train(training.rdd(), 100);
> >>>     model.clearThreshold();
> >>>     for (LabeledPoint point : testing.toArray()) {
> >>>         Double score = model.predict(point.features());
> >>>         System.out.println("score = " + score); // <- all negative, seemingly between 0 and -2
> >>>     }
> >>>
> >>> ________________________________
> >>> View this message in context: Using SVMWithSGD model to predict
> >>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
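To illustrate Sean's point: the raw SVM output after clearThreshold() is a margin (a signed distance from the separating hyperplane), which can be any real number. Passing a margin through the logistic function squashes it into (0, 1); that is what logistic regression does with its own margin to produce a probability. For an SVM, though, the squashed value is only a monotone 0-to-1 score, not a calibrated probability (calibration techniques such as Platt scaling would be needed, and are not part of this example). A minimal, framework-free Java sketch of the mapping, with made-up margin values like the ones the poster reported:

```java
public class MarginToScore {

    // Logistic (sigmoid) function: maps any real-valued margin into (0, 1).
    static double sigmoid(double margin) {
        return 1.0 / (1.0 + Math.exp(-margin));
    }

    public static void main(String[] args) {
        // Hypothetical margins, similar to what SVMModel.predict() returns
        // after clearThreshold() in the thread above.
        double[] margins = {-2.0, -1.0, 0.0, 0.5};
        for (double m : margins) {
            // For logistic regression this value is the class-1 probability;
            // for an SVM it is only a monotone score, not a calibrated probability.
            System.out.println("margin = " + m + " -> score = " + sigmoid(m));
        }
    }
}
```

A margin of 0 maps to 0.5, large negative margins approach 0, and large positive margins approach 1. If I recall the 1.x MLlib API correctly, LogisticRegressionModel after clearThreshold() returns this sigmoid value from predict() directly, which is why Sean points to logistic regression when a probability is what you actually need.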