Hi, I'm new to Spark and just trying to make sense of the SVMWithSGD example.
I ran my dataset through it and built a model. When I called predict() on the testing data (after calling clearThreshold()), I expected to get answers in the range 0 to 1, but they aren't: all of the predictions are negative numbers, seemingly between 0 and -2. I guess my question is: what do these predictions mean, and how are they of use? The outcome I need is a probability rather than a binary label. Here's my Java code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;

SparkConf conf = new SparkConf()
        .setAppName("name")
        .set("spark.cores.max", "1");
JavaSparkContext sc = new JavaSparkContext(conf);

// Parse the input file into labeled points and take an 80/20 train/test split.
JavaRDD<LabeledPoint> points = sc.textFile(path).map(new ParsePoint()).cache();
JavaRDD<LabeledPoint> training = points.sample(false, 0.8, 0L).cache();
JavaRDD<LabeledPoint> testing = points.subtract(training);

// Train for 100 iterations, then clear the threshold so predict() returns
// raw scores instead of 0/1 labels.
SVMModel model = SVMWithSGD.train(training.rdd(), 100);
model.clearThreshold();

for (LabeledPoint point : testing.collect()) {
    Double score = model.predict(point.features());
    System.out.println("score = " + score); // <- all of these are negative, seemingly between 0 and -2
}
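From what I've read, I'm guessing that after clearThreshold() the value returned by predict() is the raw margin (w . x + b) rather than a label, so the sign picks the class and the magnitude is the distance from the separating hyperplane; that would explain why my scores aren't confined to [0, 1]. If that's right, would squashing the margin through a logistic function, like the sketch below, be a reasonable way to get a value in (0, 1)? This is just my own guess, not something from the MLlib docs, and I realize it wouldn't be a calibrated probability (I gather something like Platt scaling is needed for that).

// My own sketch, continuing from the code above: squash the raw margin
// into (0, 1) with a logistic function. NOT a calibrated probability.
for (LabeledPoint point : testing.collect()) {
    double margin = model.predict(point.features()); // raw w . x + b after clearThreshold()
    double pseudoProb = 1.0 / (1.0 + Math.exp(-margin));
    System.out.println("margin = " + margin + ", pseudo-probability = " + pseudoProb);
}

Is something like this a sensible way to get what I need, or is there a better approach?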