Hi, I'm new to Spark and just trying to make sense of the SVMWithSGD
example.

I ran my dataset through it and built a model.  When I call predict() on
the testing data (after clearThreshold()) I was expecting to get answers in
the range of 0 to 1, but I don't: all the predictions seem to be negative
numbers between 0 and -2.  I guess my question is: what do these
predictions mean?  How are they of use?
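
If I'm reading the MLlib behavior correctly, clearThreshold() makes
predict() return the raw SVM margin (weights . features + intercept), which
is unbounded and can easily be negative.  Here's a small sketch I used to
check that assumption by recomputing the margin by hand (it assumes a
trained SVMModel named model and a LabeledPoint named point, as in my code
below):

        // Check my assumption that the cleared-threshold score is just
        // the raw margin w.x + b, recomputed by hand.
        double[] w = model.weights().toArray();
        double[] x = point.features().toArray();
        double margin = model.intercept();
        for (int i = 0; i < w.length; i++) {
            margin += w[i] * x[i];
        }
        System.out.println("manual margin = " + margin
                + ", predict() = " + model.predict(point.features()));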

The outcome I need is a probability rather than a binary label.

Here's my Java code:

        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.mllib.classification.SVMModel;
        import org.apache.spark.mllib.classification.SVMWithSGD;
        import org.apache.spark.mllib.regression.LabeledPoint;

        SparkConf conf = new SparkConf()
                .setAppName("name")
                .set("spark.cores.max", "1");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // ParsePoint is my own Function<String, LabeledPoint> parser.
        JavaRDD<LabeledPoint> points =
                sc.textFile(path).map(new ParsePoint()).cache();

        // 80/20 split: sample a training set, subtract it to get the test set.
        JavaRDD<LabeledPoint> training = points.sample(false, 0.8, 0L).cache();
        JavaRDD<LabeledPoint> testing = points.subtract(training);

        SVMModel model = SVMWithSGD.train(training.rdd(), 100);

        // Clear the default 0.0 threshold so predict() returns raw scores
        // instead of 0/1 labels.
        model.clearThreshold();

        // collect() instead of the deprecated toArray()
        for (LabeledPoint point : testing.collect()) {
            double score = model.predict(point.features());

            // These all come out negative, seemingly between 0 and -2.
            System.out.println("score = " + score);
        }
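
For what it's worth, the workaround I'm considering is swapping in logistic
regression, since (if I understand the API right) after clearThreshold() a
LogisticRegressionModel's predict() returns the class-1 probability, i.e.
the sigmoid of the margin.  A minimal sketch, assuming the same training
and testing RDDs as above:

        import org.apache.spark.mllib.classification.LogisticRegressionModel;
        import org.apache.spark.mllib.classification.LogisticRegressionWithSGD;

        LogisticRegressionModel lrModel =
                LogisticRegressionWithSGD.train(training.rdd(), 100);

        // With the threshold cleared, predict() should return
        // P(y = 1 | x) in [0, 1] rather than a 0/1 label.
        lrModel.clearThreshold();

        for (LabeledPoint point : testing.collect()) {
            System.out.println("p(y=1) = " + lrModel.predict(point.features()));
        }

Would that be the recommended way to get probabilities, or is there a
standard way to calibrate the SVM scores themselves?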
