model.predict should return a 0/1 predicted label. The example code is misleading when it calls the prediction a "score."
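To make the distinction concrete, here is a minimal, Spark-free sketch (the class and method names are illustrative, not part of the MLlib API): with the default threshold of 0 in place, a linear model yields a 0/1 label, while the raw margin wTx is an unbounded real number, not a probability.

```java
// Illustrative sketch of a linear classifier's two prediction modes
// (hypothetical names, not the MLlib API): rawScore is the margin w.x,
// and predictLabel applies the default threshold of 0 to get a 0/1 label.
public class MarginVsLabel {
    /** Raw margin w.x: an unbounded real number, not a probability. */
    public static double rawScore(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    /** Thresholded 0/1 label: positive iff the margin is >= 0. */
    public static int predictLabel(double[] w, double[] x) {
        return rawScore(w, x) >= 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] w = {1.0, -2.0};
        System.out.println(rawScore(w, new double[]{3.0, 1.0}));     // 1.0
        System.out.println(predictLabel(w, new double[]{3.0, 1.0})); // 1
        System.out.println(predictLabel(w, new double[]{0.0, 1.0})); // 0
    }
}
```

In MLlib terms, calling model.clearThreshold() switches SVMModel.predict() from the thresholded label to the raw margin, which is why the code below compares the "score" against 0 by hand.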
On Mon, Nov 30, 2015 at 9:13 AM, Fazlan Nazeem <fazl...@wso2.com> wrote:

> You should never use the training data to measure your prediction
> accuracy. Always use a fresh dataset (test data) for this purpose.
>
> On Sun, Nov 29, 2015 at 8:36 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> I think this should represent the label of the LabeledPoint (0 means
>> negative, 1 means positive):
>> http://spark.apache.org/docs/latest/mllib-data-types.html#labeled-point
>>
>> The document you mention describes the mathematical formula, not the
>> implementation.
>>
>> On Sun, Nov 29, 2015 at 9:13 AM, Tarek Elgamal <tarek.elga...@gmail.com>
>> wrote:
>>
>>> According to the documentation
>>> <http://spark.apache.org/docs/latest/mllib-linear-methods.html>, by
>>> default, if wTx >= 0 then the outcome is positive, and negative
>>> otherwise. I suppose that wTx is the "score" in my case. If the score
>>> is at least 0 and the label is positive, then I return 1 (correct
>>> classification), and I return 0 otherwise. Do you have any idea how to
>>> classify a point as positive or negative using this score or another
>>> function?
>>>
>>> On Sat, Nov 28, 2015 at 5:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>>   return 1; // correct classification
>>>> } else {
>>>>   return 0;
>>>> }
>>>>
>>>> I suspect score is always between 0 and 1.
>>>>
>>>> On Sat, Nov 28, 2015 at 10:39 AM, Tarek Elgamal <
>>>> tarek.elga...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to run the straightforward SVM example, but I am getting
>>>>> low accuracy (around 50%) when I predict using the same data I used
>>>>> for training. I am probably doing the prediction in the wrong way.
>>>>> My code is below. I would appreciate any help.
>>>>>
>>>>> import java.util.List;
>>>>>
>>>>> import org.apache.spark.SparkConf;
>>>>> import org.apache.spark.SparkContext;
>>>>> import org.apache.spark.api.java.JavaRDD;
>>>>> import org.apache.spark.api.java.function.Function;
>>>>> import org.apache.spark.api.java.function.Function2;
>>>>> import org.apache.spark.mllib.classification.SVMModel;
>>>>> import org.apache.spark.mllib.classification.SVMWithSGD;
>>>>> import org.apache.spark.mllib.regression.LabeledPoint;
>>>>> import org.apache.spark.mllib.util.MLUtils;
>>>>>
>>>>> import scala.Tuple2;
>>>>> import edu.illinois.biglbjava.readers.LabeledPointReader;
>>>>>
>>>>> public class SimpleDistSVM {
>>>>>   public static void main(String[] args) {
>>>>>     SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
>>>>>     SparkContext sc = new SparkContext(conf);
>>>>>     String inputPath = args[0];
>>>>>
>>>>>     // Read training data
>>>>>     JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc,
>>>>>         inputPath).toJavaRDD();
>>>>>
>>>>>     // Run training algorithm to build the model.
>>>>>     int numIterations = 3;
>>>>>     final SVMModel model = SVMWithSGD.train(data.rdd(), numIterations);
>>>>>
>>>>>     // Clear the default threshold so predict() returns the raw score.
>>>>>     model.clearThreshold();
>>>>>
>>>>>     // Predict points in the training set and map to an RDD of 0/1 values,
>>>>>     // where 0 is a misclassification and 1 is a correct classification.
>>>>>     JavaRDD<Integer> classification = data.map(new
>>>>>         Function<LabeledPoint, Integer>() {
>>>>>       public Integer call(LabeledPoint p) {
>>>>>         int label = (int) p.label();
>>>>>         Double score = model.predict(p.features());
>>>>>         if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>>>           return 1; // correct classification
>>>>>         } else {
>>>>>           return 0;
>>>>>         }
>>>>>       }
>>>>>     });
>>>>>
>>>>>     // Sum up all values in the RDD to get the number of correctly
>>>>>     // classified examples.
>>>>>     int sum = classification.reduce(new Function2<Integer, Integer,
>>>>>         Integer>() {
>>>>>       public Integer call(Integer arg0, Integer arg1) throws Exception {
>>>>>         return arg0 + arg1;
>>>>>       }
>>>>>     });
>>>>>
>>>>>     // Compute accuracy as the fraction of correctly classified examples.
>>>>>     double accuracy = ((double) sum) / ((double) classification.count());
>>>>>     System.out.println("Accuracy = " + accuracy);
>>>>>   }
>>>>> }
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> fazl...@wso2.com
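Following Fazlan's advice above, accuracy should be measured on held-out data rather than the training set. A pure-Java sketch of the idea (illustrative names, not the MLlib API; in Spark itself you would typically use JavaRDD.randomSplit to produce train/test RDDs):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative holdout-evaluation helpers (hypothetical names, not MLlib):
// shuffle the data with a fixed seed, split it into train/test parts, and
// score accuracy only on the held-out part.
public class HoldoutEval {
    /** Shuffle and split into [train, test] by the given train fraction. */
    public static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        return Arrays.asList(shuffled.subList(0, cut),
                             shuffled.subList(cut, shuffled.size()));
    }

    /** Fraction of positions where the predicted and true labels agree. */
    public static double accuracy(int[] predicted, int[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> parts = split(data, 0.8, 42L);
        System.out.println(parts.get(0).size() + " train / "
                + parts.get(1).size() + " test");          // 8 train / 2 test
        System.out.println(accuracy(new int[]{1, 0, 1, 1},
                                    new int[]{1, 1, 1, 0})); // 0.5
    }
}
```

Training accuracy can look low for unrelated reasons too (e.g. very few SGD iterations), but only a held-out split gives a meaningful estimate of generalization.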