model.predict should return a 0/1 predicted label. The example code is misleading when it calls the prediction a "score."
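To make the distinction concrete, here is a minimal, Spark-free sketch (the class and method names are illustrative, not part of the MLlib API): with the default threshold of 0 in place, a linear model yields a 0/1 label, while the raw margin wTx is an unbounded real number, not a probability.

```java
// Illustrative sketch of a linear classifier's two prediction modes
// (hypothetical names, not the MLlib API): rawScore is the margin w.x,
// and predictLabel applies the default threshold of 0 to get a 0/1 label.
public class MarginVsLabel {
    /** Raw margin w.x: an unbounded real number, not a probability. */
    public static double rawScore(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    /** Thresholded 0/1 label: positive iff the margin is >= 0. */
    public static int predictLabel(double[] w, double[] x) {
        return rawScore(w, x) >= 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] w = {1.0, -2.0};
        System.out.println(rawScore(w, new double[]{3.0, 1.0}));     // 1.0
        System.out.println(predictLabel(w, new double[]{3.0, 1.0})); // 1
        System.out.println(predictLabel(w, new double[]{0.0, 1.0})); // 0
    }
}
```

In MLlib terms, calling model.clearThreshold() switches SVMModel.predict() from the thresholded label to the raw margin, which is why the code below compares the "score" against 0 by hand.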
On Mon, Nov 30, 2015 at 9:13 AM, Fazlan Nazeem <fazl...@wso2.com> wrote:

> You should never use the training data to measure your prediction
> accuracy. Always use a fresh dataset (test data) for this purpose.
>
> On Sun, Nov 29, 2015 at 8:36 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> I think this should represent the label of the LabeledPoint (0 means
>> negative, 1 means positive):
>> http://spark.apache.org/docs/latest/mllib-data-types.html#labeled-point
>>
>> The document you mention describes the mathematical formula, not the
>> implementation.
>>
>> On Sun, Nov 29, 2015 at 9:13 AM, Tarek Elgamal <tarek.elga...@gmail.com>
>> wrote:
>>
>>> According to the documentation
>>> <http://spark.apache.org/docs/latest/mllib-linear-methods.html>, by
>>> default, if wTx >= 0 then the outcome is positive, and negative
>>> otherwise. I suppose that wTx is the "score" in my case. If the score
>>> is at least 0 and the label is positive, then I return 1 (correct
>>> classification), and I return 0 otherwise. Do you have any idea how to
>>> classify a point as positive or negative using this score or another
>>> function?
>>>
>>> On Sat, Nov 28, 2015 at 5:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>>   return 1; // correct classification
>>>> } else {
>>>>   return 0;
>>>> }
>>>>
>>>> I suspect score is always between 0 and 1.
>>>>
>>>> On Sat, Nov 28, 2015 at 10:39 AM, Tarek Elgamal <
>>>> tarek.elga...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to run the straightforward SVM example, but I am getting
>>>>> low accuracy (around 50%) when I predict using the same data I used
>>>>> for training. I am probably doing the prediction in the wrong way.
>>>>> My code is below. I would appreciate any help.
>>>>>
>>>>> import java.util.List;
>>>>>
>>>>> import org.apache.spark.SparkConf;
>>>>> import org.apache.spark.SparkContext;
>>>>> import org.apache.spark.api.java.JavaRDD;
>>>>> import org.apache.spark.api.java.function.Function;
>>>>> import org.apache.spark.api.java.function.Function2;
>>>>> import org.apache.spark.mllib.classification.SVMModel;
>>>>> import org.apache.spark.mllib.classification.SVMWithSGD;
>>>>> import org.apache.spark.mllib.regression.LabeledPoint;
>>>>> import org.apache.spark.mllib.util.MLUtils;
>>>>>
>>>>> import scala.Tuple2;
>>>>> import edu.illinois.biglbjava.readers.LabeledPointReader;
>>>>>
>>>>> public class SimpleDistSVM {
>>>>>   public static void main(String[] args) {
>>>>>     SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
>>>>>     SparkContext sc = new SparkContext(conf);
>>>>>     String inputPath = args[0];
>>>>>
>>>>>     // Read training data
>>>>>     JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc,
>>>>>         inputPath).toJavaRDD();
>>>>>
>>>>>     // Run training algorithm to build the model.
>>>>>     int numIterations = 3;
>>>>>     final SVMModel model = SVMWithSGD.train(data.rdd(), numIterations);
>>>>>
>>>>>     // Clear the default threshold so predict() returns the raw score.
>>>>>     model.clearThreshold();
>>>>>
>>>>>     // Predict points in the training set and map to an RDD of 0/1 values,
>>>>>     // where 0 is a misclassification and 1 is a correct classification.
>>>>>     JavaRDD<Integer> classification = data.map(new
>>>>>         Function<LabeledPoint, Integer>() {
>>>>>       public Integer call(LabeledPoint p) {
>>>>>         int label = (int) p.label();
>>>>>         Double score = model.predict(p.features());
>>>>>         if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>>>           return 1; // correct classification
>>>>>         } else {
>>>>>           return 0;
>>>>>         }
>>>>>       }
>>>>>     });
>>>>>
>>>>>     // Sum up all values in the RDD to get the number of correctly
>>>>>     // classified examples.
>>>>>     int sum = classification.reduce(new Function2<Integer, Integer,
>>>>>         Integer>() {
>>>>>       public Integer call(Integer arg0, Integer arg1) throws Exception {
>>>>>         return arg0 + arg1;
>>>>>       }
>>>>>     });
>>>>>
>>>>>     // Compute accuracy as the fraction of correctly classified examples.
>>>>>     double accuracy = ((double) sum) / ((double) classification.count());
>>>>>     System.out.println("Accuracy = " + accuracy);
>>>>>   }
>>>>> }
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> fazl...@wso2.com
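Following Fazlan's advice above, accuracy should be measured on held-out data rather than the training set. A pure-Java sketch of the idea (illustrative names, not the MLlib API; in Spark itself you would typically use JavaRDD.randomSplit to produce train/test RDDs):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative holdout-evaluation helpers (hypothetical names, not MLlib):
// shuffle the data with a fixed seed, split it into train/test parts, and
// score accuracy only on the held-out part.
public class HoldoutEval {
    /** Shuffle and split into [train, test] by the given train fraction. */
    public static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        return Arrays.asList(shuffled.subList(0, cut),
                             shuffled.subList(cut, shuffled.size()));
    }

    /** Fraction of positions where the predicted and true labels agree. */
    public static double accuracy(int[] predicted, int[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> parts = split(data, 0.8, 42L);
        System.out.println(parts.get(0).size() + " train / "
                + parts.get(1).size() + " test");          // 8 train / 2 test
        System.out.println(accuracy(new int[]{1, 0, 1, 1},
                                    new int[]{1, 1, 1, 0})); // 0.5
    }
}
```

Training accuracy can look low for unrelated reasons too (e.g. very few SGD iterations), but only a held-out split gives a meaningful estimate of generalization.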