Hi Adamantios,

For your first question: after you train the SVM, you get a model with a weight vector w and an intercept b. The decision boundary is the set of points x where w.dot(x) + b = 0, and the points where w.dot(x) + b = 1 or w.dot(x) + b = -1 lie on the margin. The quantity w.dot(x) + b for a point x can be used as a confidence measure for the classification: the larger its absolute value, the farther the point is from the decision boundary.
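To make the score concrete, here is a minimal sketch in plain Scala (no Spark dependency) that computes w.dot(x) + b and keeps only the points whose absolute score is above a cutoff. The names MarginConfidence, margin, confident and minConfidence are made up for illustration, and the weights and points are toy values, not output from a trained model. In MLlib itself, calling model.clearThreshold() should make model.predict return this raw score instead of a 0/1 label, so the same filter could be applied to its output.

```scala
object MarginConfidence {
  // Raw SVM score w.dot(x) + b for a single feature vector.
  def margin(w: Array[Double], b: Double, x: Array[Double]): Double = {
    require(w.length == x.length, "weights and features must have the same length")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
  }

  // Keep only the points whose absolute score is at least minConfidence.
  // Returns (index of the point, predicted label, raw score).
  // The label follows the MLlib 0/1 convention: 1.0 if the score is positive.
  def confident(
      w: Array[Double],
      b: Double,
      points: Seq[Array[Double]],
      minConfidence: Double): Seq[(Int, Double, Double)] =
    points.zipWithIndex.flatMap { case (x, i) =>
      val score = margin(w, b, x)
      if (math.abs(score) >= minConfidence) {
        val label = if (score > 0.0) 1.0 else 0.0
        Some((i, label, score))
      } else {
        None
      }
    }

  def main(args: Array[String]): Unit = {
    // Toy weights and intercept (assumed values, not from a trained model).
    val w = Array(1.0, -2.0)
    val b = 0.5
    val points = Seq(Array(3.0, 1.0), Array(0.5, 0.5), Array(0.0, 2.0))

    // Only points at least 1.0 away from the decision boundary are kept.
    confident(w, b, points, minConfidence = 1.0).foreach {
      case (i, label, score) => println(s"point $i: label=$label score=$score")
    }
  }
}
```

The cutoff is in units of the raw score, so a reasonable value depends on how the features are scaled; it is not a probability.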
Code-wise, suppose you trained your model via val model = SVMWithSGD.train(...). You can then call model.setThreshold(your threshold here) to set the threshold that separates positive predictions from negative predictions. For more info, please take a look at
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel

For your second question, SVMWithSGD only supports binary classification.

Hope this helps,
Liquan

On Sun, Sep 21, 2014 at 11:22 PM, Adamantios Corais <adamantios.cor...@gmail.com> wrote:

> Nobody?
>
> If that's not supported already, can you please, at least, give me a few hints
> on how to implement it?
>
> Thanks!
>
>
> On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais <
> adamantios.cor...@gmail.com> wrote:
>
>> Hi,
>>
>> I am working with the SVMWithSGD classification algorithm on Spark. It
>> works fine for me, however, I would like to recognize the instances that
>> are classified with a high confidence from those with a low one. How do we
>> define the threshold here? Ultimately, I want to keep only those for which
>> the algorithm is very *very* certain about its decision! How to do
>> that? Is this feature supported already by any MLlib algorithm? What if I
>> had multiple categories?
>>
>> Any input is highly appreciated!
>>
>
>

--
Liquan Pei
Department of Physics
University of Massachusetts Amherst