Well, apparently, the above Python set-up is wrong. Please consider the following set-up, which DOES use the 'linear' kernel... And the question remains the same: how should the Spark results be interpreted (or why are the Spark results NOT bounded between -1 and 1)?
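For what it's worth, unbounded scores are exactly what a linear model produces: the raw SVM output w.x + b scales with the magnitude of w, so it is a signed margin, not a calibrated score. One common way to make scores comparable across models is to divide by ||w||, giving the geometric distance to the hyperplane. A minimal, framework-free sketch (the weights, intercept, and point below are made up for illustration):

```python
# Sketch: why raw linear-SVM scores are unbounded, and one way to make
# them comparable. Pure Python; w, b and x are hypothetical values.
import math

def raw_score(w, b, x):
    """Raw decision value w.x + b -- unbounded; only its sign gives the class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, x):
    """Signed distance of x to the hyperplane: (w.x + b) / ||w||.
    Dividing out the scale of w makes scores comparable across models."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return raw_score(w, b, x) / norm

w, b = [2.0, 2.0], -1.0           # hypothetical weights and intercept
x = [10.0, 10.0]
print(raw_score(w, b, x))         # 39.0 -- far outside [-1, 1]
print(geometric_margin(w, b, x))  # same point, as a scale-free distance
```

Note this still does not give a probability; for that one would need an explicit calibration step (e.g. Platt scaling) on held-out data.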
On Mon, Oct 6, 2014 at 8:35 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:

> One difference I can find is that you may have different kernel functions for
> your training: in Spark you end up using a linear kernel, whereas in
> scikit-learn you are using the RBF kernel. That can explain the difference in
> the coefficients you are getting.
>
> On Mon, Oct 6, 2014 at 10:15 AM, Adamantios Corais
> <adamantios.cor...@gmail.com> wrote:
>
>> Hi again,
>>
>> Finally, I found the time to play around with your suggestions.
>> Unfortunately, I noticed some unusual behavior in the MLlib results, which
>> is more obvious when I compare them against their scikit-learn equivalents.
>> Note that I am currently using Spark 0.9.2. Long story short: I find it
>> difficult to interpret the results. The scikit-learn SVM always returns a
>> value between 0 and 1, which makes it easy for me to set a threshold in
>> order to keep only the most significant classifications (this is the case
>> for both short and long input vectors). On the other hand, Spark MLlib
>> makes it impossible to interpret the results: they are hardly ever bounded
>> between -1 and +1, and hence it is impossible to choose a good cut-off
>> value; the results are of no practical use. And here is the strangest thing
>> of all: although it seems that MLlib does NOT generate the right weights
>> and intercept, when I feed MLlib with the weights and intercept from
>> scikit-learn, the results become pretty accurate! Any ideas about what is
>> happening? Any suggestion is highly appreciated.
>>
>> PS: to make things easier I have quoted both of my implementations as well
>> as the results below.
>>
>> //////////////////////////////////////////////////
>>
>> SPARK (short input):
>>
>> training_error: Double = 0.0
>> res2: Array[Double] = Array(-1.4420684459128205E-19, -1.4420684459128205E-19,
>>   -1.4420684459128205E-19, 0.3749999999999999, 0.7499999999999998,
>>   0.7499999999999998, 0.7499999999999998)
>>
>> SPARK (long input):
>>
>> training_error: Double = 0.0
>> res2: Array[Double] = Array(-0.782207630902241, -0.782207630902241,
>>   -0.782207630902241, 0.9522394329769612, 2.6866864968561632,
>>   2.6866864968561632, 2.6866864968561632)
>>
>> PYTHON (short input):
>>
>> array([[-1.00000001],
>>        [-1.00000001],
>>        [-1.00000001],
>>        [-0.        ],
>>        [ 1.00000001],
>>        [ 1.00000001],
>>        [ 1.00000001]])
>>
>> PYTHON (long input):
>>
>> array([[-1.00000001],
>>        [-1.00000001],
>>        [-1.00000001],
>>        [-0.        ],
>>        [ 1.00000001],
>>        [ 1.00000001],
>>        [ 1.00000001]])
>>
>> //////////////////////////////////////////////////
>>
>> import analytics.MSC
>>
>> import java.util.Calendar
>> import java.text.SimpleDateFormat
>> import scala.collection.mutable
>> import scala.collection.JavaConversions._
>> import org.apache.spark.SparkContext._
>> import org.apache.spark.mllib.classification.SVMWithSGD
>> import org.apache.spark.mllib.regression.LabeledPoint
>> import org.apache.spark.mllib.optimization.L1Updater
>> import com.datastax.bdp.spark.connector.CassandraConnector
>> import com.datastax.bdp.spark.SparkContextCassandraFunctions._
>>
>> val sc = MSC.sc
>> val lg = MSC.logger
>>
>> // val s_users_double_2 = Seq(
>> //   (0.0, Seq(0.0, 0.0, 0.0)),
>> //   (0.0, Seq(0.0, 0.0, 0.0)),
>> //   (0.0, Seq(0.0, 0.0, 0.0)),
>> //   (1.0, Seq(1.0, 1.0, 1.0)),
>> //   (1.0, Seq(1.0, 1.0, 1.0)),
>> //   (1.0, Seq(1.0, 1.0, 1.0))
>> // )
>> val s_users_double_2 = Seq(
>>   (0.0, Seq(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>             0.0, 0.0)),
>>   (0.0, Seq(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>             0.0, 0.0)),
>>   (0.0, Seq(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>             0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>             0.0, 0.0)),
>>   (1.0, Seq(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>             1.0, 1.0)),
>>   (1.0, Seq(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>             1.0, 1.0)),
>>   (1.0, Seq(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>             1.0, 1.0))
>> )
>> val s_users_double = sc.parallelize(s_users_double_2)
>>
>> val s_users_parsed = s_users_double.map { line =>
>>   LabeledPoint(line._1, line._2.toArray)
>> }.cache()
>>
>> val iterations = 100
>>
>> val model = SVMWithSGD.train(s_users_parsed, iterations)
>>
>> val predictions1 = s_users_parsed.map { point =>
>>   (point.label, model.predict(point.features))
>> }.cache()
>>
>> val training_error = predictions1.filter(r => r._1 != r._2).count().toDouble /
>>   s_users_parsed.count()
>>
>> val TP = predictions1.map(s => s._1 == 1.0 && s._2 == 1.0).filter(t => t).count()
>> val FP = predictions1.map(s => s._1 == 0.0 && s._2 == 1.0).filter(t => t).count()
>> val TN = predictions1.map(s => s._1 == 0.0 && s._2 == 0.0).filter(t => t).count()
>> val FN = predictions1.map(s => s._1 == 1.0 && s._2 == 0.0).filter(t => t).count()
>>
>> val weights = model.weights
>>
>> val intercept = model.intercept
>>
>> // val m_users_double_2 = Seq(
>> //   Seq(0.0, 0.0, 0.0),
>> //   Seq(0.0, 0.0, 0.0),
>> //   Seq(0.0, 0.0, 0.0),
>> //   Seq(0.5, 0.5, 0.5),
>> //   Seq(1.0, 1.0, 1.0),
>> //   Seq(1.0, 1.0, 1.0),
>> //   Seq(1.0, 1.0, 1.0)
>> // )
>> val m_users_double_2 = Seq(
>>   Seq(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>       0.0, 0.0),
>>   Seq(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>       0.0, 0.0),
>>   Seq(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>       0.0, 0.0),
>>   Seq(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
>>       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
>>       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
>>       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 2.0, 0.5,
>>       0.5, 0.5),
>>   Seq(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>       1.0, 1.0),
>>   Seq(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>       1.0, 1.0),
>>   Seq(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>       1.0, 1.0)
>> )
>> val m_users_double = sc.parallelize(m_users_double_2)
>>
>> val predictions2 = m_users_double.map { point =>
>>   point.zip(weights).map(a => a._1 * a._2).sum + intercept
>> }.cache()
>>
>> predictions2.collect()
>>
>> //////////////////////////////////////////////////
>>
>> from sklearn import svm
>>
>> flag = 'short'  # or 'long'
>>
>> if flag == 'short':
>>     X = [
>>         [0.0, 0.0, 0.0],
>>         [0.0, 0.0, 0.0],
>>         [0.0, 0.0, 0.0],
>>         [1.0, 1.0, 1.0],
>>         [1.0, 1.0, 1.0],
>>         [1.0, 1.0, 1.0]
>>     ]
>>     Y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
>>     T = [
>>         [0.0, 0.0, 0.0],
>>         [0.0, 0.0, 0.0],
>>         [0.0, 0.0, 0.0],
>>         [0.5, 0.5, 0.5],
>>         [1.0, 1.0, 1.0],
>>         [1.0, 1.0, 1.0],
>>         [1.0, 1.0, 1.0]
>>     ]
>>
>> if flag == 'long':
>>     X = [
>>         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>          0.0, 0.0],
>>         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>          0.0, 0.0],
>>         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>          0.0, 0.0],
>>         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>          1.0, 1.0],
>>         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>          1.0, 1.0],
>>         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>          1.0, 1.0]
>>     ]
>>     Y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
>>     T = [
>>         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>          0.0, 0.0],
>>         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>          0.0, 0.0],
>>         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
>>          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0,
>>          0.0, 0.0],
>>         [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
>>          0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
>>          0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
>>          0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 2.0, 0.5,
>>          0.5, 0.5],
>>         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>          1.0, 1.0],
>>         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>          1.0, 1.0],
>>         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
>>          1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0,
>>          1.0, 1.0]
>>     ]
>>
>> clf = svm.SVC()
>> clf.fit(X, Y)
>> svm.SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
>>         gamma=0.0, kernel='rbf', max_iter=-1, probability=False,
>>         random_state=None, shrinking=True, tol=0.001, verbose=False)
>> clf.decision_function(T)
>>
>> ///////////////////////////////////////////////////
>>
>> On Thu, Sep 25, 2014 at 2:25 AM, Sunny Khatri <sunny.k...@gmail.com> wrote:
>>
>>> For multi-class classification you can use the same SVMWithSGD (which does
>>> binary classification) with a one-vs-all approach: construct the respective
>>> training corpora with class i as the positive samples and the rest of the
>>> classes as the negative ones, and then use the same method provided by Aris
>>> as a measure of how far class i is from the decision boundary.
>>>
>>> On Wed, Sep 24, 2014 at 4:06 PM, Aris <arisofala...@gmail.com> wrote:
>>>
>>>> Greetings, Adamantios Corais... if that is indeed your name...
>>>>
>>>> Just to follow up on Liquan: you might be interested in removing the
>>>> threshold and then treating the predictions as a probability from 0..1
>>>> inclusive. SVM with the linear kernel is a straightforward linear
>>>> classifier, so with model.clearThreshold() you can get the raw predicted
>>>> scores, removing the threshold that simply translates them into a
>>>> positive/negative class.
>>>>
>>>> The API is here:
>>>> http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
>>>>
>>>> Enjoy!
>>>> Aris
>>>>
>>>> On Sun, Sep 21, 2014 at 11:50 PM, Liquan Pei <liquan...@gmail.com> wrote:
>>>>
>>>>> Hi Adamantios,
>>>>>
>>>>> For your first question: after you train the SVM, you get a model with a
>>>>> vector of weights w and an intercept b; points x such that w.dot(x) + b = 1
>>>>> or w.dot(x) + b = -1 are points that lie on the decision boundary. The
>>>>> quantity w.dot(x) + b for a point x is a confidence measure of the
>>>>> classification.
>>>>>
>>>>> Code-wise, suppose you trained your model via
>>>>>
>>>>>     val model = SVMWithSGD.train(...)
>>>>>
>>>>> You can then set a threshold by calling
>>>>>
>>>>>     model.setThreshold(yourThresholdHere)
>>>>>
>>>>> to set the threshold that separates positive predictions from negative
>>>>> predictions.
>>>>>
>>>>> For more info, please take a look at
>>>>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
>>>>>
>>>>> For your second question: SVMWithSGD only supports binary classification.
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Liquan
>>>>>
>>>>> On Sun, Sep 21, 2014 at 11:22 PM, Adamantios Corais
>>>>> <adamantios.cor...@gmail.com> wrote:
>>>>>
>>>>>> Nobody?
>>>>>>
>>>>>> If that's not supported already, can you please, at least, give me a few
>>>>>> hints on how to implement it?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais
>>>>>> <adamantios.cor...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am working with the SVMWithSGD classification algorithm on Spark.
>>>>>>> It works fine for me; however, I would like to distinguish the instances
>>>>>>> that are classified with high confidence from those with low confidence.
>>>>>>> How do we define the threshold here? Ultimately, I want to keep only
>>>>>>> those for which the algorithm is very *very* certain about its decision!
>>>>>>> How to do that? Is this feature already supported by any MLlib
>>>>>>> algorithm? What if I had multiple categories?
>>>>>>>
>>>>>>> Any input is highly appreciated!
>>>>>
>>>>> --
>>>>> Liquan Pei
>>>>> Department of Physics
>>>>> University of Massachusetts Amherst
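Putting Sunny's one-vs-all suggestion together with the threshold discussion above: train one binary scorer per class, predict the class with the largest raw margin w.x + b, and reject the prediction when no margin clears a confidence threshold. A minimal framework-free sketch (the class labels, per-class weights, and threshold below are invented for illustration, not taken from the thread):

```python
# Sketch of one-vs-all classification with a confidence cut-off.
# Each class gets its own hypothetical (weights, intercept) pair; the
# predicted class is the one with the largest raw margin, and we abstain
# when even the winning margin is below the threshold.

def margin(w, b, x):
    # Raw decision value w.x + b of one binary scorer.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_ova(models, x, threshold=0.0):
    """models: {class_label: (weights, intercept)}.
    Returns (best_class, best_margin), or (None, best_margin) when the
    winning margin does not clear the threshold (a low-confidence reject)."""
    scores = {c: margin(w, b, x) for c, (w, b) in models.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]

models = {                           # hypothetical per-class models
    "a": ([1.0, 0.0], 0.0),
    "b": ([0.0, 1.0], 0.0),
}
print(predict_ova(models, [2.0, 0.5], threshold=1.0))  # ('a', 2.0)
print(predict_ova(models, [0.1, 0.2], threshold=1.0))  # (None, 0.2)
```

In MLlib terms, each per-class scorer would correspond to an SVMWithSGD model trained with clearThreshold() applied, so that predict returns the raw margin rather than a 0/1 label.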