Tarek,
On looking at the code in SVM.scala, I see that SVMWithSGD.predictPoint
first computes the margin dot(w, x) + b, where w is the SVM weight vector,
x is the input vector, and b is the intercept. If a threshold is defined,
the output is 1 if the margin is greater than the threshold and 0
otherwise. If no threshold is set, it just returns the raw margin
dot(w, x) + b. There is no requirement that the output be constrained to
a specific range.
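For what it's worth, the decision rule described above can be sketched in plain Scala (names like SvmPredictSketch, margin, and predict are illustrative only, not MLlib's actual API; this is just the dot(w, x) + b logic with an optional threshold):

```scala
// Hypothetical sketch of the predictPoint decision rule, not MLlib code.
object SvmPredictSketch {
  // Raw margin: dot(w, x) + b
  def margin(w: Array[Double], x: Array[Double], b: Double): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum + b

  // With a threshold set, the prediction is 1.0 or 0.0;
  // with the threshold cleared, the raw (unbounded) margin is returned.
  def predict(w: Array[Double], x: Array[Double], b: Double,
              threshold: Option[Double]): Double = {
    val m = margin(w, x, b)
    threshold match {
      case Some(t) => if (m > t) 1.0 else 0.0
      case None    => m
    }
  }

  def main(args: Array[String]): Unit = {
    val w = Array(1.0, -2.0)
    val x = Array(3.0, 0.5)
    val b = 0.5
    // margin = 1*3 + (-2)*0.5 + 0.5 = 2.5
    println(predict(w, x, b, Some(0.0))) // thresholded: 1.0
    println(predict(w, x, b, None))      // raw margin: 2.5
  }
}
```

Note that with the threshold cleared there is nothing in the None branch that bounds the output, which is why you can see arbitrarily large or small values from predict.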
For a little problem I was working on, I investigated the outputs a bit;
here's a snippet you could paste into spark-shell:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics
model.clearThreshold()
val foo = x.map(p => (p.label, model.predict(p.features)))
val summary = Statistics.colStats(foo.map { case (a, b) => Vectors.dense(a, b) })
summary.mean
summary.min
summary.max
When I tried that, I found a very large range of outputs -- something like
-6*10^6 to -400, with a mean of about -30000. If you look into it, let us
know what you find; I would be interested to hear about it.
best,
Robert Dodier
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Problem-in-running-MLlib-SVM-tp15380p15416.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.