Hi ioanna, I'd like to help look into this. Is there a way to access your training data?
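One thing worth checking first: the two models are not actually the same algorithm. scikit-learn's `SVC()` defaults to an RBF kernel, while MLlib's `SVMWithSGD` fits a *linear* SVM with SGD and L2 regularization, so matching predictions should not be expected even on identical data. A minimal sketch of a more apples-to-apples baseline on the scikit-learn side (using toy stand-in data, since I don't have your training set):

```python
# Sketch: sklearn's SVC defaults to kernel='rbf', while MLlib's SVMWithSGD
# is a linear SVM. Forcing a linear kernel (or using LinearSVC) on the
# sklearn side makes the comparison fairer. The data below is a random
# stand-in, NOT the poster's dataset.
import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.RandomState(0)
X_train = rng.randint(0, 20, size=(100, 10)).astype(float)  # toy features
y_train = (X_train[:, 0] > 10).astype(int)                  # toy labels

rbf_model = SVC()                              # default kernel='rbf'
lin_model = SVC(kernel='linear')               # same solver, linear kernel
lsvc_model = LinearSVC(random_state=0)         # liblinear, closest to SVMWithSGD

for m in (rbf_model, lin_model, lsvc_model):
    m.fit(X_train, y_train)

x = np.array([[15, 15, 0, 15, 15, 4, 12, 8, 0, 7]], dtype=float)
print(rbf_model.predict(x), lin_model.predict(x), lsvc_model.predict(x))
```

If the linear variants agree with each other but disagree with the RBF model, the discrepancy you're seeing is the kernel, not the data pipeline.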
2016-12-20 17:21 GMT-08:00 ioanna <giann...@hotmail.com>:

> I have an issue with an SVM model trained for binary classification using
> Spark 2.0.0.
> I have followed the same logic using scikit-learn and MLlib, using the
> exact same dataset.
> For scikit-learn I have the following code:
>
> svc_model = SVC()
> svc_model.fit(X_train, y_train)
>
> print "supposed to be 1"
> print svc_model.predict([15, 15, 0, 15, 15, 4, 12, 8, 0, 7])
> print svc_model.predict([15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0, 0.0, 12.0, 15.0])
> print svc_model.predict([15.0, 15.0, 7.0, 0.0, 7.0, 0.0, 15.0, 15.0, 15.0, 15.0])
> print svc_model.predict([7.0, 0.0, 15.0, 15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0])
>
> print "supposed to be 0"
> print svc_model.predict([18.0, 15.0, 7.0, 7.0, 15.0, 0.0, 15.0, 15.0, 15.0, 15.0])
> print svc_model.predict([11.0, 13.0, 7.0, 10.0, 7.0, 13.0, 7.0, 19.0, 7.0, 7.0])
> print svc_model.predict([15.0, 15.0, 18.0, 7.0, 15.0, 15.0, 15.0, 18.0, 7.0, 15.0])
> print svc_model.predict([15.0, 15.0, 8.0, 0.0, 0.0, 8.0, 15.0, 15.0, 15.0, 7.0])
>
> and it returns:
>
> supposed to be 1
> [0]
> [1]
> [1]
> [1]
> supposed to be 0
> [0]
> [0]
> [0]
> [0]
>
> For Spark I am doing:
>
> model_svm = SVMWithSGD.train(trainingData, iterations=100)
>
> model_svm.clearThreshold()
>
> print "supposed to be 1"
> print model_svm.predict(Vectors.dense(15.0, 15.0, 0.0, 15.0, 15.0, 4.0, 12.0, 8.0, 0.0, 7.0))
> print model_svm.predict(Vectors.dense(15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0, 0.0, 12.0, 15.0))
> print model_svm.predict(Vectors.dense(15.0, 15.0, 7.0, 0.0, 7.0, 0.0, 15.0, 15.0, 15.0, 15.0))
> print model_svm.predict(Vectors.dense(7.0, 0.0, 15.0, 15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0))
>
> print "supposed to be 0"
> print model_svm.predict(Vectors.dense(18.0, 15.0, 7.0, 7.0, 15.0, 0.0, 15.0, 15.0, 15.0, 15.0))
> print model_svm.predict(Vectors.dense(11.0, 13.0, 7.0, 10.0, 7.0, 13.0, 7.0, 19.0, 7.0, 7.0))
> print model_svm.predict(Vectors.dense(15.0, 15.0, 18.0, 7.0, 15.0, 15.0, 15.0, 18.0, 7.0, 15.0))
> print model_svm.predict(Vectors.dense(15.0, 15.0, 8.0, 0.0, 0.0, 8.0, 15.0, 15.0, 15.0, 7.0))
>
> which returns:
>
> supposed to be 1
> 12.8250120159
> 16.0786937313
> 14.2139435305
> 16.5115589658
> supposed to be 0
> 17.1311777004
> 14.075461697
> 20.8883372052
> 12.9132580999
>
> When I set the threshold, I get either all zeros or all ones.
>
> Does anyone know how to approach this problem?
>
> As I said, I have checked multiple times that my dataset and feature
> extraction logic are exactly the same in both cases.
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/scikit-learn-and-mllib-difference-in-predictions-python-tp28240.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
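On the all-zeros / all-ones behaviour: after `clearThreshold()` the MLlib model returns the raw margin (w·x + b) rather than a 0/1 label, and `setThreshold(t)` simply cuts that score at t. The eight margins quoted above all sit between roughly 12.8 and 20.9 with the two classes interleaved, so no single cutoff can separate them; that points at a poorly fit model (unscaled features combined with SGD's default step size are a common cause) rather than at the threshold itself. A small diagnostic sketch, using only the scores from the post, that sweeps every candidate threshold:

```python
# Diagnostic sketch: sweep every candidate threshold over the raw margins
# quoted in the post and report the best achievable training accuracy.
# If no cutoff separates the classes, the problem is the fitted model,
# not the choice of threshold.
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy of 'score > t'."""
    best = (0.0, 0.0)
    for t in sorted(set(scores)):
        acc = sum((s > t) == bool(y) for s, y in zip(scores, labels)) / float(len(labels))
        if acc > best[1]:
            best = (t, acc)
    return best

# Margins copied from the post: first four are "supposed to be 1",
# the last four "supposed to be 0".
scores = [12.8250120159, 16.0786937313, 14.2139435305, 16.5115589658,
          17.1311777004, 14.075461697, 20.8883372052, 12.9132580999]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

t, acc = best_threshold(scores, labels)
print(t, acc)
```

On these eight scores the best achievable accuracy is only 0.625 at any threshold, which confirms the margins do not separate the classes. A reasonable next step would be to standardize the features before training (MLlib has `pyspark.mllib.feature.StandardScaler` for this) and to tune `step` and `regParam` on `SVMWithSGD.train`, then re-check the margin distribution.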