Hi Rong,

As you can see in my test data example, I did change the labels in the data to 8 and 16 instead of 1 and 0.
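Just to double-check my understanding of your suggestion: if the trainer needs the labels to be literally -1.0 and +1.0 (and not just any two distinct values), I assume the fix on my side would look roughly like this quick, untested sketch (reusing env, trainingDataPath and svm from my code below):

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.common.LabeledVector

// Map my 16.0/8.0 file labels onto the +1.0/-1.0 pair the binary SVM
// expects, before calling fit(). The feature vectors stay untouched.
val rawTrainingData = MLUtils.readLibSVM(env, trainingDataPath)
val relabeledTrainingData: DataSet[LabeledVector] = rawTrainingData.map { lv =>
  LabeledVector(if (lv.label == 16.0) +1.0 else -1.0, lv.vector)
}

svm.fit(relabeledTrainingData)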
If SVM always returns +1.0 or -1.0, that would then indeed explain where the 1.0
is coming from. But it never gives me -1.0, so something is still wrong: it
classifies everything under the same label.

Thanks.

— Mano

> On 23 Jun 2018, at 20:50, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Mano,
>
> Regarding the always-positive prediction result: I think the standard svmguide
> data [1] labels the data as 0.0 and 1.0 instead of -1.0 and +1.0. Maybe
> correcting that will work in your case.
> As for the evaluation pairs, I think SVM in FlinkML will always return
> +1.0 or -1.0 when you use it this way as a binary classifier.
>
> Thanks,
> Rong
>
> [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1
>
> On Fri, Jun 22, 2018 at 6:49 AM Mano Swerts <mano.swe...@ixxus.com> wrote:
>
>> Hi guys,
>>
>> Here I am again. I am playing with Flink ML and was just trying to get the
>> example from the documentation to work:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data
>> (the one using the astroparticle LibSVM data).
>>
>> My code is basically what you see in the example, with some more output
>> for verification:
>>
>> import org.apache.flink.api.scala._
>> import org.apache.flink.ml.MLUtils
>> import org.apache.flink.ml.classification.SVM
>> import org.apache.flink.ml.common.LabeledVector
>>
>> object LearnDocumentEntityRelationship {
>>
>>   val trainingDataPath = "/data/svmguide1.training.txt"
>>   val testDataPath = "/data/svmguide1.test.txt"
>>
>>   def main(args: Array[String]) {
>>     val env = ExecutionEnvironment.getExecutionEnvironment
>>
>>     val trainingData: DataSet[LabeledVector] = MLUtils.readLibSVM(env, trainingDataPath)
>>
>>     println("============================")
>>     println("=== Training Data")
>>     println("============================")
>>     trainingData.print()
>>
>>     val testData = MLUtils.readLibSVM(env, testDataPath).map(x => (x.vector, x.label))
>>
>>     println("============================")
>>     println("=== Test Data")
>>     println("============================")
>>     testData.print()
>>
>>     val svm = SVM()
>>       .setBlocks(env.getParallelism)
>>       .setIterations(100)
>>       .setRegularization(0.001)
>>       .setStepsize(0.1)
>>       .setSeed(42)
>>
>>     svm.fit(trainingData)
>>
>>     val evaluationPairs: DataSet[(Double, Double)] = svm.evaluate(testData)
>>
>>     println("============================")
>>     println("=== Evaluation Pairs")
>>     println("============================")
>>     evaluationPairs.print()
>>
>>     val realData = MLUtils.readLibSVM(env, testDataPath).map(x => x.vector)
>>
>>     val predictionDS = svm.predict(realData)
>>
>>     println("============================")
>>     println("=== Predictions")
>>     println("============================")
>>     predictionDS.print()
>>
>>     println("=== End")
>>
>>     env.execute("Learn Document Entity Relationship Job")
>>   }
>> }
>>
>> The issue is that the predictions (from both the evaluation pairs and the
>> prediction dataset) are always equal to "1.0". When I changed the labels in
>> the data files to 16 and 8 (so 1 is not a valid label anymore), it still
>> keeps predicting "1.0" for every single record. I also tried with some
>> other custom datasets, but I always get that same result.
>>
>> This is a small excerpt of the output (the data contains too many records
>> to paste here):
>>
>> ============================
>> === Test Data
>> ============================
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
>> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
>> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)
>>
>> ============================
>> === Evaluation Pairs
>> ============================
>> (16.0,1.0)
>> (16.0,1.0)
>> (8.0,1.0)
>> (8.0,1.0)
>>
>> ============================
>> === Predictions
>> ============================
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
>> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
>> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)
>>
>> Am I doing something wrong?
>>
>> Any pointers are greatly appreciated. Thanks!
>>
>> — Mano
>>
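PS: To narrow this down further, I was also thinking of looking at the raw decision values instead of the thresholded labels, to see whether every test point really ends up on the positive side of the learned hyperplane. If I am reading the FlinkML SVM documentation correctly, there is an OutputDecisionFunction parameter for exactly that; something along these lines (again just a sketch using the same trainingData and realData as above, I have not run this variant yet):

// Same trainer settings as above, but asking predict() for the raw distance
// to the separating hyperplane instead of the thresholded +1.0/-1.0 label.
val debugSvm = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)
  .setRegularization(0.001)
  .setStepsize(0.1)
  .setSeed(42)
  .setOutputDecisionFunction(true)

debugSvm.fit(trainingData)

// Prints (vector, rawDecisionValue) pairs; all-positive values would confirm
// that the model never crosses the threshold towards -1.0.
debugSvm.predict(realData).print()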