Hi Mano,

Regarding the always-positive predictions: I think the standard svmguide1
data [1] labels examples as 0.0 and 1.0 instead of -1.0 and +1.0.
Correcting that should fix things in your case.
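
A minimal sketch of that relabeling idea (the file path and the exact
original labels are assumptions on my side, so adjust as needed):

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.common.LabeledVector

val env = ExecutionEnvironment.getExecutionEnvironment

// svmguide1 labels arrive as 0.0 / 1.0; flip 0.0 to -1.0 so the SVM
// trains on the -1.0 / +1.0 convention it expects.
val trainingData: DataSet[LabeledVector] =
  MLUtils.readLibSVM(env, "/data/svmguide1.training.txt")
    .map(lv => LabeledVector(if (lv.label <= 0.0) -1.0 else 1.0, lv.vector))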

As for the change in the evaluation pairs: when you use FlinkML's SVM this
way, as a binary classifier, it will always return +1.0 or -1.0.
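
If you would rather see the raw decision values than the thresholded
labels, the SVM predictor also exposes an OutputDecisionFunction
parameter. A sketch (double-check the setter name against your Flink
version):

import org.apache.flink.ml.classification.SVM

val svm = SVM()
    .setBlocks(env.getParallelism)
    .setIterations(100)
    // emit the raw decision function value instead of the thresholded +1/-1
    .setOutputDecisionFunction(true)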

Thanks,
Rong

[1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1

On Fri, Jun 22, 2018 at 6:49 AM Mano Swerts <mano.swe...@ixxus.com> wrote:

> Hi guys,
>
> Here I am again. I am playing with Flink ML and was just trying to get the
> example from the documentation to work:
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data
> (the one using the astroparticle LibSVM data).
>
> My code is basically what you see in the example, with some more output
> for verification:
>
>
> import org.apache.flink.api.scala._
> import org.apache.flink.ml.MLUtils
> import org.apache.flink.ml.classification.SVM
> import org.apache.flink.ml.common.LabeledVector
>
> object LearnDocumentEntityRelationship {
>
>     val trainingDataPath = "/data/svmguide1.training.txt"
>     val testDataPath = "/data/svmguide1.test.txt"
>
>     def main(args: Array[String]) {
>         val env = ExecutionEnvironment.getExecutionEnvironment
>
>         val trainingData: DataSet[LabeledVector] = MLUtils.readLibSVM(env,
> trainingDataPath)
>
>         println("============================")
>         println("=== Training Data")
>         println("============================")
>         trainingData.print()
>
>         val testData = MLUtils.readLibSVM(env, testDataPath).map(x =>
> (x.vector, x.label))
>
>         println("============================")
>         println("=== Test Data")
>         println("============================")
>         testData.print()
>
>         val svm = SVM()
>             .setBlocks(env.getParallelism)
>             .setIterations(100)
>             .setRegularization(0.001)
>             .setStepsize(0.1)
>             .setSeed(42)
>
>         svm.fit(trainingData)
>
>         val evaluationPairs: DataSet[(Double, Double)] =
> svm.evaluate(testData)
>
>         println("============================")
>         println("=== Evaluation Pairs")
>         println("============================")
>         evaluationPairs.print()
>
>         val realData = MLUtils.readLibSVM(env, testDataPath).map(x =>
> x.vector)
>
>         val predictionDS = svm.predict(realData)
>
>         println("============================")
>         println("=== Predictions")
>         println("============================")
>         predictionDS.print()
>
>         println("=== End")
>
>         env.execute("Learn Document Entity Relationship Job")
>     }
> }
>
>
> The issue is that the predictions (from both the evaluation pairs and the
> prediction dataset) are always equal to "1.0". When I changed the labels in
> the data files to 16 and 8 (so 1 is no longer a valid label), it still
> predicted "1.0" for every single record. I also tried some other custom
> datasets, but I always get the same result.
>
> This is a condensed excerpt of the output (the data contains too many
> records to include here):
>
> ============================
> === Test Data
> ============================
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797),
> (3,97.52163)),16.0)
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797),
> (3,97.52163)),16.0)
> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)
>
> ============================
> === Evaluation Pairs
> ============================
> (16.0,1.0)
> (16.0,1.0)
> (8.0,1.0)
> (8.0,1.0)
>
> ============================
> === Predictions
> ============================
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)
>
>
> Am I doing something wrong?
>
> Any pointers are greatly appreciated. Thanks!
>
> — Mano
>
