Hi Rong,

As you can see in my test data example, I did change the labels in the data to 8 and 16 instead of 1 and 0.
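Just to double-check my understanding of your suggestion: if the trainer needs the labels to be literally -1.0 and +1.0 (and not just any two distinct values), I assume the fix on my side would look roughly like this quick, untested sketch (reusing env, trainingDataPath and svm from my code below):

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.common.LabeledVector

// Map my 16.0/8.0 file labels onto the +1.0/-1.0 pair the binary SVM
// expects, before calling fit(). The feature vectors stay untouched.
val rawTrainingData = MLUtils.readLibSVM(env, trainingDataPath)
val relabeledTrainingData: DataSet[LabeledVector] = rawTrainingData.map { lv =>
  LabeledVector(if (lv.label == 16.0) +1.0 else -1.0, lv.vector)
}

svm.fit(relabeledTrainingData)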
If SVM always returns +1.0 or -1.0, that would then indeed explain where the 1.0
is coming from. But it never gives me -1.0, so something is still wrong: it
classifies everything under the same label.

Thanks.

— Mano

> On 23 Jun 2018, at 20:50, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Mano,
>
> Regarding the always-positive prediction result: I think the standard svmguide
> data [1] labels the data as 0.0 and 1.0 instead of -1.0 and +1.0. Maybe
> correcting that will work in your case.
> As for the evaluation pairs, I think SVM in FlinkML will always return
> +1.0 or -1.0 when you use it this way as a binary classifier.
>
> Thanks,
> Rong
>
> [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1
>
> On Fri, Jun 22, 2018 at 6:49 AM Mano Swerts <mano.swe...@ixxus.com> wrote:
>
>> Hi guys,
>>
>> Here I am again. I am playing with Flink ML and was just trying to get the
>> example from the documentation to work:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data
>> (the one using the astroparticle LibSVM data).
>>
>> My code is basically what you see in the example, with some more output
>> for verification:
>>
>> import org.apache.flink.api.scala._
>> import org.apache.flink.ml.MLUtils
>> import org.apache.flink.ml.classification.SVM
>> import org.apache.flink.ml.common.LabeledVector
>>
>> object LearnDocumentEntityRelationship {
>>
>>   val trainingDataPath = "/data/svmguide1.training.txt"
>>   val testDataPath = "/data/svmguide1.test.txt"
>>
>>   def main(args: Array[String]) {
>>     val env = ExecutionEnvironment.getExecutionEnvironment
>>
>>     val trainingData: DataSet[LabeledVector] = MLUtils.readLibSVM(env, trainingDataPath)
>>
>>     println("============================")
>>     println("=== Training Data")
>>     println("============================")
>>     trainingData.print()
>>
>>     val testData = MLUtils.readLibSVM(env, testDataPath).map(x => (x.vector, x.label))
>>
>>     println("============================")
>>     println("=== Test Data")
>>     println("============================")
>>     testData.print()
>>
>>     val svm = SVM()
>>       .setBlocks(env.getParallelism)
>>       .setIterations(100)
>>       .setRegularization(0.001)
>>       .setStepsize(0.1)
>>       .setSeed(42)
>>
>>     svm.fit(trainingData)
>>
>>     val evaluationPairs: DataSet[(Double, Double)] = svm.evaluate(testData)
>>
>>     println("============================")
>>     println("=== Evaluation Pairs")
>>     println("============================")
>>     evaluationPairs.print()
>>
>>     val realData = MLUtils.readLibSVM(env, testDataPath).map(x => x.vector)
>>
>>     val predictionDS = svm.predict(realData)
>>
>>     println("============================")
>>     println("=== Predictions")
>>     println("============================")
>>     predictionDS.print()
>>
>>     println("=== End")
>>
>>     env.execute("Learn Document Entity Relationship Job")
>>   }
>> }
>>
>> The issue is that the predictions (from both the evaluation pairs and the
>> prediction dataset) are always equal to "1.0". When I changed the labels in
>> the data files to 16 and 8 (so 1 is not a valid label anymore), it still
>> keeps predicting "1.0" for every single record. I also tried with some
>> other custom datasets, but I always get that same result.
>>
>> This is a small excerpt of the output (the data contains too many records
>> to paste here):
>>
>> ============================
>> === Test Data
>> ============================
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
>> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
>> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)
>>
>> ============================
>> === Evaluation Pairs
>> ============================
>> (16.0,1.0)
>> (16.0,1.0)
>> (8.0,1.0)
>> (8.0,1.0)
>>
>> ============================
>> === Predictions
>> ============================
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
>> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
>> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
>> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)
>>
>> Am I doing something wrong?
>>
>> Any pointers are greatly appreciated. Thanks!
>>
>> — Mano
>>
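PS: To narrow this down further, I was also thinking of looking at the raw decision values instead of the thresholded labels, to see whether every test point really ends up on the positive side of the learned hyperplane. If I am reading the FlinkML SVM documentation correctly, there is an OutputDecisionFunction parameter for exactly that; something along these lines (again just a sketch using the same trainingData and realData as above, I have not run this variant yet):

// Same trainer settings as above, but asking predict() for the raw distance
// to the separating hyperplane instead of the thresholded +1.0/-1.0 label.
val debugSvm = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)
  .setRegularization(0.001)
  .setStepsize(0.1)
  .setSeed(42)
  .setOutputDecisionFunction(true)

debugSvm.fit(trainingData)

// Prints (vector, rawDecisionValue) pairs; all-positive values would confirm
// that the model never crosses the threshold towards -1.0.
debugSvm.predict(realData).print()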