Hello all,

I'm trying to run a job using FlinkML and I'm confused about the source of
an error.

The job reads a libSVM formatted file and trains an SVM classifier on it.

I've tried this with small datasets and everything works out fine.

When I run the same job on a large dataset (~11GB uncompressed), however,
I get the following error:


> java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: java.lang.IndexOutOfBoundsException: Index: 14, Size: 2
> Serialization trace:
> indices (org.apache.flink.ml.math.SparseVector)
>         at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
>         at org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1089)
>         at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:78)
>         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:489)
>         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:354)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: java.lang.IndexOutOfBoundsException: Index: 14, Size: 2
> Serialization trace:
> indices (org.apache.flink.ml.math.SparseVector)
>         at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
> Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 14, Size: 2
> Serialization trace:
> indices (org.apache.flink.ml.math.SparseVector)
>         at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>         at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:222)
>         at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:236)
>         at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:246)
>         at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.copy(TupleSerializerBase.java:73)
>         at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.copy(TupleSerializerBase.java:73)
>         at org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:499)
>         at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1344)
>         at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 14, Size: 2
>         at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>         at java.util.ArrayList.set(ArrayList.java:444)
>         at com.esotericsoftware.kryo.util.MapReferenceResolver.setReadObject(MapReferenceResolver.java:38)
>         at com.esotericsoftware.kryo.Kryo.reference(Kryo.java:823)
>         at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:731)
>         at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
>         ... 10 more



Any idea what might be causing this? I'm running the job in local mode, with
1 TaskManager (8 slots) and ~32GB of heap.

All the vectors created by the libSVM loader have the correct size.
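
A check of this kind on the raw input can be sketched as follows. This is a
minimal standalone Java sketch, not the FlinkML loader and not the code used
in the job: the class name LibSvmLineCheck, the method problems(), and the
example dimension are hypothetical, and it assumes the usual libSVM layout
of "<label> <index>:<value> ..." with 1-based, strictly increasing indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone checker; not part of Flink or FlinkML.
public class LibSvmLineCheck {

    // Returns a description of every problem found in one libSVM line.
    // Assumes indices are 1-based and strictly increasing (an assumption
    // about the input format, not something taken from the FlinkML loader).
    public static List<String> problems(String line, int dimension) {
        List<String> found = new ArrayList<>();
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            found.add("empty line");
            return found;
        }
        try {
            Double.parseDouble(tokens[0]); // the first token is the label
        } catch (NumberFormatException e) {
            found.add("label is not numeric: " + tokens[0]);
        }
        int previous = 0; // last index seen so far
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                found.add("malformed feature: " + tokens[i]);
                continue;
            }
            int index;
            try {
                index = Integer.parseInt(pair[0]);
                Double.parseDouble(pair[1]);
            } catch (NumberFormatException e) {
                found.add("non-numeric feature: " + tokens[i]);
                continue;
            }
            if (index < 1 || index > dimension) {
                found.add("index out of range: " + index);
            }
            if (index <= previous) {
                found.add("index not increasing: " + index);
            }
            previous = index;
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(problems("1 1:0.5 3:1.2", 3));  // prints []
        System.out.println(problems("-1 2:0.1 7:4.0", 5)); // prints [index out of range: 7]
    }
}
```

Running such a check over every line of the file only validates the input
text. It says nothing about what happens to the vectors once they are
serialized and spilled to disk during the sort.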
