Hi!

Is it possible that some datatype has a recursive structure nonetheless?
Something like a linked list, which would create a large object graph?

There seems to be a large object graph that the Kryo serializer traverses,
which causes the StackOverflowError.
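
For illustration, a type along these lines would reproduce exactly that
failure mode. This is a minimal, hypothetical sketch (the Node type is made
up, not an actual Mahout class):

    import com.esotericsoftware.kryo.Kryo
    import com.esotericsoftware.kryo.io.Output
    import java.io.ByteArrayOutputStream

    // A cons-list style type: Kryo's default FieldSerializer serializes
    // the `next` field by recursing, so the serialization depth grows
    // linearly with the length of the list.
    case class Node(value: Int, next: Node)

    // Build a long list; each element adds one stack frame when writing.
    val deep = (1 to 100000).foldLeft(null: Node)((tail, v) => Node(v, tail))

    val kryo = new Kryo()
    val out = new Output(new ByteArrayOutputStream())
    // Recurses ObjectField.write -> Kryo.writeObject -> ObjectField.write
    // once per element, matching the trace quoted below, until the stack
    // overflows.
    kryo.writeObject(out, deep)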

Greetings,
Stephan


On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com> wrote:

> Hi Stephan,
>
> thanks for answering.
>
> This is not from a recursive object. (It is used in a recursive method in
> the test that is throwing this error, but the depth is only 2, and there
> are no other Flink DataSet operations before execution is triggered, so it
> is trivial.)
>
> Here is a gist of the code, and the full output and stack trace:
>
> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
>
> The error begins at line 178 of the "Output" file.
>
> Thanks
>
> ________________________________________
> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of Stephan
> Ewen <se...@apache.org>
> Sent: Sunday, April 10, 2016 9:39 AM
> To: dev@flink.apache.org
> Subject: Re: Kryo StackOverflowError
>
> Hi!
>
> Sorry, I don't fully understand the diagnosis.
> You say that this stack overflow is not from a recursive object type?
>
> Long graphs of operations in Flink usually do not cause
> StackOverflowErrors, because the graph is not processed recursively as a
> whole.
>
> Can you paste the entire Stack Trace (for example to a gist)?
>
> Greetings,
> Stephan
>
>
> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com>
> wrote:
>
> > Hi all,
> >
> >
> > I am working on a matrix multiplication operation for the Mahout Flink
> > Bindings that uses quite a few chained Flink DataSet operations.
> >
> >
> > When testing, I am getting the following error:
> >
> >
> > {...}
> >
> > 04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
> > 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
> > java.lang.StackOverflowError
> >     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> >     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> >     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> >     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > {...}
> >
> >
> > I've seen similar issues on the dev@flink list (and other places), but I
> > believe those were from recursive calls and objects which pointed back to
> > themselves somehow.
> >
> >
> > This is a relatively straightforward method; it just has several Flink
> > operations before execution is triggered. If I remove some operations,
> > e.g. a reduce, I can get the method to complete on a simple test, but it
> > will then, of course, be numerically incorrect.
> >
> >
> > I am wondering whether there is any workaround for this type of problem.
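> >
> > If a deeply recursive type does turn out to be the culprit, one
> > workaround I have been considering is registering a custom Kryo
> > serializer with Flink that walks the structure iteratively instead of
> > recursively. A rough sketch, assuming a hypothetical cons-list type
> > Node (not an actual Mahout class):
> >
> >     import com.esotericsoftware.kryo.{Kryo, Serializer}
> >     import com.esotericsoftware.kryo.io.{Input, Output}
> >     import scala.collection.mutable.ArrayBuffer
> >
> >     case class Node(value: Int, next: Node)
> >
> >     // Writes the list with a loop, so the serialization depth no
> >     // longer grows with the list length.
> >     class IterativeNodeSerializer extends Serializer[Node] {
> >       override def write(kryo: Kryo, output: Output, head: Node): Unit = {
> >         var cur = head
> >         while (cur != null) {
> >           output.writeBoolean(true)  // marker: one more element follows
> >           output.writeInt(cur.value)
> >           cur = cur.next
> >         }
> >         output.writeBoolean(false)   // marker: end of list
> >       }
> >
> >       override def read(kryo: Kryo, input: Input, tpe: Class[Node]): Node = {
> >         // Collect the values, then rebuild the list back to front.
> >         val values = ArrayBuffer[Int]()
> >         while (input.readBoolean()) values += input.readInt()
> >         values.foldRight(null: Node)((v, tail) => Node(v, tail))
> >       }
> >     }
> >
> >     // Registered on the ExecutionEnvironment before building the plan:
> >     // env.getConfig.registerTypeWithKryoSerializer(
> >     //   classOf[Node], classOf[IterativeNodeSerializer])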
> >
> >
> > Thank You,
> >
> >
> > Andy
> >
>
