Hey guys,

I have a suspicion about what the culprit could be: could you change line
KryoSerializer.java:328 to kryo.setReferences(true) and check whether the
error still occurs? We deactivated reference tracking, so Kryo should no
longer be able to resolve cyclic references properly.
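
To illustrate what reference tracking changes, here is a minimal sketch in plain Java. It is a toy object-graph walker, not Kryo's implementation, and every name in it (ReferenceTrackingSketch, Node, countUntracked, countTracked) is made up for this example. The untracked walk follows each field blindly, so a cycle recurses until the stack overflows. The tracked walk remembers each object by identity and stops when it meets one again, which is roughly the bookkeeping that kryo.setReferences(true) switches on.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Toy stand-in for a field-by-field serializer; this is NOT Kryo code. */
final class ReferenceTrackingSketch {

    /** Minimal object with one object-valued field. */
    static final class Node {
        Node next;
    }

    /** No reference tracking: every field is followed, so a cycle never terminates. */
    static int countUntracked(Node node) {
        if (node == null) {
            return 0;
        }
        return 1 + countUntracked(node.next);
    }

    /** Reference tracking: an object already seen (by identity) is not visited again. */
    static int countTracked(Node node, Set<Node> seen) {
        if (node == null || !seen.add(node)) {
            return 0;
        }
        return 1 + countTracked(node.next, seen);
    }

    /** Convenience overload that starts with an empty identity-based set. */
    static int countTracked(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return countTracked(root, seen);
    }

    /** Returns true if the untracked walk dies with a StackOverflowError. */
    static boolean untrackedOverflows(Node root) {
        try {
            countUntracked(root);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    /** Builds a two-element cycle: a -> b -> a. */
    static Node twoNodeCycle() {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;
        return a;
    }

    public static void main(String[] args) {
        Node cycle = twoNodeCycle();
        System.out.println("tracked walk visited " + countTracked(cycle) + " objects");
        System.out.println("untracked walk overflowed: " + untrackedOverflows(cycle));
    }
}
```

The identity-based set matters here: two distinct objects that happen to be equal() must still both be visited, so the sketch compares by reference rather than by value.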

Cheers,
Till

On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lison...@intel.com>
wrote:

> Hi,
>
> I also got this error message when I had private inner classes:
>
> public class A {
>     private class B {
>     }
> }
>
> I was able to fix it by making the inner classes public static:
>
> public class A {
>     public static class B {
>     }
> }
>
> While debugging, it seemed that this error message can have several
> different causes.
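
One likely reason the public static change helps: an instance of a non-static inner class keeps a hidden reference to its enclosing instance, so a field-based serializer that reaches a B also reaches the A it belongs to. The sketch below uses reflection to show that difference. The class and method names (InnerClassSketch, Inner, Nested, hasEnclosingReference) are hypothetical, and the inner class deliberately reads a field of the enclosing instance, because newer javac versions may omit the hidden field when it is unused.

```java
import java.lang.reflect.Field;

/** Hypothetical example class; mirrors the A/B shape from the message above. */
final class InnerClassSketch {

    int counter = 0;

    /** Non-static inner class: each instance is tied to an InnerClassSketch instance. */
    class Inner {
        int bump() {
            return ++counter; // reads and updates the enclosing instance's field
        }
    }

    /** Static nested class: no link to any enclosing instance. */
    static class Nested {
    }

    /** True if the class declares a compiler-generated field holding its enclosing instance. */
    static boolean hasEnclosingReference(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic() && field.getType() == type.getEnclosingClass()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Inner:  " + hasEnclosingReference(Inner.class));
        System.out.println("Nested: " + hasEnclosingReference(Nested.class));
    }
}
```

If the enclosing object in turn holds the inner instance in a field or collection, the two objects reference each other, which is exactly the kind of cycle that needs reference tracking to serialize.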
>
> Thanks,
>
> Todd
>
>
> -----Original Message-----
> From: Hilmi Yildirim [mailto:hilmi.yildi...@dfki.de]
> Sent: Sunday, April 10, 2016 11:36 AM
> To: dev@flink.apache.org
> Subject: Re: Kryo StackOverflowError
>
> Hi,
> I also had this problem and solved it.
>
> In my case I had multiple objects that were created via anonymous classes.
> When I broadcast these objects, the serializer tried to serialize them,
> and to do so it also tried to serialize the anonymous classes. This
> caused the problem.
>
> For example,
>
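> class A {
>
>   def createObjects(): Array[AnyRef] = {
>     val objects = scala.collection.mutable.ArrayBuffer.empty[AnyRef]
>     for (...) {               // loop bounds elided in the original
>       val obj = new Class {   // anonymous subclass; `Class` is a placeholder
>         ...
>       }
>       objects += obj          // `object` is a reserved word, hence `obj`
>     }
>     objects.toArray
>   }
> }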
>
> It tried to serialize "new Class". For that it tried to serialize the
> method createObjects(). And then it tried to serialize class A. To
> serialize class A it tried to serialize the method createObjects. Or
> something like that, I do not remember the details. This caused the
> recursion.
>
> BR,
> Hilmi
>
> Am 10.04.2016 um 19:18 schrieb Stephan Ewen:
> > Hi!
> >
> > Is it possible that some datatype has a recursive structure nonetheless?
> > Something like a linked list or so, which would create a large object
> > graph?
> >
> > There seems to be a large object graph that the Kryo serializer
> > traverses, which causes the StackOverflowError.
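
A small sketch of the linked-list case, assuming a default JVM thread stack size: even without any cycle, a walker that recurses once per element runs out of stack on a long enough chain, while an iterative walk does not. This is a toy model and not Kryo code; DeepGraphSketch, Cell and the chain length of 2,000,000 are arbitrary choices for illustration.

```java
/** Toy illustration of recursion depth on a long, acyclic chain; NOT Kryo code. */
final class DeepGraphSketch {

    /** Singly linked list cell with no payload. */
    static final class Cell {
        Cell next;
    }

    /** Builds a singly linked chain with the given number of cells. */
    static Cell chain(int length) {
        Cell head = null;
        for (int i = 0; i < length; i++) {
            Cell cell = new Cell();
            cell.next = head;
            head = cell;
        }
        return head;
    }

    /** One stack frame per cell, like a serializer that recurses into each field. */
    static int lengthRecursive(Cell cell) {
        return cell == null ? 0 : 1 + lengthRecursive(cell.next);
    }

    /** Constant stack depth regardless of chain length. */
    static int lengthIterative(Cell cell) {
        int n = 0;
        for (Cell c = cell; c != null; c = c.next) {
            n++;
        }
        return n;
    }

    /** Returns true if the recursive walk dies with a StackOverflowError. */
    static boolean recursiveOverflows(Cell head) {
        try {
            lengthRecursive(head);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Cell longChain = chain(2_000_000);
        System.out.println("iterative length: " + lengthIterative(longChain));
        System.out.println("recursive walk overflowed: " + recursiveOverflows(longChain));
    }
}
```

Reference tracking does not help in this situation, because no object is visited twice; the depth itself is the problem. A larger thread stack or a serializer that handles the structure without deep recursion would be the usual directions to look in.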
> >
> > Greetings,
> > Stephan
> >
> >
> > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com> wrote:
> >
> >> Hi Stephan,
> >>
> >> thanks for answering.
> >>
> >> This is not from a recursive object. (It is used in a recursive method in
> >> the test that is throwing this error, but the depth is only 2 and there
> >> are no other Flink DataSet operations before execution is triggered, so it
> >> is trivial.)
> >>
> >> Here is a Gist of the code, and the full output and stack trace:
> >>
> >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
> >>
> >> The error begins at line 178 of the "Output" file.
> >>
> >> Thanks
> >>
> >> ________________________________________
> >> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of
> >> Stephan Ewen <se...@apache.org>
> >> Sent: Sunday, April 10, 2016 9:39 AM
> >> To: dev@flink.apache.org
> >> Subject: Re: Kryo StackOverflowError
> >>
> >> Hi!
> >>
> >> Sorry, I don't fully understand the diagnosis.
> >> You say that this stack overflow is not from a recursive object type?
> >>
> >> Long graphs of operations in Flink usually do not cause
> >> StackOverflowErrors, because the whole graph is not processed
> >> recursively.
> >>
> >> Can you paste the entire Stack Trace (for example to a gist)?
> >>
> >> Greetings,
> >> Stephan
> >>
> >>
> >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>>
> >>> I am working on a matrix multiplication operation for Mahout Flink
> >>> Bindings that uses quite a few chained Flink DataSet operations.
> >>>
> >>>
> >>> When testing, I am getting the following error:
> >>>
> >>>
> >>> {...}
> >>>
> >>> 04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
> >>> 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
> >>> java.lang.StackOverflowError
> >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> >>> {...}
> >>>
> >>>
> >>> I've seen similar issues on the dev@flink list (and other places), but I
> >>> believe that they were from recursive calls and objects which pointed
> >>> back to themselves somehow.
> >>>
> >>>
> >>> This is a relatively straightforward method; it just has several Flink
> >>> operations before execution is triggered. If I remove some operations,
> >>> e.g. a reduce, I can get the method to complete on a simple test;
> >>> however, it will then, of course, be numerically incorrect.
> >>>
> >>>
> >>> I am wondering if there is any workaround for this type of problem?
> >>>
> >>>
> >>> Thank You,
> >>>
> >>>
> >>> Andy
> >>>
