Do you want me to open a JIRA/PR for this?

-------- Original message --------
From: Stephan Ewen <se...@apache.org>
Date: 04/13/2016 5:16 AM (GMT-05:00)
To: dev@flink.apache.org
Subject: Re: Kryo StackOverflowError

+1 to add this to 1.0.2


On Wed, Apr 13, 2016 at 1:57 AM, Andrew Palumbo <ap....@outlook.com> wrote:

>
> Hi,
>
> Great! Do you think this is something you'll be enabling in your upcoming
> 1.0.2 release? We plan on putting out a maintenance Mahout release relatively
> soon, and this would allow us to speed up matrix multiplication greatly.
>
> Thanks,
>
> Andy
> ________________________________________
> From: Till Rohrmann <trohrm...@apache.org>
> Sent: Tuesday, April 12, 2016 11:18 AM
> To: dev@flink.apache.org
> Subject: Re: Kryo StackOverflowError
>
> +1
>
> On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > Good catch Till!
> >
> > I just checked it with the Mahout source code and the issue is gone with
> > reference tracking enabled.
> >
> > I would just re-enable it again in Flink.
> >
> > On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Hey guys,
> > >
> > > I have a suspicion about the culprit: could you change line
> > > KryoSerializer.java:328 to kryo.setReferences(true) and check whether the
> > > error still occurs? We deactivated reference tracking, so Kryo can no
> > > longer resolve cyclic references properly.
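> > >
> > > For reference, a minimal sketch of what that setting does on a plain Kryo
> > > instance (the class and field names below are illustrative, not Flink's
> > > actual code path):
> > >
> > > import com.esotericsoftware.kryo.Kryo;
> > > import com.esotericsoftware.kryo.io.Output;
> > >
> > > // Illustrative only: a self-referencing object serialized with reference
> > > // tracking enabled. With setReferences(true) Kryo writes a back-reference
> > > // for an object it has already seen; with it disabled, Kryo recurses into
> > > // the cycle until the stack overflows.
> > > public class ReferenceTrackingSketch {
> > >     static class Node { Node next; }
> > >
> > >     public static void main(String[] args) {
> > >         Node a = new Node();
> > >         a.next = a;                  // cyclic reference
> > >
> > >         Kryo kryo = new Kryo();
> > >         kryo.register(Node.class);
> > >         kryo.setReferences(true);    // the setting discussed above
> > >         kryo.writeObject(new Output(1024), a);
> > >         System.out.println("serialized the cyclic object without overflow");
> > >     }
> > > }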
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lison...@intel.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I also got this error message when I had private inner classes:
> > > >
> > > > public class A {
> > > >     private class B {
> > > >     }
> > > > }
> > > >
> > > > I was able to fix it by making the inner classes public static:
> > > >
> > > > public class A {
> > > >     public static class B {
> > > >     }
> > > > }
> > > >
> > > > While I was trying to debug this, it seemed that this error message can
> > > > be caused by several different things.
> > > >
> > > > Thanks,
> > > >
> > > > Todd
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Hilmi Yildirim [mailto:hilmi.yildi...@dfki.de]
> > > > Sent: Sunday, April 10, 2016 11:36 AM
> > > > To: dev@flink.apache.org
> > > > Subject: Re: Kryo StackOverflowError
> > > >
> > > > Hi,
> > > > I also had this problem and solved it.
> > > >
> > > > In my case I had multiple objects which were created via anonymous
> > > > classes. When I broadcast these objects, the serializer tried to
> > > > serialize them, and for that it tried to serialize the anonymous
> > > > classes. This caused the problem.
> > > >
> > > > For example,
> > > >
> > > > class A {
> > > >
> > > >   def createObjects(): Array[AnyRef] = {
> > > >     val objects = scala.collection.mutable.ArrayBuffer[AnyRef]()
> > > >     for (i <- 0 until count) {
> > > >       // anonymous class: it keeps a reference to the enclosing A instance
> > > >       val obj = new SomeInterface {
> > > >         // ...
> > > >       }
> > > >       objects += obj
> > > >     }
> > > >     objects.toArray
> > > >   }
> > > > }
> > > >
> > > > It tried to serialize the anonymous class instance. For that it tried to
> > > > serialize the enclosing class A, and to serialize class A it tried to
> > > > serialize the anonymous class again, or something like that; I do not
> > > > remember the details. This caused the recursion.
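> > > >
> > > > For what it is worth, the usual mechanism behind this is that an anonymous
> > > > (or non-static inner) class keeps an implicit reference to its enclosing
> > > > instance, so a field-based serializer walks back into the outer object. A
> > > > small, purely illustrative Java sketch (the names are made up):
> > > >
> > > > public class Outer {
> > > >     // The anonymous Runnable compiles to a class with a hidden field
> > > >     // pointing at Outer.this, so serializing the Runnable also pulls in
> > > >     // the Outer instance and everything it references.
> > > >     public Runnable makeTask() {
> > > >         return new Runnable() {
> > > >             @Override
> > > >             public void run() {
> > > >                 System.out.println("enclosing instance: " + Outer.this);
> > > >             }
> > > >         };
> > > >     }
> > > >
> > > >     // A static factory returning a lambda (or a static nested class)
> > > >     // avoids the capture when the enclosing instance is not needed.
> > > >     public static Runnable makeDetachedTask() {
> > > >         return () -> System.out.println("no enclosing instance captured");
> > > >     }
> > > > }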
> > > >
> > > > BR,
> > > > Hilmi
> > > >
> > > > Am 10.04.2016 um 19:18 schrieb Stephan Ewen:
> > > > > Hi!
> > > > >
> > > > > Is it possible that some datatype has a recursive structure nonetheless?
> > > > > Something like a linked list or so, which would create a large object
> > > > > graph?
> > > > >
> > > > > There seems to be a large object graph that the Kryo serializer
> > > > > traverses, which causes the StackOverflowError.
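> > > > >
> > > > > To make that concrete, a purely hypothetical sketch: a long singly linked
> > > > > list forces a field-by-field serializer to recurse once per node, so a
> > > > > few thousand elements can exhaust the default stack even without a true
> > > > > cycle.
> > > > >
> > > > > // Hypothetical deep object graph: one stack frame per element when a
> > > > > // serializer recurses into object fields.
> > > > > public class DeepGraphSketch {
> > > > >     static class Node {
> > > > >         int value;
> > > > >         Node next;
> > > > >     }
> > > > >
> > > > >     public static void main(String[] args) {
> > > > >         Node head = null;
> > > > >         for (int i = 0; i < 100_000; i++) {   // long chain, no cycle
> > > > >             Node n = new Node();
> > > > >             n.value = i;
> > > > >             n.next = head;
> > > > >             head = n;
> > > > >         }
> > > > >         // Serializing 'head' with a recursive field serializer would
> > > > >         // descend 100,000 levels before returning.
> > > > >     }
> > > > > }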
> > > > >
> > > > > Greetings,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Stephan,
> > > > >>
> > > > >> thanks for answering.
> > > > >>
> > > > >> This is not from a recursive object. (It is used in a recursive method
> > > > >> in the test that is throwing this error, but the depth is only 2 and
> > > > >> there are no other Flink DataSet operations before execution is
> > > > >> triggered, so it is trivial.)
> > > > >>
> > > > >> Here is a Gist of the code, and the full output and stack trace:
> > > > >>
> > > > >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
> > > > >>
> > > > >> The error begins at line 178 of the "Output" file.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> ________________________________________
> > > > >> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of
> > > > >> Stephan Ewen <se...@apache.org>
> > > > >> Sent: Sunday, April 10, 2016 9:39 AM
> > > > >> To: dev@flink.apache.org
> > > > >> Subject: Re: Kryo StackOverflowError
> > > > >>
> > > > >> Hi!
> > > > >>
> > > > >> Sorry, I don't fully understand the diagnosis.
> > > > >> You say that this stack overflow is not from a recursive object type?
> > > > >>
> > > > >> Long graphs of operations in Flink usually do not cause
> > > > >> StackOverflowErrors, because the whole graph is not processed
> > > > >> recursively.
> > > > >>
> > > > >> Can you paste the entire Stack Trace (for example to a gist)?
> > > > >>
> > > > >> Greetings,
> > > > >> Stephan
> > > > >>
> > > > >>
> > > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi all,
> > > > >>>
> > > > >>>
> > > > >>> I am working on a matrix multiplication operation for the Mahout Flink
> > > > >>> Bindings that uses quite a few chained Flink DataSet operations.
> > > > >>>
> > > > >>>
> > > > >>> When testing, I am getting the following error:
> > > > >>>
> > > > >>>
> > > > >>> {...}
> > > > >>>
> > > > >>> 04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
> > > > >>> 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
> > > > >>> java.lang.StackOverflowError
> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>> {...}
> > > > >>>
> > > > >>>
> > > > >>> I've seen similar issues on the dev@flink list (and other places), but
> > > > >>> I believe that they were from recursive calls and objects which pointed
> > > > >>> back to themselves somehow.
> > > > >>>
> > > > >>>
> > > > >>> This is a relatively straightforward method; it just has several Flink
> > > > >>> operations before execution is triggered. If I remove some operations,
> > > > >>> e.g. a reduce, I can get the method to complete on a simple test, but
> > > > >>> it will then, of course, be numerically incorrect.
> > > > >>>
> > > > >>>
> > > > >>> I am wondering if there is any workaround for this type of problem?
> > > > >>>
> > > > >>>
> > > > >>> Thank You,
> > > > >>>
> > > > >>>
> > > > >>> Andy
> > > > >>>
> > > >
> > > >
> > > > --
> > > > ==================================================================
> > > > Hilmi Yildirim, M.Sc.
> > > > Researcher
> > > >
> > > > DFKI GmbH
> > > > Intelligente Analytik für Massendaten
> > > > DFKI Projektbüro Berlin
> > > > Alt-Moabit 91c
> > > > D-10559 Berlin
> > > > Phone: +49 30 23895 1814
> > > >
> > > > E-Mail: hilmi.yildi...@dfki.de
> > > >
> > > > -------------------------------------------------------------
> > > > Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> > > > Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
> > > >
> > > > Geschaeftsfuehrung:
> > > > Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> > > > Dr. Walter Olthoff
> > > >
> > > > Vorsitzender des Aufsichtsrats:
> > > > Prof. Dr. h.c. Hans A. Aukes
> > > >
> > > > Amtsgericht Kaiserslautern, HRB 2313
> > > > -------------------------------------------------------------
> > > >
> > > >
> > >
> >
>
