+1

On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Good catch Till!
>
> I just checked it with the Mahout source code and the issue is gone with
> reference tracking enabled.
>
> I would just re-enable it again in Flink.
>
> On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hey guys,
> >
> > I have a suspicion about what could be the culprit: Could you change the
> > line KryoSerializer.java:328 to kryo.setReferences(true) and check whether
> > the error still remains? We deactivated the reference tracking, and now
> > Kryo shouldn’t be able to resolve cyclic references properly.
> >
> > Cheers,
> > Till
> >
> >
> > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lison...@intel.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I also got this error message when I had private inner classes:
> > >
> > > public class A {
> > >     private class B {
> > >     }
> > > }
> > >
> > > I was able to fix it by making the inner classes public static:
> > >
> > > public class A {
> > >     public static class B {
> > >     }
> > > }
> > >
> > > When I was trying to debug, it seemed this error message can be caused
> > > by several different things.
> > >
> > > Thanks,
> > >
> > > Todd
> > >
> > >
> > > -----Original Message-----
> > > From: Hilmi Yildirim [mailto:hilmi.yildi...@dfki.de]
> > > Sent: Sunday, April 10, 2016 11:36 AM
> > > To: dev@flink.apache.org
> > > Subject: Re: Kryo StackOverflowError
> > >
> > > Hi,
> > > I also had this problem and solved it.
> > >
> > > In my case I had multiple objects which were created via anonymous
> > > classes. When I broadcast these objects, the serializer tried to
> > > serialize the objects, and for that it tried to serialize the anonymous
> > > classes. This caused the problem.
> > >
> > > For example,
> > >
> > > class A{
> > >
> > >   def createObjects() : Array[Object]{
> > >     objects
> > >     for{
> > >       object = new Class{
> > >         ...
> > >       }
> > >       objects.add(object)
> > >     }
> > >     return objects
> > >   }
> > > }
> > >
> > > It tried to serialize "new Class".
> > > For that it tried to serialize the method createObjects(). And then it
> > > tried to serialize class A. To serialize class A it tried to serialize
> > > the method createObjects. Or something like that, I do not remember the
> > > details. This caused the recursion.
> > >
> > > BR,
> > > Hilmi
> > >
> > > On 10.04.2016 at 19:18, Stephan Ewen wrote:
> > > > Hi!
> > > >
> > > > Is it possible that some datatype has a recursive structure
> > > > nonetheless? Something like a linked list or so, which would create a
> > > > large object graph?
> > > >
> > > > There seems to be a large object graph that the Kryo serializer
> > > > traverses, which causes the StackOverflowError.
> > > >
> > > > Greetings,
> > > > Stephan
> > > >
> > > >
> > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com>
> > > > wrote:
> > > >
> > > >> Hi Stephan,
> > > >>
> > > >> thanks for answering.
> > > >>
> > > >> This is not from a recursive object. (It is used in a recursive
> > > >> method in the test that is throwing this error, but the depth is only
> > > >> 2 and there are no other Flink DataSet operations before execution is
> > > >> triggered, so it is trivial.)
> > > >>
> > > >> Here is a Gist of the code, and the full output and stack trace:
> > > >>
> > > >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
> > > >>
> > > >> The Error begins at line 178 of the "Output" file.
> > > >>
> > > >> Thanks
> > > >>
> > > >> ________________________________________
> > > >> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of
> > > >> Stephan Ewen <se...@apache.org>
> > > >> Sent: Sunday, April 10, 2016 9:39 AM
> > > >> To: dev@flink.apache.org
> > > >> Subject: Re: Kryo StackOverflowError
> > > >>
> > > >> Hi!
> > > >>
> > > >> Sorry, I don't fully understand the diagnosis.
> > > >> You say that this stack overflow is not from a recursive object type?
> > > >>
> > > >> Long graphs of operations in Flink usually do not cause
> > > >> StackOverflowErrors, because the whole graph is not processed
> > > >> recursively.
> > > >>
> > > >> Can you paste the entire stack trace (for example to a gist)?
> > > >>
> > > >> Greetings,
> > > >> Stephan
> > > >>
> > > >>
> > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com>
> > > >> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> I am working on a matrix multiplication operation for Mahout Flink
> > > >>> Bindings that uses quite a few chained Flink DataSet operations.
> > > >>>
> > > >>> When testing, I am getting the following error:
> > > >>>
> > > >>> {...}
> > > >>>
> > > >>> 04/09/2016 22:30:35 CHAIN Reduce (Reduce at
> > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))
> > > >>> -> FlatMap (FlatMap at
> > > >>> org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1)
> > > >>> switched to CANCELED
> > > >>> 04/09/2016 22:30:35 CHAIN Partition -> Map (Map at
> > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240))
> > > >>> -> GroupCombine (GroupCombine at
> > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129))
> > > >>> -> Combine (Reduce at
> > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3)
> > > >>> switched to FAILED
> > > >>> java.lang.StackOverflowError
> > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>> {...}
> > > >>>
> > > >>> I've seen similar issues on the dev@flink list (and other places),
> > > >>> but I believe that they were from recursive calls and objects which
> > > >>> pointed back to themselves somehow.
> > > >>>
> > > >>> This is a relatively straightforward method; it just has several
> > > >>> Flink operations before execution is triggered. If I remove some
> > > >>> operations, e.g. a reduce, I can get the method to complete on a
> > > >>> simple test; however, it will then, of course, be numerically
> > > >>> incorrect.
> > > >>>
> > > >>> I am wondering if there is any workaround for this type of problem?
> > > >>>
> > > >>> Thank You,
> > > >>>
> > > >>> Andy
> > > >>>
> > >
> > > --
> > > ==================================================================
> > > Hilmi Yildirim, M.Sc.
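For readers following the thread, the effect of the reference tracking that Till describes can be illustrated with a small, self-contained Java sketch. It does not use Kryo and is not Kryo's implementation: the names Node, walkUntracked, walkTracked and overflowsUntracked are hypothetical and invented here. The untracked walk follows fields recursively, loosely like the repeating ObjectField.write / FieldSerializer.write / Kryo.writeObject frames in the trace, and overflows the stack on a cycle. The tracked walk remembers visited objects by identity, which is roughly the idea behind kryo.setReferences(true).

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class ReferenceTrackingSketch {

    /** Hypothetical node type; two nodes pointing at each other form a cycle. */
    static final class Node {
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    /** Field-by-field walk without reference tracking: never terminates on a cycle. */
    static int walkUntracked(Node node) {
        if (node == null) {
            return 0;
        }
        return 1 + walkUntracked(node.next);
    }

    /** Same walk, but each object is remembered by identity and visited only once. */
    static int walkTracked(Node node, Map<Node, Boolean> seen) {
        if (node == null || seen.put(node, Boolean.TRUE) != null) {
            return 0; // end of chain, or an object that was already visited
        }
        return 1 + walkTracked(node.next, seen);
    }

    /** Returns true if the untracked walk blows the stack for the given graph. */
    static boolean overflowsUntracked(Node node) {
        try {
            walkUntracked(node);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cyclic reference: a -> b -> a

        System.out.println("untracked overflows: " + overflowsUntracked(a));
        System.out.println("tracked visits: " + walkTracked(a, new IdentityHashMap<>()));
    }
}
```

An identity-based map is used here so that two distinct but equal objects are still treated as separate; whether a hidden cycle like this is actually present in the Mahout types is only what the thread's test with reference tracking suggests, not something this sketch demonstrates.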