Do you want me to open a JIRA/PR for this?

-------- Original message --------
From: Stephan Ewen <se...@apache.org>
Date: 04/13/2016 5:16 AM (GMT-05:00)
To: dev@flink.apache.org
Subject: Re: Kryo StackOverflowError
+1 to add this to 1.0.2

On Wed, Apr 13, 2016 at 1:57 AM, Andrew Palumbo <ap....@outlook.com> wrote:

> Hi,
>
> Great! Do you think this is something you'll be enabling in your upcoming 1.0.2 release? We plan on putting out a maintenance Mahout release relatively soon, and this would allow us to speed up matrix multiplication greatly.
>
> Thanks,
>
> Andy
> ________________________________________
> From: Till Rohrmann <trohrm...@apache.org>
> Sent: Tuesday, April 12, 2016 11:18 AM
> To: dev@flink.apache.org
> Subject: Re: Kryo StackOverflowError
>
> +1
>
> On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
> > Good catch, Till!
> >
> > I just checked it with the Mahout source code, and the issue is gone with reference tracking enabled.
> >
> > I would just re-enable it in Flink.
> >
> > On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > > Hey guys,
> > >
> > > I have a suspicion about what could be the culprit: could you change line KryoSerializer.java:328 to kryo.setReferences(true) and check whether the error still occurs? We deactivated reference tracking, so Kryo should no longer be able to resolve cyclic references properly.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lison...@intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I also got this error message when I had private inner classes:
> > > >
> > > > public class A {
> > > >     private class B {
> > > >     }
> > > > }
> > > >
> > > > I was able to fix it by making the inner classes public static:
> > > >
> > > > public class A {
> > > >     public static class B {
> > > >     }
> > > > }
> > > >
> > > > While debugging, it seemed that this error message can be caused by several different things.
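Todd's fix (turning a private inner class into a public static nested class) is consistent with how Java compiles non-static inner classes: each instance carries a hidden field that references its enclosing instance, so a field-by-field serializer that reaches a `B` also walks into the `A` that created it. The sketch below uses hypothetical class names and plain reflection only (no Kryo) to count fields whose type is the enclosing class. It assumes the inner class actually uses outer state; recent javac versions may omit the hidden field when it is never used.

```java
import java.util.Arrays;

// Hypothetical classes for illustration; they mirror the shape of Todd's A/B example.
class EnclosingRefSketch {
    private int registered; // mutable outer state that Inner reads and writes

    // Non-static inner class (like a private class B inside A). Because it
    // touches the enclosing instance, javac stores that instance in a hidden field.
    class Inner {
        void register() { registered++; }
    }

    // Static nested class (Todd's fix): no enclosing instance, no hidden field.
    static class Nested {
    }

    // Counts declared fields, synthetic ones included, whose type is the
    // enclosing class, i.e. references a field-by-field serializer would follow.
    static long outerRefs(Class<?> c) {
        return Arrays.stream(c.getDeclaredFields())
                .filter(f -> f.getType() == EnclosingRefSketch.class)
                .count();
    }

    public static void main(String[] args) {
        EnclosingRefSketch outer = new EnclosingRefSketch();
        outer.new Inner().register(); // this Inner instance keeps a reference to 'outer'
        System.out.println("Inner fields referencing the outer class:  " + outerRefs(Inner.class));
        System.out.println("Nested fields referencing the outer class: " + outerRefs(Nested.class));
    }
}
```

Scala anonymous classes that refer to members of their enclosing class typically capture it in a similar hidden `$outer` field, which may be what Hilmi ran into with objects created via anonymous classes.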
> > > >
> > > > Thanks,
> > > >
> > > > Todd
> > > >
> > > > -----Original Message-----
> > > > From: Hilmi Yildirim [mailto:hilmi.yildi...@dfki.de]
> > > > Sent: Sunday, April 10, 2016 11:36 AM
> > > > To: dev@flink.apache.org
> > > > Subject: Re: Kryo StackOverflowError
> > > >
> > > > Hi,
> > > > I also had this problem and solved it.
> > > >
> > > > In my case, I had multiple objects that were created via anonymous classes. When I broadcast these objects, the serializer tried to serialize them, and to do that it tried to serialize the anonymous classes. This caused the problem.
> > > >
> > > > For example:
> > > >
> > > > import scala.collection.mutable.ArrayBuffer
> > > >
> > > > class A {
> > > >   def createObjects(): Array[Object] = {
> > > >     val objects = ArrayBuffer[Object]()
> > > >     for (...) {
> > > >       val obj = new Class { ... }
> > > >       objects += obj
> > > >     }
> > > >     objects.toArray
> > > >   }
> > > > }
> > > >
> > > > It tried to serialize "new Class". For that, it tried to serialize the method createObjects(), and then class A. To serialize class A, it tried to serialize the method createObjects() again. Or something like that; I do not remember the details. This caused the recursion.
> > > >
> > > > BR,
> > > > Hilmi
> > > >
> > > > On 10.04.2016 at 19:18, Stephan Ewen wrote:
> > > > > Hi!
> > > > >
> > > > > Is it possible that some data type has a recursive structure nonetheless? Something like a linked list, which would create a large object graph?
> > > > >
> > > > > There seems to be a large object graph that the Kryo serializer traverses, which causes the StackOverflowError.
> > > > >
> > > > > Greetings,
> > > > > Stephan
> > > > >
> > > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com> wrote:
> > > > >
> > > > >> Hi Stephan,
> > > > >>
> > > > >> Thanks for answering.
> > > > >>
> > > > >> This is not from a recursive object.
(it is used in a recursive > method > > in > > > > the > > > > >> test that is throwing this error, but the the depth is only 2 and > > > there > > > > are > > > > >> no other Flink DataSet operations before execution is triggered so > > it > > > is > > > > >> trivial.) > > > > >> > > > > >> Gere is a Gist of the code, and the full output and stack trace: > > > > >> > > > > >> > > > https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419 > > > > >> > > > > >> The Error begins at line 178 of the "Output" file. > > > > >> > > > > >> Thanks > > > > >> > > > > >> ________________________________________ > > > > >> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of > > > > Stephan > > > > >> Ewen <se...@apache.org> > > > > >> Sent: Sunday, April 10, 2016 9:39 AM > > > > >> To: dev@flink.apache.org > > > > >> Subject: Re: Kryo StackOverflowError > > > > >> > > > > >> Hi! > > > > >> > > > > >> Sorry, I don't fully understand he diagnosis. > > > > >> You say that this stack overflow is not from a recursive/object > > type? > > > > >> > > > > >> Long graphs of operations in Flink usually do not cause > > > > >> StackOverflowExceptions, because not the whole graph is > recursively > > > > >> processed. > > > > >> > > > > >> Can you paste the entire Stack Trace (for example to a gist)? 
> > > > >>
> > > > >> Greetings,
> > > > >> Stephan
> > > > >>
> > > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com> wrote:
> > > > >>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> I am working on a matrix multiplication operation for the Mahout Flink bindings that uses quite a few chained Flink DataSet operations.
> > > > >>>
> > > > >>> When testing, I am getting the following error:
> > > > >>>
> > > > >>> {...}
> > > > >>>
> > > > >>> 04/09/2016 22:30:35 CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
> > > > >>> 04/09/2016 22:30:35 CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
> > > > >>> java.lang.StackOverflowError
> > > > >>>     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> > > > >>>     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > >>>     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > >>>     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > >>>     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > >>>     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > >>>     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > >>>     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>> {...}
> > > > >>>
> > > > >>> I've seen similar issues on the dev@flink list (and elsewhere), but I believe those came from recursive calls and objects that pointed back to themselves somehow.
> > > > >>>
> > > > >>> This is a relatively straightforward method; it just has several Flink operations before execution is triggered. If I remove some operations, e.g. a reduce, I can get the method to complete on a simple test, but then, of course, the result is numerically incorrect.
> > > > >>>
> > > > >>> Is there any workaround for this type of problem?
> > > > >>>
> > > > >>> Thank you,
> > > > >>>
> > > > >>> Andy
> > > > >>>
> > > >
> > > > --
> > > > ==================================================================
> > > > Hilmi Yildirim, M.Sc.
> > > > Researcher
> > > >
> > > > DFKI GmbH
> > > > Intelligent Analytics for Mass Data
> > > > DFKI Project Office Berlin
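The fix the thread converged on, re-enabling Kryo's reference tracking (Till's `kryo.setReferences(true)` suggestion, which Robert confirmed against the Mahout code), works because a serializer that remembers which objects it has already written can emit a short back-reference instead of descending into them again. The Java sketch below shows that technique with a hypothetical `Node` type and a made-up text format; it is not Kryo's implementation or wire format, only the same idea of an identity map from already-written objects to ids.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy illustration of reference tracking; hypothetical types, not Kryo's code or format.
class ReferenceTrackingSketch {
    static class Node {
        final int value;
        Node next; // may point back to an earlier node, forming a cycle
        Node(int value) { this.value = value; }
    }

    // No reference tracking: every reference is followed, so a cycle
    // a -> b -> a recurses until the thread stack is exhausted.
    static void writeNaive(Node n, StringBuilder out) {
        if (n == null) { out.append("null"); return; }
        out.append("Node(").append(n.value).append(", ");
        writeNaive(n.next, out);
        out.append(')');
    }

    // With reference tracking: an object seen for the first time gets the next
    // id and is written in full; any later occurrence is written as "@id".
    static void writeTracked(Node n, StringBuilder out, Map<Node, Integer> seen) {
        if (n == null) { out.append("null"); return; }
        Integer id = seen.get(n);
        if (id != null) { out.append('@').append(id); return; }
        seen.put(n, seen.size());
        out.append("Node(").append(n.value).append(", ");
        writeTracked(n.next, out, seen);
        out.append(')');
    }

    public static void main(String[] args) {
        Node a = new Node(1);
        Node b = new Node(2);
        a.next = b;
        b.next = a; // cycle

        // IdentityHashMap compares keys with ==, so tracking is by object identity.
        StringBuilder tracked = new StringBuilder();
        writeTracked(a, tracked, new IdentityHashMap<>());
        System.out.println(tracked); // Node(1, Node(2, @0))

        try {
            writeNaive(a, new StringBuilder());
        } catch (StackOverflowError e) {
            System.out.println("naive writer: StackOverflowError");
        }
    }
}
```

Tracking costs one identity-map lookup per object written. Todd's inner-class case can run into the same cycle when an outer object also holds references to its inner objects, since each inner object points back to the outer one.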