Thanks for the PR, Andrew! This has been fixed in 1.0.2.
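
For context on why reference tracking matters here, below is a minimal sketch in plain Java. It is not Kryo's actual implementation, and `Node`, `ToyWriter`, and `ReferenceTrackingSketch` are hypothetical names made up for illustration. It shows how a field-by-field writer recurses without end on a cyclic object graph, and how remembering already-written objects (roughly what `kryo.setReferences(true)` asks Kryo to do) lets the write terminate.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical node type with a cyclic reference; not taken from Mahout or Flink. */
final class Node {
    final String name;
    Node next;

    Node(String name) {
        this.name = name;
    }
}

/** Toy field-by-field writer; a stand-in for illustration, not Kryo's implementation. */
final class ToyWriter {
    private final boolean trackReferences;
    // Identity-based map: objects are matched by reference, not by equals().
    private final Map<Object, Integer> seen = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    ToyWriter(boolean trackReferences) {
        this.trackReferences = trackReferences;
    }

    String write(Node root) {
        writeNode(root);
        return out.toString();
    }

    private void writeNode(Node node) {
        if (node == null) {
            out.append("null");
            return;
        }
        if (trackReferences) {
            Integer id = seen.get(node);
            if (id != null) {
                // Already written: emit a back-reference instead of recursing again.
                out.append("@ref").append(id);
                return;
            }
            seen.put(node, seen.size());
        }
        out.append(node.name).append("->");
        writeNode(node.next); // recursion follows the object graph
    }
}

final class ReferenceTrackingSketch {
    /** Builds a two-node cycle: a -> b -> a. */
    static Node cycle() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        return a;
    }

    /** Returns true if writing the cycle without tracking overflows the stack. */
    static boolean overflowsWithoutTracking() {
        try {
            new ToyWriter(false).write(cycle());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(new ToyWriter(true).write(cycle())); // a->b->@ref0
        System.out.println(overflowsWithoutTracking());         // true
    }
}
```

With tracking enabled, the second visit to `a` is written as a back-reference, so the recursion stops. Without it, the writer keeps following `next` until the stack is exhausted, which matches the shape of the repeating frames in the stack trace quoted below.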
On Thu, Apr 14, 2016 at 7:04 PM, Andrew Palumbo <ap....@outlook.com> wrote:

> Do you want me to open a jira/pr for this?
>
> -------- Original message --------
> From: Stephan Ewen <se...@apache.org>
> Date: 04/13/2016 5:16 AM (GMT-05:00)
> To: dev@flink.apache.org
> Subject: Re: Kryo StackOverflowError
>
> +1 to add this to 1.0.2
>
>
> On Wed, Apr 13, 2016 at 1:57 AM, Andrew Palumbo <ap....@outlook.com> wrote:
>
>>
>> Hi,
>>
>> Great! Do you think that this is something that you'll be enabling in your
>> upcoming 1.0.2 release? We plan on putting out a maintenance Mahout
>> release relatively soon, and this would allow us to speed up matrix
>> multiplication greatly.
>>
>> Thanks,
>>
>> Andy
>> ________________________________________
>> From: Till Rohrmann <trohrm...@apache.org>
>> Sent: Tuesday, April 12, 2016 11:18 AM
>> To: dev@flink.apache.org
>> Subject: Re: Kryo StackOverflowError
>>
>> +1
>>
>> On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>
>> > Good catch, Till!
>> >
>> > I just checked it with the Mahout source code and the issue is gone with
>> > reference tracking enabled.
>> >
>> > I would just re-enable it in Flink.
>> >
>> > On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <trohrm...@apache.org> wrote:
>> >
>> > > Hey guys,
>> > >
>> > > I have a suspicion about what the culprit could be: Could you change the line
>> > > KryoSerializer.java:328 to kryo.setReferences(true) and check whether the error
>> > > still remains? We deactivated reference tracking, so Kryo should no longer
>> > > be able to resolve cyclic references properly.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > >
>> > > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lison...@intel.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I also got this error message when I had private inner classes:
>> > > >
>> > > > public class A {
>> > > >     private class B {
>> > > >     }
>> > > > }
>> > > >
>> > > > I was able to fix it by making the inner classes public static:
>> > > >
>> > > > public class A {
>> > > >     public static class B {
>> > > >     }
>> > > > }
>> > > >
>> > > > When I was trying to debug it, it seemed this error message can be caused
>> > > > by several different things.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Todd
>> > > >
>> > > >
>> > > > -----Original Message-----
>> > > > From: Hilmi Yildirim [mailto:hilmi.yildi...@dfki.de]
>> > > > Sent: Sunday, April 10, 2016 11:36 AM
>> > > > To: dev@flink.apache.org
>> > > > Subject: Re: Kryo StackOverflowError
>> > > >
>> > > > Hi,
>> > > > I also had this problem and solved it.
>> > > >
>> > > > In my case I had multiple objects which were created via anonymous classes.
>> > > > When I broadcast these objects, the serializer tried to serialize the
>> > > > objects, and for that it tried to serialize the anonymous classes. This
>> > > > caused the problem.
>> > > >
>> > > > For example,
>> > > >
>> > > > class A{
>> > > >
>> > > >   def createObjects() : Array[Object]{
>> > > >     objects
>> > > >     for{
>> > > >       object = new Class{
>> > > >         ...
>> > > >       }
>> > > >       objects.add(object)
>> > > >     }
>> > > >     return objects
>> > > >   }
>> > > > }
>> > > >
>> > > > It tried to serialize "new Class". For that it tried to serialize the
>> > > > method createObjects(). And then it tried to serialize class A. To
>> > > > serialize class A it tried to serialize the method createObjects. Or
>> > > > something like that, I do not remember the details. This caused the
>> > > > recursion.
>> > > >
>> > > > BR,
>> > > > Hilmi
>> > > >
>> > > > On 10.04.2016 at 19:18, Stephan Ewen wrote:
>> > > > > Hi!
>> > > > >
>> > > > > Is it possible that some datatype has a recursive structure nonetheless?
>> > > > > Something like a linked list or so, which would create a large object graph?
>> > > > >
>> > > > > There seems to be a large object graph that the Kryo serializer traverses,
>> > > > > which causes the StackOverflowError.
>> > > > >
>> > > > > Greetings,
>> > > > > Stephan
>> > > > >
>> > > > >
>> > > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com> wrote:
>> > > > >
>> > > > >> Hi Stephan,
>> > > > >>
>> > > > >> Thanks for answering.
>> > > > >>
>> > > > >> This is not from a recursive object. (It is used in a recursive method in the
>> > > > >> test that is throwing this error, but the depth is only 2 and there are
>> > > > >> no other Flink DataSet operations before execution is triggered, so it is
>> > > > >> trivial.)
>> > > > >>
>> > > > >> Here is a Gist of the code, and the full output and stack trace:
>> > > > >>
>> > > > >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
>> > > > >>
>> > > > >> The error begins at line 178 of the "Output" file.
>> > > > >>
>> > > > >> Thanks
>> > > > >>
>> > > > >> ________________________________________
>> > > > >> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of Stephan Ewen <se...@apache.org>
>> > > > >> Sent: Sunday, April 10, 2016 9:39 AM
>> > > > >> To: dev@flink.apache.org
>> > > > >> Subject: Re: Kryo StackOverflowError
>> > > > >>
>> > > > >> Hi!
>> > > > >>
>> > > > >> Sorry, I don't fully understand the diagnosis.
>> > > > >> You say that this stack overflow is not from a recursive/object type?
>> > > > >>
>> > > > >> Long graphs of operations in Flink usually do not cause
>> > > > >> StackOverflowErrors, because the graph is not processed recursively
>> > > > >> as a whole.
>> > > > >>
>> > > > >> Can you paste the entire stack trace (for example to a gist)?
>> > > > >>
>> > > > >> Greetings,
>> > > > >> Stephan
>> > > > >>
>> > > > >>
>> > > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com> wrote:
>> > > > >>
>> > > > >>> Hi all,
>> > > > >>>
>> > > > >>>
>> > > > >>> I am working on a matrix multiplication operation for the Mahout Flink
>> > > > >>> Bindings that uses quite a few chained Flink DataSet operations.
>> > > > >>>
>> > > > >>>
>> > > > >>> When testing, I am getting the following error:
>> > > > >>>
>> > > > >>>
>> > > > >>> {...}
>> > > > >>>
>> > > > >>> 04/09/2016 22:30:35 CHAIN Reduce (Reduce at
>> > > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))
>> > > > >>> -> FlatMap (FlatMap at
>> > > > >>> org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1)
>> > > > >>> switched to CANCELED
>> > > > >>> 04/09/2016 22:30:35 CHAIN Partition -> Map (Map at
>> > > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240))
>> > > > >>> -> GroupCombine (GroupCombine at
>> > > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129))
>> > > > >>> -> Combine (Reduce at
>> > > > >>> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3)
>> > > > >>> switched to FAILED
>> > > > >>> java.lang.StackOverflowError
>> > > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
>> > > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>> > > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>> > > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>> > > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>> > > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>> > > > >>> at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>> > > > >>> at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>> {...}
>> > > > >>>
>> > > > >>>
>> > > > >>> I've seen similar issues on the dev@flink list (and other places), but I
>> > > > >>> believe that they were from recursive calls and objects which pointed back
>> > > > >>> to themselves somehow.
>> > > > >>>
>> > > > >>>
>> > > > >>> This is a relatively straightforward method; it just has several Flink
>> > > > >>> operations before execution is triggered. If I remove some operations,
>> > > > >>> e.g. a reduce, I can get the method to complete on a simple test; however,
>> > > > >>> it will then, of course, be numerically incorrect.
>> > > > >>>
>> > > > >>>
>> > > > >>> I am wondering if there is any workaround for this type of problem?
>> > > > >>>
>> > > > >>>
>> > > > >>> Thank You,
>> > > > >>>
>> > > > >>>
>> > > > >>> Andy
>> > > > >>>
>> > > >
>> > > >
>> > > > --
>> > > > ==================================================================
>> > > > Hilmi Yildirim, M.Sc.
>> > > > Researcher
>> > > >
>> > > > DFKI GmbH
>> > > > Intelligente Analytik für Massendaten
>> > > > DFKI Projektbüro Berlin
>> > > > Alt-Moabit 91c
>> > > > D-10559 Berlin
>> > > > Phone: +49 30 23895 1814
>> > > >
>> > > > E-Mail: hilmi.yildi...@dfki.de
>> > > >
>> > > > -------------------------------------------------------------
>> > > > Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>> > > > Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern
>> > > >
>> > > > Management:
>> > > > Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
>> > > > Dr. Walter Olthoff
>> > > >
>> > > > Chairman of the Supervisory Board:
>> > > > Prof. Dr. h.c. Hans A. Aukes
>> > > >
>> > > > Amtsgericht Kaiserslautern, HRB 2313
>> > > > -------------------------------------------------------------
>> > > >
>> > > >
>> > >
>> >
>>