Thanks for the PR, Andrew! This has been fixed in 1.0.2.
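
For anyone who finds this thread later, here is a minimal, library-free sketch of why switching off reference tracking can end in a StackOverflowError on a cyclic object graph. It does not use Kryo: Node, writeNaive and writeTracked are made-up names, and the identity map only imitates in spirit what a reference-tracking serializer does.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class ReferenceTrackingSketch {
    /** Hypothetical type used to build a cyclic graph (a -> b -> a). */
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    /** Field-by-field writer without reference tracking: never terminates on a cycle. */
    public static void writeNaive(Node n, StringBuilder out) {
        if (n == null) { out.append("null"); return; }
        out.append(n.name).append("->");
        writeNaive(n.next, out);
    }

    /** Same writer, but an object seen before is emitted as a back-reference id. */
    public static void writeTracked(Node n, StringBuilder out, Map<Node, Integer> seen) {
        if (n == null) { out.append("null"); return; }
        Integer id = seen.get(n);
        if (id != null) { out.append("ref#").append(id); return; }
        seen.put(n, seen.size());
        out.append(n.name).append("->");
        writeTracked(n.next, out, seen);
    }

    /** Builds two nodes that point at each other. */
    static Node cycle() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        return a;
    }

    public static String tracked() {
        StringBuilder out = new StringBuilder();
        writeTracked(cycle(), out, new IdentityHashMap<>());
        return out.toString();
    }

    public static boolean naiveOverflows() {
        try {
            writeNaive(cycle(), new StringBuilder());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(tracked());        // prints a->b->ref#0
        System.out.println(naiveOverflows()); // prints true
    }
}
```

The identity map (rather than an equals-based map) matters here: two distinct but equal objects should still be written twice, while the same instance is written once.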

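On Todd's note further down about private inner classes: a non-static inner class carries a hidden reference to its enclosing instance, while a static nested class does not. The sketch below only shows that difference with plain reflection; the class names are hypothetical, and whether a particular serializer actually follows that hidden field depends on its configuration, so this is not a claim about what Kryo did in his case.

```java
import java.lang.reflect.Field;

public class InnerClassSketch {
    private int value = 42;

    /** Non-static inner class: every instance is tied to an enclosing InnerClassSketch. */
    class Inner {
        int read() { return value; } // reads a field of the enclosing instance
    }

    /** Static nested class: no enclosing instance is involved. */
    static class Nested {
        int x = 1;
    }

    /** True if the class declares a field whose type is the enclosing class. */
    public static boolean holdsOuterReference(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (f.getType() == InnerClassSketch.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(holdsOuterReference(Inner.class));  // prints true
        System.out.println(holdsOuterReference(Nested.class)); // prints false
    }
}
```

Anything that walks all declared fields of an Inner instance can therefore reach the whole enclosing object, which is one way an object graph ends up larger than it looks in the source.
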
On Thu, Apr 14, 2016 at 7:04 PM, Andrew Palumbo <ap....@outlook.com> wrote:
> Do you want me to open a jira/pr for this?
>
> -------- Original message --------
> From: Stephan Ewen <se...@apache.org>
> Date: 04/13/2016 5:16 AM (GMT-05:00)
> To: dev@flink.apache.org
> Subject: Re: Kryo StackOverflowError
>
> +1 to add this to 1.0.2
>
>
> On Wed, Apr 13, 2016 at 1:57 AM, Andrew Palumbo <ap....@outlook.com> wrote:
>
>>
>> Hi,
>>
>> Great! Do you think that this is something that you'll be enabling in your
>> upcoming 1.0.2 release?  We plan on putting out a maintenance Mahout
>> Release relatively soon and this would allow us to speed up Matrix
>> Multiplication greatly.
>>
>> Thanks,
>>
>> Andy
>> ________________________________________
>> From: Till Rohrmann <trohrm...@apache.org>
>> Sent: Tuesday, April 12, 2016 11:18 AM
>> To: dev@flink.apache.org
>> Subject: Re: Kryo StackOverflowError
>>
>> +1
>>
>> On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <rmetz...@apache.org>
>> wrote:
>>
>> > Good catch Till!
>> >
>> > I just checked it with the Mahout source code and the issue is gone with
>> > reference tracking enabled.
>> >
>> > I would just re-enable it again in Flink.
>> >
>> > On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <trohrm...@apache.org>
>> > wrote:
>> >
>> > > Hey guys,
>> > >
>> > > I have a suspicion about what the culprit could be: could you change line
>> > > KryoSerializer.java:328 to kryo.setReferences(true) and check whether the
>> > > error still occurs? We deactivated reference tracking, so Kryo should no
>> > > longer be able to resolve cyclic references properly.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > >
>> > > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lison...@intel.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I also got this error message when I had private inner classes:
>> > > >
>> > > > public class A {
>> > > >     private class B {
>> > > >     }
>> > > > }
>> > > >
>> > > > I was able to fix it by making the inner classes public static:
>> > > >
>> > > > public class A {
>> > > >     public static class B {
>> > > >     }
>> > > > }
>> > > >
>> > > > When I was debugging, it seemed that this error can be caused by
>> > > > several different things.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Todd
>> > > >
>> > > >
>> > > > -----Original Message-----
>> > > > From: Hilmi Yildirim [mailto:hilmi.yildi...@dfki.de]
>> > > > Sent: Sunday, April 10, 2016 11:36 AM
>> > > > To: dev@flink.apache.org
>> > > > Subject: Re: Kryo StackOverflowError
>> > > >
>> > > > Hi,
>> > > > I also had this problem and solved it.
>> > > >
>> > > > In my case I had multiple objects that were created via anonymous
>> > > > classes. When I broadcast these objects, the serializer tried to
>> > > > serialize them, and for that it tried to serialize the anonymous
>> > > > classes. This caused the problem.
>> > > >
>> > > > For example,
>> > > >
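>> > > > class A {
>> > > >
>> > > >   def createObjects(): Array[Object] = {
>> > > >     val objects = scala.collection.mutable.ArrayBuffer.empty[Object]
>> > > >     for (_ <- 0 until 3) { // loop bound is illustrative only
>> > > >       // anonymous class created inside an instance method of A;
>> > > >       // "Class" is a placeholder for some user-defined type
>> > > >       val obj = new Class {
>> > > >         // ...
>> > > >       }
>> > > >       objects += obj
>> > > >     }
>> > > >     objects.toArray
>> > > >   }
>> > > > }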
>> > > >
>> > > > It tried to serialize "new Class". For that it tried to serialize the
>> > > > method createObjects(). And then it tried to serialize class A. To
>> > > > serialize class A it tried to serialize the method createObjects. Or
>> > > > something like that, I do not remember the details. This caused the
>> > > > recursion.
>> > > >
>> > > > BR,
>> > > > Hilmi
>> > > >
>> > > > On 10.04.2016 at 19:18, Stephan Ewen wrote:
>> > > > > Hi!
>> > > > >
>> > > > > Is it possible that some datatype has a recursive structure nonetheless?
>> > > > > Something like a linked list or so, which would create a large object
>> > > > > graph?
>> > > > >
>> > > > > There seems to be a large object graph that the Kryo serializer
>> > > > > traverses, which causes the StackOverflowError.
>> > > > >
>> > > > > Greetings,
>> > > > > Stephan
>> > > > >
>> > > > >
>> > > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap....@outlook.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Hi Stephan,
>> > > > >>
>> > > > >> thanks for answering.
>> > > > >>
>> > > > >> This is not from a recursive object. (It is used in a recursive method in
>> > > > >> the test that is throwing this error, but the depth is only 2 and there
>> > > > >> are no other Flink DataSet operations before execution is triggered, so
>> > > > >> it is trivial.)
>> > > > >>
>> > > > >> Here is a Gist of the code, along with the full output and stack trace:
>> > > > >>
>> > > > >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
>> > > > >>
>> > > > >> The error begins at line 178 of the "Output" file.
>> > > > >>
>> > > > >> Thanks
>> > > > >>
>> > > > >> ________________________________________
>> > > > >> From: ewenstep...@gmail.com <ewenstep...@gmail.com> on behalf of
>> > > > >> Stephan Ewen <se...@apache.org>
>> > > > >> Sent: Sunday, April 10, 2016 9:39 AM
>> > > > >> To: dev@flink.apache.org
>> > > > >> Subject: Re: Kryo StackOverflowError
>> > > > >>
>> > > > >> Hi!
>> > > > >>
>> > > > >> Sorry, I don't fully understand the diagnosis.
>> > > > >> You say that this stack overflow is not from a recursive object type?
>> > > > >>
>> > > > >> Long graphs of operations in Flink usually do not cause
>> > > > >> StackOverflowErrors, because the graph as a whole is not processed
>> > > > >> recursively.
>> > > > >>
>> > > > >> Can you paste the entire Stack Trace (for example to a gist)?
>> > > > >>
>> > > > >> Greetings,
>> > > > >> Stephan
>> > > > >>
>> > > > >>
>> > > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap....@outlook.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >>> Hi all,
>> > > > >>>
>> > > > >>>
>> > > > >>> I am working on a matrix multiplication operation for the Mahout Flink
>> > > > >>> bindings that uses quite a few chained Flink DataSet operations.
>> > > > >>>
>> > > > >>>
>> > > > >>> When testing, I am getting the following error:
>> > > > >>>
>> > > > >>>
>> > > > >>> {...}
>> > > > >>>
>> > > > >>> 04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))
>> > > > >>> -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1)
>> > > > >>> switched to CANCELED
>> > > > >>> 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240))
>> > > > >>> -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129))
>> > > > >>> -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3)
>> > > > >>> switched to FAILED
>> > > > >>> java.lang.StackOverflowError
>> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>> > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>> > > > >>> {...}
>> > > > >>>
>> > > > >>>
>> > > > >>> I've seen similar issues on the dev@flink list (and in other places), but
>> > > > >>> I believe that they were caused by recursive calls and objects that
>> > > > >>> pointed back to themselves somehow.
>> > > > >>>
>> > > > >>>
>> > > > >>> This is a relatively straightforward method; it just has several Flink
>> > > > >>> operations before execution is triggered. If I remove some operations,
>> > > > >>> e.g. a reduce, I can get the method to complete on a simple test, but
>> > > > >>> the result will then, of course, be numerically incorrect.
>> > > > >>>
>> > > > >>>
>> > > > >>> I am wondering if there is any workaround for this type of problem?
>> > > > >>>
>> > > > >>>
>> > > > >>> Thank You,
>> > > > >>>
>> > > > >>>
>> > > > >>> Andy
>> > > > >>>
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>
