The issue is not with the Tuple hierarchy (running Gelly examples had no
effect on runtime, and as you note there aren't any subclass overrides) but
with CopyableValue. I had been using IntValue exclusively but had switched
to using LongValue for graph generation. CopyableValueComparator and
CopyableValueSerializer are now working with multiple types.

If I create IntValue- and LongValue-specific versions of
CopyableValueComparator and CopyableValueSerializer, and modify
ValueTypeInfo to return these, then I see the expected performance.
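
As a rough illustration of the difference between a shared comparator and per-type comparators, here is a minimal sketch. It does not use the actual Flink classes: IntBox, LongBox, GenericComparator, the two specialized comparators, and comparatorFor are all hypothetical stand-ins, with comparatorFor playing the role that ValueTypeInfo plays in choosing a comparator.

```java
import java.util.Comparator;

public class SpecializedComparatorSketch {

    // Hypothetical stand-ins for value types such as IntValue and LongValue.
    static final class IntBox implements Comparable<IntBox> {
        final int value;
        IntBox(int value) { this.value = value; }
        @Override
        public int compareTo(IntBox other) { return Integer.compare(value, other.value); }
    }

    static final class LongBox implements Comparable<LongBox> {
        final long value;
        LongBox(long value) { this.value = value; }
        @Override
        public int compareTo(LongBox other) { return Long.compare(value, other.value); }
    }

    // Shared comparator: the single compareTo call below is reached with
    // every value type the comparator is instantiated for.
    static final class GenericComparator<T extends Comparable<T>> implements Comparator<T> {
        @Override
        public int compare(T a, T b) { return a.compareTo(b); }
    }

    // Per-type comparators: each compare method only ever sees one value type.
    static final class IntBoxComparator implements Comparator<IntBox> {
        @Override
        public int compare(IntBox a, IntBox b) { return Integer.compare(a.value, b.value); }
    }

    static final class LongBoxComparator implements Comparator<LongBox> {
        @Override
        public int compare(LongBox a, LongBox b) { return Long.compare(a.value, b.value); }
    }

    // Hypothetical analogue of the type-info lookup: return the specialized
    // comparator when one exists, otherwise fall back to the shared one.
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> Comparator<T> comparatorFor(Class<T> type) {
        Comparator<?> specialized = null;
        if (IntBox.class.equals(type)) {
            specialized = new IntBoxComparator();
        } else if (LongBox.class.equals(type)) {
            specialized = new LongBoxComparator();
        }
        if (specialized != null) {
            return (Comparator<T>) specialized; // safe: chosen by matching on type
        }
        return new GenericComparator<T>();
    }
}
```

With this layout, comparatorFor(IntBox.class) and comparatorFor(LongBox.class) return different classes, while any other Comparable type still gets the shared GenericComparator. Whether this separation actually restores performance in a given job is something to measure, not something the sketch demonstrates.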

Greg

On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi Greg!
>
> Sounds very interesting.
>
> Do you have a hunch what "virtual" Tuple methods are being used that become
> less jit-able? In many cases, tuples use only field accesses (like
> "value.f1") in the user functions.
>
> I have to dig into the serializers, to see if they could suffer from that.
> The "getField(pos)" method for example should always have many overrides
> (though few would be loaded at any time, because one usually does not use
> all Tuple classes at the same time).
>
> Greetings,
> Stephan
>
>
> On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <c...@greghogan.com> wrote:
>
> > I am noticing what looks like the same drop-off in performance when
> > introducing TupleN subclasses as expressed in "Understanding the JIT and
> > tuning the implementation" [1].
> >
> > I start my single-node cluster, run an algorithm which relies purely on
> > Tuples, and measure the runtime. I execute a separate jar which executes
> > essentially the same algorithm but using Gelly's Edge (which subclasses
> > Tuple3 but does not add any extra fields) and now both the Tuple and Edge
> > algorithms take twice as long.
> >
> > Has this been previously discussed? If not I can work up a demonstration.
> >
> > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> >
> > Greg
> >
>
