Hi Gabor,

I did not find any Flink proposals for this year's GSoC in JIRA (should be
labeled with gsoc2016).
I am also not sure if any of the Flink committers signed up as a GSoC
mentor.
Maybe there is still time to do that, but as it looks right now, there are no
GSoC projects offered by Flink.

Best, Fabian

2016-03-08 11:22 GMT+01:00 Gábor Horváth <xazax....@gmail.com>:

> Hi!
>
> I am planning to do GSoC and I would like to work on the serializers. More
> specifically I would like to implement code generation. I am planning to
> send the first draft of the proposal to the mailing list early next week.
> If everything goes well, it will include some preliminary benchmarks of
> how much performance gain can be expected from hand-written serializers.
>
> Best regards,
> Gábor
>
> On 8 March 2016 at 10:47, Stephan Ewen <se...@apache.org> wrote:
>
> > Ah, very good, that makes sense!
> >
> > I would guess that this performance difference could probably be seen at
> > various points where generic serializers and comparators are used (also for
> > Comparable, Writable) or where the TupleSerializer delegates to a sequence
> > of other TypeSerializers.
> >
> > I guess creating more specialized serializers would solve some of these
> > problems, like in your IntValue vs LongValue case.
> >
> > The best way to solve that would probably be through code generation in the
> > serializers. That has actually been my wish for quite a while.
> > If you are also into these kinds of low-level performance topics, we could
> > start a discussion on that.
> >
> > Greetings,
> > Stephan
> >
> >
> > On Mon, Mar 7, 2016 at 11:25 PM, Greg Hogan <c...@greghogan.com> wrote:
> >
> > > The issue is not with the Tuple hierarchy (running Gelly examples had no
> > > effect on runtime, and as you note there aren't any subclass overrides) but
> > > with CopyableValue. I had been using IntValue exclusively but had switched
> > > to using LongValue for graph generation. CopyableValueComparator and
> > > CopyableValueSerializer are now working with multiple types.
> > >
> > > If I create IntValue- and LongValue-specific versions of
> > > CopyableValueComparator and CopyableValueSerializer and modify
> > > ValueTypeInfo to return these then I see the expected performance.
> > >
> > > Greg
> > >
> > > On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Hi Greg!
> > > >
> > > > Sounds very interesting.
> > > >
> > > > Do you have a hunch what "virtual" Tuple methods are being used that become
> > > > less jit-able? In many cases, tuples use only field accesses (like
> > > > "value.f1") in the user functions.
> > > >
> > > > I have to dig into the serializers, to see if they could suffer from that.
> > > > The "getField(pos)" method for example should always have many overrides
> > > > (though few would be loaded at any time, because one usually does not use
> > > > all Tuple classes at the same time).
> > > >
> > > > Greetings,
> > > > Stephan
> > > >
> > > >
> > > > On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <c...@greghogan.com> wrote:
> > > >
> > > > > I am noticing what looks like the same drop-off in performance when
> > > > > introducing TupleN subclasses as expressed in "Understanding the JIT and
> > > > > tuning the implementation" [1].
> > > > >
> > > > > I start my single-node cluster, run an algorithm which relies purely on
> > > > > Tuples, and measure the runtime. I execute a separate jar which executes
> > > > > essentially the same algorithm but using Gelly's Edge (which subclasses
> > > > > Tuple3 but does not add any extra fields) and now both the Tuple and Edge
> > > > > algorithms take twice as long.
> > > > >
> > > > > Has this been previously discussed? If not I can work up a demonstration.
> > > > >
> > > > > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> > > > >
> > > > > Greg
> > > > >
> > > >
> > >
> >
>
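
For readers following the thread, here is a minimal Java sketch of the
specialization idea Greg and Stephan discuss above. It does not use Flink's
classes: Copyable, IntBox, LongBox, GenericCopier, IntBoxCopier and
LongBoxCopier are hypothetical stand-ins invented for illustration. It shows
only the structural difference (one generic copier shared by all value types
versus one final copier per type) and does not measure any speedup.

```java
public class SpecializedCopierSketch {

    // Hypothetical stand-in for a copyable value type (not Flink's CopyableValue).
    interface Copyable<T> {
        void copyTo(T target);
        T newInstance();
    }

    static final class IntBox implements Copyable<IntBox> {
        int value;
        IntBox() {}
        IntBox(int value) { this.value = value; }
        @Override public void copyTo(IntBox target) { target.value = value; }
        @Override public IntBox newInstance() { return new IntBox(); }
    }

    static final class LongBox implements Copyable<LongBox> {
        long value;
        LongBox() {}
        LongBox(long value) { this.value = value; }
        @Override public void copyTo(LongBox target) { target.value = value; }
        @Override public LongBox newInstance() { return new LongBox(); }
    }

    // Generic copier: a single class serves every value type, so the interface
    // calls inside copy() see every receiver type used in the program. This is
    // the situation the thread suspects is harder for the JIT to optimize.
    static final class GenericCopier<T extends Copyable<T>> {
        T copy(T from) {
            T to = from.newInstance();
            from.copyTo(to);
            return to;
        }
    }

    // Type-specific copiers: the concrete type is known statically, so no
    // interface dispatch is needed inside copy().
    static final class IntBoxCopier {
        IntBox copy(IntBox from) {
            IntBox to = new IntBox();
            to.value = from.value;
            return to;
        }
    }

    static final class LongBoxCopier {
        LongBox copy(LongBox from) {
            LongBox to = new LongBox();
            to.value = from.value;
            return to;
        }
    }

    public static void main(String[] args) {
        IntBox a = new GenericCopier<IntBox>().copy(new IntBox(7));
        LongBox b = new GenericCopier<LongBox>().copy(new LongBox(9L));
        IntBox c = new IntBoxCopier().copy(new IntBox(7));
        LongBox d = new LongBoxCopier().copy(new LongBox(9L));
        // Both approaches produce equal copies; only the dispatch differs.
        System.out.println(a.value + " " + b.value + " " + c.value + " " + d.value);
    }
}
```

Code generation, as proposed in the thread, would amount to producing classes
like IntBoxCopier automatically instead of writing them by hand.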
