Hi Gabor,

I did not find any Flink proposals for this year's GSoC in JIRA (they should be labeled with gsoc2016). I am also not sure whether any of the Flink committers signed up as a GSoC mentor. There may still be time to do that, but as it looks right now, no GSoC projects are offered by Flink.
Best,
Fabian

2016-03-08 11:22 GMT+01:00 Gábor Horváth <xazax....@gmail.com>:

> Hi!
>
> I am planning to do GSoC and I would like to work on the serializers. More
> specifically I would like to implement code generation. I am planning to
> send the first draft of the proposal to the mailing list early next week.
> If everything is going well, that will include some preliminary benchmarks
> how much performance gain can be expected from hand written serializers.
>
> Best regards,
> Gábor
>
> On 8 March 2016 at 10:47, Stephan Ewen <se...@apache.org> wrote:
>
> > Ah, very good, that makes sense!
> >
> > I would guess that this performance difference could probably be seen at
> > various points where generic serializers and comparators are used (also for
> > Comparable, Writable) or
> > where the TupleSerializer delegates to a sequence of other TypeSerializers.
> >
> > I guess creating more specialized serializers would solve some of these
> > problems, like in your IntValue vs LongValue case.
> >
> > The best way to solve that would probably be through code generation in the
> > serializers. That has actually been my wish for quite a while.
> > If you are also into these kinds of low-level performance topics, we could
> > start a discussion on that.
> >
> > Greetings,
> > Stephan
> >
> >
> > On Mon, Mar 7, 2016 at 11:25 PM, Greg Hogan <c...@greghogan.com> wrote:
> >
> > > The issue is not with the Tuple hierarchy (running Gelly examples had no
> > > effect on runtime, and as you note there aren't any subclass overrides) but
> > > with CopyableValue. I had been using IntValue exclusively but had switched
> > > to using LongValue for graph generation. CopyableValueComparator and
> > > CopyableValueSerializer are now working with multiple types.
> > >
> > > If I create IntValue- and LongValue-specific versions of
> > > CopyableValueComparator and CopyableValueSerializer and modify
> > > ValueTypeInfo to return these then I see the expected performance.
> > >
> > > Greg
> > >
> > > On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Hi Greg!
> > > >
> > > > Sounds very interesting.
> > > >
> > > > Do you have a hunch what "virtual" Tuple methods are being used that become
> > > > less jit-able? In many cases, tuples use only field accesses (like
> > > > "value.f1") in the user functions.
> > > >
> > > > I have to dig into the serializers, to see if they could suffer from that.
> > > > The "getField(pos)" method for example should always have many overrides
> > > > (though few would be loaded at any time, because one usually does not use
> > > > all Tuple classes at the same time).
> > > >
> > > > Greetings,
> > > > Stephan
> > > >
> > > >
> > > > On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <c...@greghogan.com> wrote:
> > > >
> > > > > I am noticing what looks like the same drop-off in performance when
> > > > > introducing TupleN subclasses as expressed in "Understanding the JIT and
> > > > > tuning the implementation" [1].
> > > > >
> > > > > I start my single-node cluster, run an algorithm which relies purely on
> > > > > Tuples, and measure the runtime. I execute a separate jar which executes
> > > > > essentially the same algorithm but using Gelly's Edge (which subclasses
> > > > > Tuple3 but does not add any extra fields) and now both the Tuple and Edge
> > > > > algorithms take twice as long.
> > > > >
> > > > > Has this been previously discussed? If not I can work up a demonstration.
> > > > >
> > > > > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> > > > >
> > > > > Greg
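
For readers following the thread: the difference Greg describes between one generic serializer shared by several value types and a serializer per type can be sketched roughly as below. This is only an illustration under assumptions. ValueLike, IntBox, LongBox and the method names are hypothetical stand-ins, not Flink's CopyableValue, IntValue, LongValue or CopyableValueSerializer. It shows the shape of the two code paths and checks that they produce the same bytes; it does not measure anything, and whether the JIT actually treats the two paths differently in a given job would need a benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch; none of these types are Flink classes.
public class SpecializedSerializerSketch {

    /** Minimal value contract, standing in for a writable value type. */
    public interface ValueLike {
        void write(DataOutputStream out) throws IOException;
    }

    public static final class IntBox implements ValueLike {
        public final int value;
        public IntBox(int value) { this.value = value; }
        @Override public void write(DataOutputStream out) throws IOException {
            out.writeInt(value);
        }
    }

    public static final class LongBox implements ValueLike {
        public final long value;
        public LongBox(long value) { this.value = value; }
        @Override public void write(DataOutputStream out) throws IOException {
            out.writeLong(value);
        }
    }

    /** Generic path: the one call site value.write(out) sees every ValueLike type in use. */
    public static byte[] genericBytes(ValueLike value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            value.write(out); // interface call; receiver type varies at runtime
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Specialized path for IntBox: the field is written directly, no interface call. */
    public static byte[] intBytes(IntBox value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value.value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Specialized path for LongBox, analogous to intBytes. */
    public static byte[] longBytes(LongBox value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(value.value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        IntBox i = new IntBox(42);
        LongBox l = new LongBox(42L);
        // Both paths must agree on the bytes; only the dispatch differs.
        System.out.println(Arrays.equals(genericBytes(i), intBytes(i)));
        System.out.println(Arrays.equals(genericBytes(l), longBytes(l)));
    }
}
```

Writing such per-type classes by hand for every value type does not scale, which is presumably why Stephan suggests code generation in the serializers as the longer-term direction.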