@Fabian: That is my bad, but I think we should still be on time. I pinged Uli just to make sure. The proposal from Gábor and the JIRA issue from me are coming soon.
On Tue, Mar 8, 2016 at 11:43 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Gabor,
>
> I did not find any Flink proposals for this year's GSoC in JIRA (should be
> labeled with gsoc2016).
> I am also not sure if any of the Flink committers signed up as a GSoC mentor.
> Maybe there is still time to do that, but as it looks right now there are no
> GSoC projects offered by Flink.
>
> Best, Fabian
>
> 2016-03-08 11:22 GMT+01:00 Gábor Horváth <xazax....@gmail.com>:
>
> > Hi!
> >
> > I am planning to do GSoC and I would like to work on the serializers. More
> > specifically, I would like to implement code generation. I am planning to
> > send the first draft of the proposal to the mailing list early next week.
> > If everything is going well, that will include some preliminary benchmarks
> > of how much performance gain can be expected from hand-written serializers.
> >
> > Best regards,
> > Gábor
> >
> > On 8 March 2016 at 10:47, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Ah, very good, that makes sense!
> > >
> > > I would guess that this performance difference could probably be seen at
> > > various points where generic serializers and comparators are used (also
> > > for Comparable, Writable) or where the TupleSerializer delegates to a
> > > sequence of other TypeSerializers.
> > >
> > > I guess creating more specialized serializers would solve some of these
> > > problems, like in your IntValue vs LongValue case.
> > >
> > > The best way to solve that would probably be through code generation in
> > > the serializers. That has actually been my wish for quite a while.
> > > If you are also into these kinds of low-level performance topics, we
> > > could start a discussion on that.
> > >
> > > Greetings,
> > > Stephan
> > >
> > >
> > > On Mon, Mar 7, 2016 at 11:25 PM, Greg Hogan <c...@greghogan.com> wrote:
> > >
> > > > The issue is not with the Tuple hierarchy (running Gelly examples had
> > > > no effect on runtime, and as you note there aren't any subclass
> > > > overrides) but with CopyableValue. I had been using IntValue exclusively
> > > > but had switched to using LongValue for graph generation.
> > > > CopyableValueComparator and CopyableValueSerializer are now working
> > > > with multiple types.
> > > >
> > > > If I create IntValue- and LongValue-specific versions of
> > > > CopyableValueComparator and CopyableValueSerializer and modify
> > > > ValueTypeInfo to return these, then I see the expected performance.
> > > >
> > > > Greg
> > > >
> > > > On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > > Hi Greg!
> > > > >
> > > > > Sounds very interesting.
> > > > >
> > > > > Do you have a hunch what "virtual" Tuple methods are being used that
> > > > > become less JIT-able? In many cases, tuples use only field accesses
> > > > > (like "value.f1") in the user functions.
> > > > >
> > > > > I have to dig into the serializers to see if they could suffer from
> > > > > that. The "getField(pos)" method for example should always have many
> > > > > overrides (though few would be loaded at any time, because one usually
> > > > > does not use all Tuple classes at the same time).
> > > > >
> > > > > Greetings,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <c...@greghogan.com> wrote:
> > > > >
> > > > > > I am noticing what looks like the same drop-off in performance when
> > > > > > introducing TupleN subclasses as expressed in "Understanding the JIT
> > > > > > and tuning the implementation" [1].
> > > > > >
> > > > > > I start my single-node cluster, run an algorithm which relies purely
> > > > > > on Tuples, and measure the runtime. I execute a separate jar which
> > > > > > executes essentially the same algorithm but using Gelly's Edge (which
> > > > > > subclasses Tuple3 but does not add any extra fields) and now both the
> > > > > > Tuple and Edge algorithms take twice as long.
> > > > > >
> > > > > > Has this been previously discussed? If not I can work up a
> > > > > > demonstration.
> > > > > >
> > > > > > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> > > > > >
> > > > > > Greg
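A minimal sketch of the effect Greg describes above: once a serializer written against a shared interface has seen two concrete value types, the virtual call inside it becomes bimorphic/megamorphic and the JIT stops inlining it, while a type-specific serializer keeps the call site monomorphic. The classes below (SimpleValue, SimpleIntValue, SimpleLongValue, GenericValueSerializer, IntValueSerializer) are hypothetical stand-ins, not Flink's CopyableValue/CopyableValueSerializer.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical stand-in for the Value/CopyableValue interface.
interface SimpleValue {
    void write(DataOutput out) throws IOException;
    void read(DataInput in) throws IOException;
}

final class SimpleIntValue implements SimpleValue {
    int value;
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void read(DataInput in) throws IOException { value = in.readInt(); }
}

final class SimpleLongValue implements SimpleValue {
    long value;
    public void write(DataOutput out) throws IOException { out.writeLong(value); }
    public void read(DataInput in) throws IOException { value = in.readLong(); }
}

// Generic serializer: after both SimpleIntValue and SimpleLongValue have been
// serialized in the same JVM, the record.write(target) call site sees multiple
// receiver types and can no longer be inlined by the JIT.
final class GenericValueSerializer<T extends SimpleValue> {
    void serialize(T record, DataOutput target) throws IOException {
        record.write(target);
    }
}

// Type-specific serializer: only one receiver type ever reaches this call site,
// so the call stays monomorphic and inlinable.
final class IntValueSerializer {
    void serialize(SimpleIntValue record, DataOutput target) throws IOException {
        record.write(target);
    }
}
```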
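And a rough sketch of the kind of code generation Gábor and Stephan discuss, under the assumption that a generated serializer is specialized to one concrete tuple type. EdgeLikeTuple, FieldSerializer, GenericTupleSerializer, and GeneratedEdgeLikeTupleSerializer are illustrative names, not Flink's actual TupleSerializer implementation.

```java
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical flat record standing in for a concrete Tuple3<Integer, Long, String>.
final class EdgeLikeTuple {
    public int f0;
    public long f1;
    public String f2;

    // Mirrors the Object-typed getField(pos) access pattern; primitive fields are boxed.
    public Object getField(int pos) {
        switch (pos) {
            case 0: return f0;
            case 1: return f1;
            case 2: return f2;
            default: throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }
}

// Roughly what generic tuple serialization does: one virtual call per field
// through an Object-typed nested serializer.
interface FieldSerializer {
    void serialize(Object field, DataOutput target) throws IOException;
}

final class GenericTupleSerializer {
    private final FieldSerializer[] fieldSerializers;

    GenericTupleSerializer(FieldSerializer[] fieldSerializers) {
        this.fieldSerializers = fieldSerializers;
    }

    void serialize(EdgeLikeTuple record, DataOutput target) throws IOException {
        for (int i = 0; i < fieldSerializers.length; i++) {
            fieldSerializers[i].serialize(record.getField(i), target);
        }
    }
}

// What a code-generated serializer for this one concrete type could emit:
// straight-line field writes, no per-field virtual dispatch and no boxing.
final class GeneratedEdgeLikeTupleSerializer {
    void serialize(EdgeLikeTuple record, DataOutput target) throws IOException {
        target.writeInt(record.f0);
        target.writeLong(record.f1);
        target.writeUTF(record.f2);
    }
}
```

The generated variant trades one small class per concrete type for call sites the JIT can fully inline, which is consistent with the IntValue/LongValue specialization Greg reports restoring the expected performance.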