Thanks for posting this. I think it is not super urgent (in the sense of weeks or a few months), so results around mid-summer are probably fine. The background in LLVM is a very good base for this!
On Wed, Mar 9, 2016 at 3:56 PM, Gábor Horváth <xazax....@gmail.com> wrote:

> Hi,
>
> In the meantime I sent out the current version of the proposal draft [1]. Hopefully it will help you triage this task and contribute to the discussion of the problem.
> How urgent is this issue? In what time frame should there be results?
>
> Best Regards,
> Gábor
>
> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/GSoC-Project-Proposal-Draft-Code-Generation-in-Serializers-td10702.html
>
> On 9 March 2016 at 14:49, Stephan Ewen <se...@apache.org> wrote:
>
> > Do we have consensus that we want to "reserve" this topic for a GSoC student?
> >
> > It is becoming a feature that is gaining importance. To see if we can "hold off" on working on that, it would be good to know a bit more, like:
> > - when is it decided whether this project takes place?
> > - when would results be there?
> > - can we expect the results to be usable, i.e., how good is the student? (no offence, but so far the results in GSoC were everywhere between very good and super bad)
> >
> > Greetings,
> > Stephan
> >
> > On Tue, Mar 8, 2016 at 4:28 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> >
> > > @Fabian: That is my bad, but I think we should still be on time. Pinged Uli just to make sure. Proposal from Gabor and Jira from me are coming soon.
> > >
> > > On Tue, Mar 8, 2016 at 11:43 AM, Fabian Hueske <fhue...@gmail.com> wrote:
> > >
> > > > Hi Gabor,
> > > >
> > > > I did not find any Flink proposals for this year's GSoC in JIRA (should be labeled with gsoc2016).
> > > > I am also not sure if any of the Flink committers signed up as a GSoC mentor.
> > > > Maybe there is still time to do that, but as it looks right now there are no GSoC projects offered by Flink.
> > > >
> > > > Best, Fabian
> > > >
> > > > 2016-03-08 11:22 GMT+01:00 Gábor Horváth <xazax....@gmail.com>:
> > > >
> > > > > Hi!
> > > > >
> > > > > I am planning to do GSoC and I would like to work on the serializers. More specifically, I would like to implement code generation. I am planning to send the first draft of the proposal to the mailing list early next week. If everything goes well, that will include some preliminary benchmarks of how much performance gain can be expected from hand-written serializers.
> > > > >
> > > > > Best regards,
> > > > > Gábor
> > > > >
> > > > > On 8 March 2016 at 10:47, Stephan Ewen <se...@apache.org> wrote:
> > > > >
> > > > > > Ah, very good, that makes sense!
> > > > > >
> > > > > > I would guess that this performance difference could probably be seen at various points where generic serializers and comparators are used (also for Comparable, Writable) or where the TupleSerializer delegates to a sequence of other TypeSerializers.
> > > > > >
> > > > > > I guess creating more specialized serializers would solve some of these problems, like in your IntValue vs LongValue case.
> > > > > >
> > > > > > The best way to solve that would probably be through code generation in the serializers. That has actually been my wish for quite a while. If you are also into these kinds of low-level performance topics, we could start a discussion on that.
> > > > > >
> > > > > > Greetings,
> > > > > > Stephan
> > > > > >
> > > > > > On Mon, Mar 7, 2016 at 11:25 PM, Greg Hogan <c...@greghogan.com> wrote:
> > > > > >
> > > > > > > The issue is not with the Tuple hierarchy (running Gelly examples had no effect on runtime, and as you note there aren't any subclass overrides) but with CopyableValue. I had been using IntValue exclusively but had switched to using LongValue for graph generation. CopyableValueComparator and CopyableValueSerializer are now working with multiple types.
> > > > > > >
> > > > > > > If I create IntValue- and LongValue-specific versions of CopyableValueComparator and CopyableValueSerializer and modify ValueTypeInfo to return these, then I see the expected performance.
> > > > > > >
> > > > > > > Greg
> > > > > > >
> > > > > > > On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <se...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Greg!
> > > > > > > >
> > > > > > > > Sounds very interesting.
> > > > > > > >
> > > > > > > > Do you have a hunch what "virtual" Tuple methods are being used that become less jit-able? In many cases, tuples use only field accesses (like "value.f1") in the user functions.
> > > > > > > >
> > > > > > > > I have to dig into the serializers to see if they could suffer from that. The "getField(pos)" method, for example, should always have many overrides (though few would be loaded at any time, because one usually does not use all Tuple classes at the same time).
> > > > > > > >
> > > > > > > > Greetings,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > > On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <c...@greghogan.com> wrote:
> > > > > > > >
> > > > > > > > > I am noticing what looks like the same drop-off in performance when introducing TupleN subclasses as expressed in "Understanding the JIT and tuning the implementation" [1].
> > > > > > > > >
> > > > > > > > > I start my single-node cluster, run an algorithm which relies purely on Tuples, and measure the runtime. I execute a separate jar which runs essentially the same algorithm but using Gelly's Edge (which subclasses Tuple3 but does not add any extra fields), and now both the Tuple and Edge algorithms take twice as long.
> > > > > > > > >
> > > > > > > > > Has this been previously discussed? If not, I can work up a demonstration.
> > > > > > > > >
> > > > > > > > > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> > > > > > > > >
> > > > > > > > > Greg
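To make the dispatch effect Greg describes concrete, here is a minimal, self-contained Java sketch. The Value, IntValue, LongValue and serializer classes below are simplified stand-ins invented for this example, not Flink's actual CopyableValue or CopyableValueSerializer API. The point is only that a generic serializer's virtual call site is shared by every instance of that serializer class, so a job that feeds it both IntValue and LongValue pollutes the JIT's type profile at that one call site, while a per-type serializer keeps the call site monomorphic and trivially inlinable.

    // Hypothetical, simplified stand-ins for Flink's Value types, only meant
    // to illustrate the dispatch behaviour discussed in the thread.
    interface Value {
        void write(java.io.DataOutput out) throws java.io.IOException;
    }

    final class IntValue implements Value {
        int value;
        IntValue(int value) { this.value = value; }
        public void write(java.io.DataOutput out) throws java.io.IOException {
            out.writeInt(value);
        }
    }

    final class LongValue implements Value {
        long value;
        LongValue(long value) { this.value = value; }
        public void write(java.io.DataOutput out) throws java.io.IOException {
            out.writeLong(value);
        }
    }

    // Generic serializer: record.write(...) is a virtual call. The compiled
    // method body (and its type profile) is shared by all instances of this
    // class, so using IntValue in one operator and LongValue in another makes
    // the call site bimorphic and harder for the JIT to inline.
    final class GenericValueSerializer<T extends Value> {
        void serialize(T record, java.io.DataOutput target) throws java.io.IOException {
            record.write(target);
        }
    }

    // Type-specific serializer: the receiver type is statically known, so the
    // write() call stays monomorphic no matter what other serializers in the
    // same JVM are handling. This is the shape of the fix Greg describes.
    final class IntValueSerializer {
        void serialize(IntValue record, java.io.DataOutput target) throws java.io.IOException {
            record.write(target); // always IntValue.write, trivially inlinable
        }
    }

Greg's change of having ValueTypeInfo return IntValue- and LongValue-specific serializers and comparators is the real-world analogue of swapping GenericValueSerializer for the specialized class.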
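The code-generation idea Stephan raises points in the same direction for tuples: rather than a TupleSerializer delegating field by field to a chain of nested TypeSerializers, a serializer generated for one concrete tuple shape can write the fields directly. A rough, hypothetical sketch of what such generated code might look like (IntLongTuple and the class name are illustrative, not Flink's actual generated output):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Illustrative tuple holder standing in for a Tuple2<Integer, Long>-shaped record.
    final class IntLongTuple {
        int f0;
        long f1;
    }

    // What a code-generated serializer for this exact shape could look like:
    // no per-field serializer lookup, no boxing, no virtual dispatch per field.
    final class GeneratedIntLongTupleSerializer {
        void serialize(IntLongTuple record, DataOutput target) throws IOException {
            target.writeInt(record.f0);
            target.writeLong(record.f1);
        }

        IntLongTuple deserialize(DataInput source) throws IOException {
            IntLongTuple record = new IntLongTuple();
            record.f0 = source.readInt();
            record.f1 = source.readLong();
            return record;
        }
    }

This is the kind of flattened, shape-specific code the proposed GSoC project on code generation in serializers would aim to produce automatically.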