Very nice proposal!

On Wed, Mar 9, 2016 at 6:35 PM, Stephan Ewen <se...@apache.org> wrote:
> Thanks for posting this.
>
> I think it is not super urgent (in the sense of weeks or a few months), so
> results around mid-summer are probably good.
> The background in LLVM is a very good base for this!
>
> On Wed, Mar 9, 2016 at 3:56 PM, Gábor Horváth <xazax....@gmail.com> wrote:
>
>> Hi,
>>
>> In the meantime I sent out the current version of the proposal draft [1].
>> Hopefully it will help you triage this task and contribute to the
>> discussion of the problem.
>> How urgent is this issue? In what time frame should there be results?
>>
>> Best Regards,
>> Gábor
>>
>> [1]
>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/GSoC-Project-Proposal-Draft-Code-Generation-in-Serializers-td10702.html
>>
>> On 9 March 2016 at 14:49, Stephan Ewen <se...@apache.org> wrote:
>>
>> > Do we have consensus that we want to "reserve" this topic for a GSoC
>> > student?
>> >
>> > It is becoming a feature that is gaining importance. To see whether we
>> > can "hold off" on working on it, it would be good to know a bit more,
>> > like:
>> >   - when is it decided whether this project takes place?
>> >   - when would results be there?
>> >   - can we expect the results to be usable, i.e., how good is the student?
>> > (no offence, but so far the results in GSoC have been anywhere between
>> > very good and super bad)
>> >
>> > Greetings,
>> > Stephan
>> >
>> >
>> > On Tue, Mar 8, 2016 at 4:28 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>> >
>> > > @Fabian: That is my bad, but I think we should still be on time. Pinged
>> > > Uli just to make sure. The proposal from Gabor and the JIRA issue from
>> > > me are coming soon.
>> > >
>> > > On Tue, Mar 8, 2016 at 11:43 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>> > >
>> > > > Hi Gabor,
>> > > >
>> > > > I did not find any Flink proposals for this year's GSoC in JIRA (they
>> > > > should be labeled with gsoc2016).
>> > > > I am also not sure whether any of the Flink committers signed up as a
>> > > > GSoC mentor.
>> > > > Maybe there is still time to do that, but as it looks right now there
>> > > > are no GSoC projects offered by Flink.
>> > > >
>> > > > Best, Fabian
>> > > >
>> > > >
>> > > > 2016-03-08 11:22 GMT+01:00 Gábor Horváth <xazax....@gmail.com>:
>> > > >
>> > > > > Hi!
>> > > > >
>> > > > > I am planning to do GSoC and I would like to work on the
>> > > > > serializers. More specifically, I would like to implement code
>> > > > > generation. I am planning to send the first draft of the proposal
>> > > > > to the mailing list early next week. If everything goes well, it
>> > > > > will include some preliminary benchmarks of how much performance
>> > > > > gain can be expected from hand-written serializers.
>> > > > >
>> > > > > Best regards,
>> > > > > Gábor
>> > > > >
>> > > > > On 8 March 2016 at 10:47, Stephan Ewen <se...@apache.org> wrote:
>> > > > >
>> > > > > > Ah, very good, that makes sense!
>> > > > > >
>> > > > > > I would guess that this performance difference could probably be
>> > > > > > seen at various points where generic serializers and comparators
>> > > > > > are used (also for Comparable, Writable), or where the
>> > > > > > TupleSerializer delegates to a sequence of other TypeSerializers.
>> > > > > >
>> > > > > > I guess creating more specialized serializers would solve some of
>> > > > > > these problems, like in your IntValue vs LongValue case.
>> > > > > >
>> > > > > > The best way to solve that would probably be through code
>> > > > > > generation in the serializers. That has actually been my wish for
>> > > > > > quite a while.
>> > > > > > If you are also into these kinds of low-level performance topics,
>> > > > > > we could start a discussion on that.
>> > > > > >
>> > > > > > Greetings,
>> > > > > > Stephan
>> > > > > >
>> > > > > >
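As a rough illustration of the code-generation idea above: a minimal, self-contained Java sketch of what a specialized serializer for a Tuple2<Integer, Long>-shaped record could look like. The class names and the use of plain java.io.DataInput/DataOutput are hypothetical stand-ins, not Flink's actual TypeSerializer API; the point is only that every field access becomes a direct, inlinable call instead of a loop over nested serializers.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Illustrative stand-in for a Tuple2<Integer, Long>; kept local so the
    // sketch is self-contained rather than depending on Flink's Tuple classes.
    class IntLongRecord {
        int f0;
        long f1;
    }

    // Hypothetical shape of a code-generated serializer: every field write is
    // a direct, statically typed call the JIT can inline, instead of a loop
    // over an array of nested TypeSerializer instances whose serialize() call
    // site can become megamorphic.
    class GeneratedIntLongSerializer {

        void serialize(IntLongRecord record, DataOutput out) throws IOException {
            out.writeInt(record.f0);
            out.writeLong(record.f1);
        }

        IntLongRecord deserialize(DataInput in) throws IOException {
            IntLongRecord record = new IntLongRecord();
            record.f0 = in.readInt();
            record.f1 = in.readLong();
            return record;
        }
    }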
>> > > > > > > On Mon, Mar 7, 2016 at 11:25 PM, Greg Hogan <c...@greghogan.com> wrote:
>> > > > > >
>> > > > > > > The issue is not with the Tuple hierarchy (running Gelly
>> > > > > > > examples had no effect on runtime, and as you note there aren't
>> > > > > > > any subclass overrides) but with CopyableValue. I had been using
>> > > > > > > IntValue exclusively but had switched to using LongValue for
>> > > > > > > graph generation. CopyableValueComparator and
>> > > > > > > CopyableValueSerializer are now working with multiple types.
>> > > > > > >
>> > > > > > > If I create IntValue- and LongValue-specific versions of
>> > > > > > > CopyableValueComparator and CopyableValueSerializer and modify
>> > > > > > > ValueTypeInfo to return these, then I see the expected
>> > > > > > > performance.
>> > > > > > >
>> > > > > > > Greg
>> > > > > > >
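A minimal sketch of the contrast Greg describes, under the assumption that the generic path funnels every Value type through one virtual write() call while a per-type serializer only ever sees one concrete class. The MyValue, MyLongValue, and serializer names below are illustrative stand-ins for the real classes in org.apache.flink.types, not Flink's actual implementation.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Stand-ins for Flink's Value/LongValue, only so the sketch compiles on
    // its own.
    interface MyValue {
        void write(DataOutput out) throws IOException;
        void read(DataInput in) throws IOException;
    }

    class MyLongValue implements MyValue {
        long value;

        public void write(DataOutput out) throws IOException {
            out.writeLong(value);
        }

        public void read(DataInput in) throws IOException {
            value = in.readLong();
        }
    }

    // Generic path: the record.write(out) call site is shared by every Value
    // type used in the job, so once both IntValue and LongValue are live it
    // is no longer monomorphic.
    class GenericValueSerializer<T extends MyValue> {
        void serialize(T record, DataOutput out) throws IOException {
            record.write(out);
        }
    }

    // Type-specific path: only one concrete class ever reaches this call
    // site, so the JIT can devirtualize and inline the write.
    class LongValueSerializer {
        void serialize(MyLongValue record, DataOutput out) throws IOException {
            out.writeLong(record.value);
        }
    }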
>> > > > > > > On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <se...@apache.org> wrote:
>> > > > > > >
>> > > > > > > > Hi Greg!
>> > > > > > > >
>> > > > > > > > Sounds very interesting.
>> > > > > > > >
>> > > > > > > > Do you have a hunch which "virtual" Tuple methods are being
>> > > > > > > > used that become less jit-able? In many cases, tuples use only
>> > > > > > > > field accesses (like "value.f1") in the user functions.
>> > > > > > > >
>> > > > > > > > I have to dig into the serializers to see if they could suffer
>> > > > > > > > from that. The "getField(pos)" method, for example, should
>> > > > > > > > always have many overrides (though few would be loaded at any
>> > > > > > > > time, because one usually does not use all Tuple classes at
>> > > > > > > > the same time).
>> > > > > > > >
>> > > > > > > > Greetings,
>> > > > > > > > Stephan
>> > > > > > > >
>> > > > > > > >
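A plain-Java illustration of the getField(pos) point, assuming each tuple arity overrides the method with a switch over positions; the classes below are hypothetical stand-ins, not Flink's Tuple hierarchy. The interesting part is the shared call site: HotSpot inlines it while it sees at most two concrete receiver types, and typically falls back to a non-inlined virtual dispatch once a third shows up.

    // Minimal stand-ins for a Tuple base class and two arities.
    abstract class MyTuple {
        abstract Object getField(int pos);
    }

    class MyTuple2 extends MyTuple {
        Object f0, f1;

        @Override
        Object getField(int pos) {
            switch (pos) {
                case 0: return f0;
                case 1: return f1;
                default: throw new IndexOutOfBoundsException(String.valueOf(pos));
            }
        }
    }

    class MyTuple3 extends MyTuple {
        Object f0, f1, f2;

        @Override
        Object getField(int pos) {
            switch (pos) {
                case 0: return f0;
                case 1: return f1;
                case 2: return f2;
                default: throw new IndexOutOfBoundsException(String.valueOf(pos));
            }
        }
    }

    class FieldAccess {
        // Cheap while only one or two concrete tuple classes reach this call
        // site; with more receiver types the JIT generally stops inlining.
        static Object firstField(MyTuple t) {
            return t.getField(0);
        }
    }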
>> > > > > > > > > On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <c...@greghogan.com> wrote:
>> > > > > > > >
>> > > > > > > > > I am noticing what looks like the same drop-off in
>> > > > > > > > > performance when introducing TupleN subclasses as described
>> > > > > > > > > in "Understanding the JIT and tuning the implementation" [1].
>> > > > > > > > >
>> > > > > > > > > I start my single-node cluster, run an algorithm which
>> > > > > > > > > relies purely on Tuples, and measure the runtime. I then run
>> > > > > > > > > a separate jar which executes essentially the same algorithm
>> > > > > > > > > but using Gelly's Edge (which subclasses Tuple3 but does not
>> > > > > > > > > add any extra fields), and now both the Tuple and Edge
>> > > > > > > > > algorithms take twice as long.
>> > > > > > > > >
>> > > > > > > > > Has this been previously discussed? If not, I can work up a
>> > > > > > > > > demonstration.
>> > > > > > > > >
>> > > > > > > > > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
>> > > > > > > > >
>> > > > > > > > > Greg
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
