Re: Tuple performance and the curious JIT compiler

2016-03-10 Thread Márton Balassi
If the community can agree that the proposal that Gábor Horváth has suggested is a nice approach and can accept that the results will be coming around mid summer, then I would strongly suggest "reserving" him this topic. His previous experience makes him a strong candidate for the task. To add to

Re: Tuple performance and the curious JIT compiler

2016-03-10 Thread Ufuk Celebi
Very nice proposal! On Wed, Mar 9, 2016 at 6:35 PM, Stephan Ewen wrote: > Thanks for posting this. > > I think it is not super urgent (in the sense of weeks or few months), so > results around mid summer is probably good. > The background in LLVM is a very good base for this! > > On Wed, Mar 9, 2

Re: Tuple performance and the curious JIT compiler

2016-03-09 Thread Stephan Ewen
Thanks for posting this. I think it is not super urgent (in the sense of weeks or few months), so results around mid summer is probably good. The background in LLVM is a very good base for this! On Wed, Mar 9, 2016 at 3:56 PM, Gábor Horváth wrote: > Hi, > > In the meantime I sent out the curren

Re: Tuple performance and the curious JIT compiler

2016-03-09 Thread Gábor Horváth
Hi, In the meantime I sent out the current version of the proposal draft [1]. Hopefully it will help you triage this task and contribute to the discussion of the problem. How urgent is this issue? In what time frame should there be results? Best Regards, Gábor [1] http://apache-flink-mailing-lis

Re: Tuple performance and the curious JIT compiler

2016-03-09 Thread Stephan Ewen
Do we have consensus that we want to "reserve" this topic for a GSoC student? It is becoming a feature that gains more importance. To see whether we can "hold off" on working on that, it would be good to know a bit more, like - when is it decided whether this project takes place? - when would results be

Re: Tuple performance and the curious JIT compiler

2016-03-08 Thread Márton Balassi
@Fabian: That is my bad, but I think we should be still on time. Pinged Uli just to make sure. Proposal from Gabor and Jira from me are coming soon. On Tue, Mar 8, 2016 at 11:43 AM, Fabian Hueske wrote: > Hi Gabor, > > I did not find any Flink proposals for this year's GSoC in JIRA (should be >

Re: Tuple performance and the curious JIT compiler

2016-03-08 Thread Fabian Hueske
Hi Gabor, I did not find any Flink proposals for this year's GSoC in JIRA (should be labeled with gsoc2016). I am also not sure if any of the Flink committers signed up as a GSoC mentor. Maybe it is still time to do that but as it looks right now there are no GSoC projects offered by Flink. Best,

Re: Tuple performance and the curious JIT compiler

2016-03-08 Thread Gábor Horváth
Hi! I am planning to do GSoC and I would like to work on the serializers. More specifically I would like to implement code generation. I am planning to send the first draft of the proposal to the mailing list early next week. If everything is going well, that will include some preliminary benchmar

Re: Tuple performance and the curious JIT compiler

2016-03-08 Thread Stephan Ewen
Ah, very good, that makes sense! I would guess that this performance difference could probably be seen at various points where generic serializers and comparators are used (also for Comparable, Writable) or where the TupleSerializer delegates to a sequence of other TypeSerializers. I guess creati

Re: Tuple performance and the curious JIT compiler

2016-03-07 Thread Greg Hogan
The issue is not with the Tuple hierarchy (running Gelly examples had no effect on runtime, and as you note there aren't any subclass overrides) but with CopyableValue. I had been using IntValue exclusively but had switched to using LongValue for graph generation. CopyableValueComparator and Copyab

Re: Tuple performance and the curious JIT compiler

2016-03-07 Thread Stephan Ewen
Hi Greg! Sounds very interesting. Do you have a hunch what "virtual" Tuple methods are being used that become less jit-able? In many cases, tuples use only field accesses (like "value.f1") in the user functions. I have to dig into the serializers, to see if they could suffer from that. The "getF

Re: Tuple

2015-08-04 Thread Matthias J. Sax
I set parallelism of map to 4 (and I double checked, that the 4 mappers are running on different machines). Furthermore, fromElements() source has parallelism of 1. Thus, some data is going over the network for sure. On 08/04/2015 02:31 PM, Chesnay Schepler wrote: > i think this job would be chai

Re: Tuple

2015-08-04 Thread Chesnay Schepler
i think this job would be chained completely and never do any serialization. On 04.08.2015 14:25, Matthias J. Sax wrote: Works for batch job, too. See enclosed. On 08/04/2015 01:34 PM, Matthias J. Sax wrote: Yes, that is what the program does. However, streaming is not lazy so deserialization s

Re: Tuple

2015-08-04 Thread Matthias J. Sax
Works for batch job, too. See enclosed. On 08/04/2015 01:34 PM, Matthias J. Sax wrote: > Yes, that is what the program does. However, streaming is not lazy so > deserialization should have happened. > > I will try a batch job, later today. > > On 08/04/2015 01:27 PM, Chesnay Schepler wrote: >> so

Re: Tuple

2015-08-04 Thread Matthias J. Sax
Yes, that is what the program does. However, streaming is not lazy so deserialization should have happened. I will try a batch job, later today. On 08/04/2015 01:27 PM, Chesnay Schepler wrote: > so I'm not too much into the streaming API, but as i see it this program > creates an infinite number of

Re: Tuple

2015-08-04 Thread Aljoscha Krettek
I think in the Streaming Case it works because every Serializer ends up being wrapped up in a StreamRecordSerializer. When the StreamRecordSerializer serializes/deserializes stuff it should be ok that the Tuple0 doesn't actually serialize/deserialize anything. On Tue, 4 Aug 2015 at 13:27 Chesnay S

Re: Tuple

2015-08-04 Thread Chesnay Schepler
so I'm not too much into the streaming API, but as i see it this program creates an infinite number of tuples and then counts them, right? The problem with serialization as i understand it is that the receiver can't tell how many Tuple0 are sent, since you never actually read any data when dese

Re: Tuple

2015-08-04 Thread Matthias J. Sax
Hi, I just opened a PR for this. https://github.com/apache/flink/pull/983 However, I was not able to "reproduce" serialization issues... I tested Tuple0 (see enclosed code) in a cluster, and the program worked. Do I miss anything? -Matthias On 08/03/2015 01:01 AM, Matthias J. Sax wrote: > Tha

Re: Tuple

2015-08-02 Thread Stephan Ewen
The idea of the dedicated project was to make the tuples usable in other programs, that may interact with Flink, but won't want the full dependencies. I share the concern about too many small projects... On Mon, Aug 3, 2015 at 1:01 AM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Th

Re: Tuple

2015-08-02 Thread Matthias J. Sax
Thanks for the advice about Tuple0. I personally don't see any advantage in having a "flink-tuple" project. Am I missing anything about it? Furthermore, I am not sure if it is a good idea to have too many too-small projects. On 08/03/2015 12:48 AM, Stephan Ewen wrote: > Tuple0 would need special ser

Re: Tuple

2015-08-02 Thread Stephan Ewen
Tuple0 would need special serialization and comparator logic. If that is given, I see no reason not to support it. There is BTW, the request to create a dedicated "flink-tuple" project, that only contains the tuple classes. Any opinions on that? On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <

Re: Tuple

2015-08-02 Thread Matthias J. Sax
Thanks for the explanation! As I mentioned before, Tuple0 might also be helpful for streaming. And I guess I will need it for Storm compatibility layer, too. (I need to double check, but Storm supports zero-attribute-tuples, too). With regard to the information I collected during the discussion,

Re: Tuple

2015-08-02 Thread Chesnay Schepler
First of all, it was a really good idea to start a discussion about this. So the general idea behind Tuple0 was this: The Python API maps python tuples to flink tuples. Python can have empty tuples, so i thought "well duh, let's make a Tuple0 class!". What i did not wanna do is create some non

Re: Tuple

2015-08-02 Thread Matthias J. Sax
Can you elaborate how and why Python used Tuple0? If it cannot be serialized similarly to regular Tuples, what is the usage in Python? Right now it seems as if there is no special serialization code for Tuple0. I just want to understand the topic in detail. -Matthias On 08/01/2015 03:38 PM, Stephan

Re: Tuple

2015-08-01 Thread Stephan Ewen
I think a Tuple0 cannot be implemented like the current tuples, at least with respect to runtime serialization. The system makes the assumption that it makes progress in consuming bytes when deserializing values. If a Tuple0 never consumes data from the byte stream, this assumption is broken. It w
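
A minimal Java sketch of the assumption described in this message, using only the JDK. The class `EmptyTupleFraming` and its methods are hypothetical and are not Flink code. It contrasts a record type that writes zero bytes with one that writes a single marker byte, and shows that a reader which detects records by consuming bytes only sees the latter. The marker byte is just one illustrative way to restore progress, not necessarily what Flink does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class EmptyTupleFraming {

    // Hypothetical writer: emits `count` empty records.
    // With writeMarker == false each record contributes zero bytes.
    static byte[] writeRecords(int count, boolean writeMarker) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            if (writeMarker) {
                out.write(0); // one marker byte per record (an assumption of this sketch)
            }
        }
        return out.toByteArray();
    }

    // Hypothetical reader: treats every consumed byte as one record,
    // i.e. it relies on deserialization making progress in the stream.
    static int countRecords(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int records = 0;
        while (in.available() > 0) {
            in.read();
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        // Zero-byte records are invisible to the reader.
        System.out.println(countRecords(writeRecords(3, false))); // prints 0
        // Marker bytes let the reader recover the record count.
        System.out.println(countRecords(writeRecords(3, true)));  // prints 3
    }
}
```

In the first call the three records leave no trace in the byte stream, which matches the concern raised elsewhere in the thread that the receiver cannot tell how many Tuple0 instances were sent.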

Re: Tuple

2015-08-01 Thread Matthias J. Sax
I just double checked. Scala does not have type Tuple0. IMHO, it would be best to remove Tuple0 for consistency. Having Tuple types is for consistency reason with Scala in the first place, right? Please give feedback. -Matthias On 08/01/2015 01:04 PM, Matthias J. Sax wrote: > I see. > > I think

Re: Tuple

2015-08-01 Thread Chesnay Schepler
yes, if it is present in the core flink files it must work just as any tuple in flink. removing is not an option though; but moving is. The Python API uses it (that's the reason Tuple0 was added in the first place). On 01.08.2015 13:04, Matthias J. Sax wrote: I see. I think that it might be

Re: Tuple

2015-08-01 Thread Matthias J. Sax
I see. I think that it might be useful to have Tuple0, because in rare cases, you only want to "notify" downstream operators (talking about streaming) that something happened but there is no actual data to be processed. Furthermore, if Flink cannot deal with Tuple0 it should be removed completely

Re: Tuple

2015-07-31 Thread Chesnay Schepler
also, I'm not sure if I ever sent a Tuple0 through a program, it could be that the system freaks out. On 31.07.2015 22:40, Chesnay Schepler wrote: there's no specific reason. it was added fairly recently by me (mid of april), and you're most likely the second person to use it. i didn't integr

Re: Tuple

2015-07-31 Thread Chesnay Schepler
there's no specific reason. it was added fairly recently by me (mid of april), and you're most likely the second person to use it. i didn't integrate into all our tuple related stuff because, well, i never thought anyone would actually need it, so i saved myself the trouble. Hi, is there an

Re: Tuple project method

2015-05-27 Thread Stephan Ewen
It would be an interesting addition. Such a method cannot be done fully type safe in Java, but that might be okay, since it is user-code internal. On Wed, May 27, 2015 at 11:52 AM, Flavio Pompermaier wrote: > Sorry, to be effective the project should also take in input the target > tuple itself

Re: Tuple project method

2015-05-27 Thread Flavio Pompermaier
Sorry, to be effective the project should also take in input the target tuple itself :) Tuple3 reuse = tuple.project(reuse, 0,2,5)? On Wed, May 27, 2015 at 11:51 AM, Flavio Pompermaier wrote: > Hi flinkers, > > it happens very often to me that I have to output a reuse tuple that > basically is
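
A minimal Java sketch of what such a `project(reuse, 0, 2, 5)` helper could look like. `SimpleTuple` is a hypothetical stand-in with untyped positional fields so that the example runs without Flink on the classpath, and the static `project` method is an assumption about the shape of the proposal, not an existing Flink API. As noted in the reply, the copy is not type safe: fields are moved as `Object`.

```java
public class TupleProjection {

    /** Hypothetical stand-in for a tuple: fixed arity, untyped positional fields. */
    static final class SimpleTuple {
        private final Object[] fields;

        private SimpleTuple(Object[] fields) {
            this.fields = fields;
        }

        static SimpleTuple ofArity(int arity) {
            return new SimpleTuple(new Object[arity]);
        }

        static SimpleTuple of(Object... values) {
            return new SimpleTuple(values.clone());
        }

        Object getField(int pos) {
            return fields[pos];
        }

        void setField(Object value, int pos) {
            fields[pos] = value;
        }

        int getArity() {
            return fields.length;
        }
    }

    /**
     * Copies the fields at the given positions of source into reuse, in order,
     * and returns reuse so that no new tuple object is allocated.
     */
    static SimpleTuple project(SimpleTuple source, SimpleTuple reuse, int... positions) {
        if (positions.length != reuse.getArity()) {
            throw new IllegalArgumentException(
                "Expected " + reuse.getArity() + " positions, got " + positions.length);
        }
        for (int i = 0; i < positions.length; i++) {
            reuse.setField(source.getField(positions[i]), i);
        }
        return reuse;
    }

    public static void main(String[] args) {
        SimpleTuple source = SimpleTuple.of("a", 1, "b", 2, "c", 3);
        SimpleTuple reuse = SimpleTuple.ofArity(3);
        project(source, reuse, 0, 2, 5);
        System.out.println(reuse.getField(0) + " " + reuse.getField(1) + " " + reuse.getField(2));
    }
}
```

Passing the target tuple in explicitly is what makes object reuse possible here; a variant that returned a fresh tuple would be simpler to call but would allocate on every record.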