Re: Improve performance of call system.currentTimeMillis()

2015-08-12 Thread Ufuk Celebi
On Wed, Aug 12, 2015 at 3:57 PM, Stephan Ewen wrote: > I'm curious what the results are! Same here! :-)

Re: [Proposal] Addition to Gelly

2015-08-12 Thread Stephan Ewen
Same here as for Max: I am not familiar enough with it anymore to make really good comments. Some generic comments, though: - Check whether you really need a technique. Techniques that improve corner cases but make the code much more complex and the behavior of algorithms less robust are often b

Re: Improve performance of call system.currentTimeMillis()

2015-08-12 Thread Stephan Ewen
@ffbin: Would this be a JVM singleton that updates a static field every millisecond? It is hard to say at this point whether this eliminates a bottleneck, but I think it is fine to try. To evaluate this, you could use a streaming job that attaches a timestamp to every record, and measure the diff
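A minimal plain-Java sketch of the measurement described above, assuming everything runs in a single JVM: a producer thread stands in for the source and stamps each record with `System.nanoTime()`, and the consuming loop stands in for the sink and records the difference. `StampedRecord` and `LatencyProbe` are hypothetical names. In an actual Flink job the stamping and the diff would live in user functions, and across machines a wall-clock timestamp (with its clock-skew caveats) would be needed instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical record type that carries the time it was created. */
final class StampedRecord {
    final long value;
    final long createdNanos;

    StampedRecord(long value, long createdNanos) {
        this.value = value;
        this.createdNanos = createdNanos;
    }
}

/** Hypothetical harness: per-record latency = time consumed - time created. */
final class LatencyProbe {
    static long[] measure(int numRecords) throws InterruptedException {
        BlockingQueue<StampedRecord> channel = new ArrayBlockingQueue<>(1024);

        // "Source": stamps every record just before handing it off.
        Thread source = new Thread(() -> {
            try {
                for (int i = 0; i < numRecords; i++) {
                    channel.put(new StampedRecord(i, System.nanoTime()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();

        // "Sink": takes records in arrival order and computes the diff.
        long[] latencies = new long[numRecords];
        for (int i = 0; i < numRecords; i++) {
            StampedRecord r = channel.take();
            latencies[i] = System.nanoTime() - r.createdNanos;
        }
        source.join();
        return latencies;
    }
}
```

Comparing throughput and these latency numbers with and without a proposed change is the kind of end-to-end evidence the replies ask for, as opposed to a micro benchmark of the clock call alone.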

Re: Improve performance of call system.currentTimeMillis()

2015-08-12 Thread Maximilian Michels
I second Ufuk and Chesnay. Please provide us with a benchmark. I find it hard to believe that your implementation, along with the overhead that comes with it, will improve streaming performance. Please feel free to prove us wrong :) On Wed, Aug 12, 2015 at 11:48 AM, Chesnay Schepler < chesna

Re: [Proposal] Addition to Gelly

2015-08-12 Thread Maximilian Michels
I think this is a decision to be made by the people involved in the Gelly library. I'm not very familiar with graph processing libraries, so it is hard for me to assess the value of this contribution. However, you outlined pretty well that for highly skewed graphs your technique results in a muc

Re: [Proposal] Addition to Gelly

2015-08-12 Thread Andra Lungu
I would love to get some feedback from the guys at data Artisans about this one. So far, the comments originated and spread in the Stockholm area :) On Tue, Aug 11, 2015 at 6:33 PM, Andra Lungu wrote: > Hi Samia, > > A good method to statistically determine skewed vertices was beyond the > purpo

Re: Improve performance of call system.currentTimeMillis()

2015-08-12 Thread Chesnay Schepler
-.- If you look into the issue you opened, https://issues.apache.org/jira/browse/FLINK-2471, you were given opinions from two separate people, with arguments, that the performance improvement (TBD) is either a) nonexistent or b) negligible. All you did was disregard those, basically saying

Re: Improve performance of call system.currentTimeMillis()

2015-08-12 Thread Ufuk Celebi
Hey Fengbin, did you run a program and notice some kind of performance degradation because of this? If yes, could you provide some details? If not, I would suggest not doing this. I can see how this "improves" performance in micro benchmarks, but not how it would affect the overall performance o

Improve performance of call system.currentTimeMillis()

2015-08-12 Thread Fangfengbin
Hello, Some operators call System.currentTimeMillis() frequently, and this costs performance. I want to use a thread that calls System.currentTimeMillis() and updates a long variable millTime. All other modules then do not need to call System.currentTimeMillis() and can read millTime directly. I want to know
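A minimal sketch of the proposal as described, which is also how Stephan paraphrases it: a JVM singleton whose background thread updates a static field about every millisecond. `CachedClock` is a hypothetical name; only `millTime` comes from the message. The sketch makes two costs raised in the replies visible: an extra thread that wakes up roughly a thousand times per second, and readings that lag the real clock by at least the sleep interval, more on systems whose timer granularity is coarser than 1 ms.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: one daemon thread refreshes a volatile field,
 * and callers read the field instead of calling System.currentTimeMillis().
 */
final class CachedClock {
    // volatile so that reader threads see the updater thread's writes
    private static volatile long millTime = System.currentTimeMillis();

    static {
        Thread updater = new Thread(() -> {
            while (true) {
                millTime = System.currentTimeMillis();
                try {
                    // Actual sleep may be longer than 1 ms, depending on the OS timer.
                    TimeUnit.MILLISECONDS.sleep(1);
                } catch (InterruptedException e) {
                    // Stop updating if the thread is interrupted.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "cached-clock-updater");
        updater.setDaemon(true); // do not keep the JVM alive
        updater.start();
    }

    private CachedClock() {}

    /** Returns the last cached value; it may lag the real clock. */
    static long currentTimeMillis() {
        return millTime;
    }
}
```

Whether reading a volatile field is measurably cheaper than the native call in a real job, once the updater thread's own cost is included, is exactly what the thread asks to be benchmarked rather than assumed.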

Re: nightly builds

2015-08-12 Thread Fabian Hueske
Hi Pieter, there are SNAPSHOT builds on an Apache mirror which are updated after each new commit (if the build passes). The repository is located at: https://repository.apache.org/content/repositories/snapshots Cheers, Fabian 2015-08-12 11:23 GMT+02:00 Pieter-Jan Van Aeken < pieterjan.vanae...@eur

nightly builds

2015-08-12 Thread Pieter-Jan Van Aeken
Hello, Is there a nightly snapshot build that is exposed through Maven? I couldn't find one in Maven Central, but maybe there is another repo somewhere? Regards, Pieter-Jan

Re: Multiple control flows in a program

2015-08-12 Thread Till Rohrmann
You can take a look at the ALS implementation. There I did something similar. On Wed, Aug 12, 2015 at 10:27 AM, Sachin Goel wrote: > Since the random splits need to be done on any data set a user provides, I > think making a persistent source would be the best solution then. > > > -- Sachin Goel

Re: Multiple control flows in a program

2015-08-12 Thread Sachin Goel
Since the random splits need to be done on any data set a user provides, I think making a persistent source would be the best solution then. -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Wed, Aug 12, 2015 at 1:37 PM, Till Rohrmann wrote: > One branch does not occupy a single

Re: Multiple control flows in a program

2015-08-12 Thread Till Rohrmann
One branch does not occupy a single slot. A slot is usually shared by operators from multiple branches; only subtasks of the same operator cannot be placed into the same slot. Thus, it's not an argument against it. Most if not all input formats assign the input splits on a first-come, first-served
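A minimal plain-Java sketch of first-come-first-served split assignment, assuming one shared pool of splits that subtasks pull from. `FcfsSplitAssigner` and `nextSplit` are hypothetical names, not Flink's actual split-assigner interface. In this scheme a subtask that requests less often simply ends up with fewer splits, and which subtask reads which split depends on timing, so it can differ between runs.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.Optional;

/**
 * Hypothetical sketch of first-come-first-served split assignment.
 * Illustrates the idea only; it is not Flink's InputSplitAssigner API.
 */
final class FcfsSplitAssigner<S> {
    private final Deque<S> pending;

    FcfsSplitAssigner(Collection<S> splits) {
        this.pending = new ArrayDeque<>(splits);
    }

    /**
     * Hands the next unassigned split to whichever subtask asks first,
     * or returns Optional.empty() once all splits are taken. The subtask
     * index is ignored, so faster subtasks simply receive more splits.
     */
    synchronized Optional<S> nextSplit(int subtaskIndex) {
        return Optional.ofNullable(pending.pollFirst());
    }
}
```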

Re: Multiple control flows in a program

2015-08-12 Thread Sachin Goel
Hi Till, Thanks for the reply. If you think about it, however, having several diverging computational paths from an intermediate point will probably require re-computation anyway if the number of these paths is even higher than the number of available slots. Could that be an argument against a possible i

Re: Multiple control flows in a program

2015-08-12 Thread Till Rohrmann
At the moment, Flink does not support computing intermediate results from which you can continue your computation. When you execute jobs that share parts of their job graph, those shared parts are recomputed. When your job contains operators with non-deterministic output, there is no guarantee
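A random split drawn from an unseeded generator is one such non-deterministic operator: recomputing it can put elements on different sides. One alternative to a persistent source, not proposed in the thread and sketched here only under the assumption that every element has a stable `long` key, is to derive each element's side from that key and a fixed seed, so a recomputed branch reproduces the same split. `DeterministicSplit` is a hypothetical plain-Java helper, not a Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

/**
 * Hypothetical sketch: a "random" split whose decision for each element
 * depends only on (seed, key), so recomputing it yields the same result.
 */
final class DeterministicSplit {
    /** True if the element with this key belongs to the first part. */
    static boolean inFirstPart(long key, long seed, double fraction) {
        // A fresh generator seeded from (seed, key) produces the same draw every time.
        double draw = new SplittableRandom(seed ^ key).nextDouble();
        return draw < fraction;
    }

    /** Splits keys into [first, second], with roughly `fraction` of them in first. */
    static List<List<Long>> split(List<Long> keys, long seed, double fraction) {
        List<Long> first = new ArrayList<>();
        List<Long> second = new ArrayList<>();
        for (long key : keys) {
            if (inFirstPart(key, seed, fraction)) {
                first.add(key);
            } else {
                second.add(key);
            }
        }
        List<List<Long>> parts = new ArrayList<>();
        parts.add(first);
        parts.add(second);
        return parts;
    }
}
```

The trade-off is that elements with equal keys always land on the same side, so the key has to distinguish every element that should be sampled independently.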