On Wed, Aug 12, 2015 at 3:57 PM, Stephan Ewen wrote:
> I'm curious what the results are!
Same here! :-)
Same here as for Max, I am not familiar enough any more to make really good
comments.
Some generic comments, though:
- Check whether you really need a technique. Techniques that improve
corner cases, but make the code much more complex and make the behavior of
algorithms less robust are often b
@ffbin: Would this be a JVM singleton that updates a static field every
millisecond?
It is hard to say at this point whether this eliminates a bottleneck, but I
think it is fine to try.
To evaluate this, you could use a streaming job that attaches a timestamp
to every record, and measure the diff
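A minimal sketch of that measurement idea, in plain Java and independent of Flink's API. The names LatencyProbe, Stamped, stamp, latencyMillis and meanLatencyMillis are made up for illustration; in a real job the stamping would happen at the source and the difference would be taken at the sink.

```java
import java.util.List;

/** Hypothetical helper (not a Flink class): tags records with an emit time. */
final class LatencyProbe {

    /** A record paired with the wall-clock time at which it was emitted. */
    static final class Stamped<T> {
        final T value;
        final long emittedAtMillis;

        Stamped(T value, long emittedAtMillis) {
            this.value = value;
            this.emittedAtMillis = emittedAtMillis;
        }
    }

    private LatencyProbe() {}

    /** Attaches the current time to a record, e.g. right after the source. */
    static <T> Stamped<T> stamp(T value) {
        return new Stamped<>(value, System.currentTimeMillis());
    }

    /** Difference between the time seen at the sink and the emit time. */
    static long latencyMillis(Stamped<?> record, long nowMillis) {
        return nowMillis - record.emittedAtMillis;
    }

    /** Mean of the collected per-record latencies; NaN if none were collected. */
    static double meanLatencyMillis(List<Long> latencies) {
        if (latencies.isEmpty()) {
            return Double.NaN;
        }
        long sum = 0;
        for (long latency : latencies) {
            sum += latency;
        }
        return (double) sum / latencies.size();
    }
}
```

Running the same job once with the real clock and once with the cached one, and comparing the collected latencies, would give a first impression. Note that the timestamps are only comparable if source and sink share a clock.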
I second Ufuk and Chesnay. Please provide us with a benchmark. I have a
hard time believing that your implementation, along with the overhead that
comes with it, will improve the streaming performance.
Please, feel free to prove us wrong :)
On Wed, Aug 12, 2015 at 11:48 AM, Chesnay Schepler <
chesna
I think this is a decision to be made by the people involved in the Gelly
library. I'm not very familiar with graph processing libraries. Thus, it is
hard for me to assess the value of this contribution.
However, you outlined pretty well that for highly skewed graphs your
technique results in a muc
I would love to get some feedback from the guys at data Artisans about this
one.
So far, the comments originated and spread in the Stockholm area :)
On Tue, Aug 11, 2015 at 6:33 PM, Andra Lungu wrote:
> Hi Samia,
>
> A good method to statistically determine skewed vertices was beyond the
> purpo
-.-
if you look at this issue that you opened
https://issues.apache.org/jira/browse/FLINK-2471 you were given opinions
from 2 separate people, with arguments, that the performance
improvement (TBD) is either a) nonexistent or b) negligible. All you did
was disregard those basically saying
Hey Fengbin,
did you run a program and notice some kind of performance degradation
because of this?
If yes, could you provide some details? If not, I would suggest not doing
this. I can see how this "improves" performance in micro benchmarks, but
not how it would affect the overall performance o
Hello
Some operators call System.currentTimeMillis() frequently, and this costs
performance.
I want to use a thread to call System.currentTimeMillis() and update a long
variable millTime. All other modules would then not need to call
System.currentTimeMillis() and could read millTime directly.
I want to know
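For reference, the proposal could be sketched roughly like this. It is only an illustration under assumptions: the class name CachedClock and its method are hypothetical and not taken from FLINK-2471, and only millTime comes from the mail. Whether it actually beats calling System.currentTimeMillis() directly would still have to be measured.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Flink): caches the wall-clock time in a field. */
final class CachedClock {

    // Last value read from System.currentTimeMillis(); volatile so that
    // reader threads see the updates made by the updater thread.
    private static volatile long millTime = System.currentTimeMillis();

    // Single daemon thread, so the updater never keeps the JVM alive.
    private static final ScheduledExecutorService UPDATER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cached-clock-updater");
            t.setDaemon(true);
            return t;
        });

    static {
        // Refresh about once per millisecond; the real granularity depends
        // on the OS scheduler and may be coarser.
        UPDATER.scheduleAtFixedRate(
            () -> millTime = System.currentTimeMillis(), 1, 1, TimeUnit.MILLISECONDS);
    }

    private CachedClock() {}

    /** Returns the cached time, which may lag the real clock by a millisecond or more. */
    static long currentTimeMillis() {
        return millTime;
    }
}
```

Callers would use CachedClock.currentTimeMillis() in place of System.currentTimeMillis(). The trade-off is a permanently running thread and a possibly stale value in exchange for a cheaper read.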
Hi Pieter,
there are SNAPSHOT builds on an Apache mirror which are updated after each
new commit (if the build passes).
The repository is located at:
https://repository.apache.org/content/repositories/snapshots
Cheers, Fabian
2015-08-12 11:23 GMT+02:00 Pieter-Jan Van Aeken <
pieterjan.vanae...@eur
Hello,
Is there a nightly snapshot build that is exposed through Maven? I couldn't
find one in Maven central, but maybe there is another repo somewhere?
Regards,
Pieter-Jan
You can take a look at the ALS implementation. There I did something
similar.
On Wed, Aug 12, 2015 at 10:27 AM, Sachin Goel
wrote:
> Since the random splits need to be done on any data set a user provides, I
> think making a persistent source would be the best solution then.
>
>
> -- Sachin Goel
Since the random splits need to be done on any data set a user provides, I
think making a persistent source would be the best solution then.
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Aug 12, 2015 at 1:37 PM, Till Rohrmann
wrote:
> One branch does not occupy a single
One branch does not occupy a single slot. A slot is usually shared by
operators from multiple branches. Only subtasks of the same operator cannot
be placed into the same slot. Thus, it's not an argument against it.
Most if not all input formats assign the input splits on a first comes
first serve
Hi Till
Thanks for the reply.
If you think about it however, having several diverging computational paths
from an intermediate point will probably require re-computation anyway, in
case the number of these paths is even higher than the slots available.
Could that be an argument against a possible i
At the moment, Flink does not support the calculation of intermediate
results from which you can continue your computation. When you execute jobs
which share parts of their job graph, then those shared parts are recomputed. When your job
contains operators with non-deterministic output, then there is no
guarantee