Re: Performance Issue

2015-09-07 Thread Rico Bergmann
Hi! I also think it's a GC problem. In the KeySelector I don't instantiate any object. It's a simple toString method call. In the mapWindow I create new objects. But I'm doing the same in other map operators, too. They don't slow down the execution. Only with this construct the execution is sl

Question about exactly-once

2015-09-07 Thread Zhangrucong
Dear Sir: I am a beginner of Flink and very interested in “Exactly-once” Recovery Mechanism. I have a question about processing sequence problem of tuples. For example, in Fig 1, process unit A runs JOIN, and the size of sliding window is 4. At the beginning, the state of sliding windows is show

Re: Performance Issue

2015-09-07 Thread Martin Neumann
Hej, This sounds like it could be a garbage collection problem. Do you instantiate any classes inside any of the operators (e.g. in the KeySelector). You can also try to run it locally and use something like jstat to rule this out. cheers Martin On Mon, Sep 7, 2015 at 12:00 PM, Rico Bergmann wr

Re: Union/append performance question

2015-09-07 Thread Fabian Hueske
In that case you should go with union. 2015-09-07 19:06 GMT+02:00 Flavio Pompermaier : > 3 or 4 usually.. > On 7 Sep 2015 18:39, "Fabian Hueske" wrote: > >> And how many unions would your program use if you would follow the >> union-in-loop approach? >> >> 2015-09-07 18:31 GMT+02:00 Flavio Pompe

Re: Union/append performance question

2015-09-07 Thread Flavio Pompermaier
3 or 4 usually.. On 7 Sep 2015 18:39, "Fabian Hueske" wrote: > And how many unions would your program use if you would follow the > union-in-loop approach? > > 2015-09-07 18:31 GMT+02:00 Flavio Pompermaier : > >> In the order of 10 GB.. >> >> On Mon, Sep 7, 2015 at 6:14 PM, Fabian Hueske wrote:

Volunteers needed for Flink Forward 2015 (and they get a free ticket)

2015-09-07 Thread Kostas Tzoumas
Hi folks, The Flink Forward 2015 organizers are looking for volunteers (and they are offering free tickets in exchange). Sign up here if you are interested (or send me an email): http://flink-forward.org/?page_id=495 Best, Kostas

Re: Union/append performance question

2015-09-07 Thread Fabian Hueske
And how many unions would your program use if you would follow the union-in-loop approach? 2015-09-07 18:31 GMT+02:00 Flavio Pompermaier : > In the order of 10 GB.. > > On Mon, Sep 7, 2015 at 6:14 PM, Fabian Hueske wrote: > >> Accumulators can be used to collect records, but they are not designe

Re: Union/append performance question

2015-09-07 Thread Flavio Pompermaier
In the order of 10 GB.. On Mon, Sep 7, 2015 at 6:14 PM, Fabian Hueske wrote: > Accumulators can be used to collect records, but they are not designed to > hold large amounts of data. > It might work up to a certain point (~10MB) and fail beyond that. > > How many unions do you plan to use in you

Re: Union/append performance question

2015-09-07 Thread Fabian Hueske
Accumulators can be used to collect records, but they are not designed to hold large amounts of data. It might work up to a certain point (~10MB) and fail beyond that. How many unions do you plan to use in your program? 2015-09-07 17:58 GMT+02:00 Flavio Pompermaier : > ok thanks. are there any

Re: Union/append performance question

2015-09-07 Thread Flavio Pompermaier
ok thanks. are there any alternatives to that?may I use accumulators for that? On 7 Sep 2015 17:47, "Fabian Hueske" wrote: > If the loop count of 3 is fixed (or not significantly larger), union > should be fine. > > 2015-09-07 17:07 GMT+02:00 Flavio Pompermaier : > >> Sorry the program has a unio

Re: Union/append performance question

2015-09-07 Thread Fabian Hueske
If the loop count of 3 is fixed (or not significantly larger), union should be fine. 2015-09-07 17:07 GMT+02:00 Flavio Pompermaier : > Sorry the program has a union at accumulated = > accumulated.union(x.filter(t.f1 > == 0)) > > On Mon, Sep 7, 2015 at 4:58 PM, Fabian Hueske wrote: > >> Hi Fla

Re: Union/append performance question

2015-09-07 Thread Flavio Pompermaier
Sorry the program has a union at accumulated = accumulated.union(x.filter(t.f1 == 0)) On Mon, Sep 7, 2015 at 4:58 PM, Fabian Hueske wrote: > Hi Flavio, > > your example does not contain a union. > > Union itself basically comes for free. However, if you have a lot of small > DataSet that you w

Re: Union/append performance question

2015-09-07 Thread Fabian Hueske
Hi Flavio, your example does not contain a union. Union itself basically comes for free. However, if you have a lot of small DataSet that you want to union, the plan can become very complex and might cause overhead due to scheduling many small tasks. For example, it is usually better to have one

Re: Union/append performance question

2015-09-07 Thread Flavio Pompermaier
Hi Stephan, thanks for the answer. Unfortunately I dind't understand if there's an alternative to union right now.. My process is basically like this: Dataset x = ... while(loopCnt < 3){ x = x.join(y).where(0).equalTo(0).with()); accumulated = x.filter(t.f1 == 0); x = x.filter(t.f1!=0);

Re: How to create a stream of data batches

2015-09-07 Thread Juan Rodríguez Hortalá
Hi, I'm just a Flink newbie, but maybe I'd suggest using window operators with a Count policy for that https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#window-operators Hope that helps. Greetings, Juan 2015-09-04 14:14 GMT+02:00 Stephan Ewen : > Interest

Re: Union/append performance question

2015-09-07 Thread Stephan Ewen
Union, like all operators, is lazy. When you call union, it only builds a "union stream", that unions when you execute the task. So nothing is added before you call "env.execute()" After you call "env.execute()" and then union again, you will re-execute the entire history of computation to compute

Union/append performance question

2015-09-07 Thread Flavio Pompermaier
Hi to all, I have a job where I have to incrementally add Tuples to a dataset (in a while loop). Is union() the best operator for this task or is there a more performant operator for this task? Does union affect the read of already existing elements or it just appends the new ones somewhere? Best,

Performance Issue

2015-09-07 Thread Rico Bergmann
Hi! While working with grouping and windowing I encountered a strange behavior. I'm doing: > dataStream.groupBy(KeySelector).window(Time.of(x, > TimeUnit.SECONDS)).mapWindow(toString).flatten() When I run the program containing this snippet it initially outputs data at a rate around 150 events

Re: NPE thrown when using Storm Kafka Spout with Flink

2015-09-07 Thread Robert Metzger
Hi Jerry, the issue occurs because Flink's storm compatibility layer does not support custom configuration parameters currently. There is this JIRA which aims to add the missing feature to Flink: https://issues.apache.org/jira/browse/FLINK-2525 Maybe (but its unlikely) passing an empty Map in the

Re: Memory management issue

2015-09-07 Thread Stephan Ewen
Hi! Can you switch to version 0.9.1? That one included some bug fixes, including one or two possible deadlock situations. Please let us know if that solves the issue, or if the issue persists... Greetings, Stephan On Fri, Sep 4, 2015 at 7:19 PM, Ricarda Schueler < ricarda.schue...@student.hpi.