The test data is generated in a Flink program running in a separate JVM. The generated data is then written to a Kafka topic, from which my program reads it ...
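A minimal sketch of such a generator follows. This is an assumption rather than the actual job: a plain Kafka producer stands in for the separate Flink program, and the topic name "test-events" and the payload format are placeholders.

    // Sketch only: the real generator is a separate Flink job; a plain Kafka
    // producer stands in for it here. Topic name and payload are placeholders.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TestDataGenerator {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (long i = 0; ; i++) {
                    // Throttle here (e.g. Thread.sleep) for the ~200 events/sec setup,
                    // or leave unthrottled for the "highest possible rate" setup.
                    producer.send(new ProducerRecord<>("test-events", Long.toString(i % 100), "event-" + i));
                }
            }
        }
    }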
> On 24.09.2015 at 14:53, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> Hi Rico,
> are you generating the data directly in your Flink program or in some external
> queue, such as Kafka?
>
> Cheers,
> Aljoscha
>
>> On Thu, 24 Sep 2015 at 13:47 Rico Bergmann <i...@ricobergmann.de> wrote:
>> And as a side note:
>>
>> The problem with duplicates also seems to be solved!
>>
>> Cheers, Rico.
>>
>>
>>> On 24.09.2015 at 12:21, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>
>>> I took a first glance.
>>>
>>> I ran 2 test setups. One used a limited test data generator that outputs
>>> around 200 events per second. In this setting the new implementation keeps
>>> up with the incoming message rate.
>>>
>>> The other setup generated data without a limit (at the highest possible rate).
>>> There the same problem as before can be observed: after 2 minutes of runtime
>>> the output of my program is more than a minute behind ... and the lag keeps
>>> increasing over time. But I don't know whether this could be a setup problem.
>>> I noticed the OS load of my test system was around 90%, so it might be more
>>> of a setup problem ...
>>>
>>> Thanks for your support so far.
>>>
>>> Cheers. Rico.
>>>
>>>
>>>> On 24.09.2015 at 09:33, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>
>>>> Hi Rico,
>>>> you should be able to get it with these steps:
>>>>
>>>> git clone https://github.com/StephanEwen/incubator-flink.git flink
>>>> cd flink
>>>> git checkout -t origin/windows
>>>>
>>>> This will put you on Stephan's windowing branch. Then you can run
>>>>
>>>> mvn clean install -DskipTests
>>>>
>>>> to build it.
>>>>
>>>> I will merge his changes later today; then you should also be able to use them
>>>> by running the 0.10-SNAPSHOT version.
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>>
>>>>
>>>>> On Thu, 24 Sep 2015 at 09:11 Rico Bergmann <i...@ricobergmann.de> wrote:
>>>>> Hi!
>>>>>
>>>>> Sounds great. How can I get the source code before it's merged into the
>>>>> master branch? Unfortunately I only have 2 days left to try this out ...
>>>>>
>>>>> Greets. Rico.
>>>>>
>>>>>
>>>>>> On 24.09.2015 at 00:57, Stephan Ewen <se...@apache.org> wrote:
>>>>>>
>>>>>> Hi Rico!
>>>>>>
>>>>>> We have finished the first part of the Window API rework. You can find
>>>>>> the code here: https://github.com/apache/flink/pull/1175
>>>>>>
>>>>>> It should fix the issues and offer vastly improved performance (up to
>>>>>> 50x faster). For now it supports time windows, but we will support the
>>>>>> other cases in the coming days.
>>>>>>
>>>>>> I'll ping you once it is merged; I'd be curious whether it fixes your issue.
>>>>>> Sorry that you ran into this problem ...
>>>>>>
>>>>>> Greetings,
>>>>>> Stephan
>>>>>>
>>>>>>
>>>>>>> On Mon, Sep 7, 2015 at 12:00 PM, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> While working with grouping and windowing I encountered a strange
>>>>>>> behavior. I'm doing:
>>>>>>>
>>>>>>>> dataStream.groupBy(KeySelector).window(Time.of(x,
>>>>>>>> TimeUnit.SECONDS)).mapWindow(toString).flatten()
>>>>>>>
>>>>>>> When I run the program containing this snippet, it initially outputs
>>>>>>> data at a rate of around 150 events per second (that is roughly the input
>>>>>>> rate of the program). After about 10-30 minutes the rate drops to below
>>>>>>> 5 events per second. This leads to the event delivery lag getting
>>>>>>> bigger and bigger ...
>>>>>>>
>>>>>>> Any explanation for this? I know you are reworking the streaming API,
>>>>>>> but it would be useful to know why this happens ...
>>>>>>>
>>>>>>> Cheers. Rico.
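For reference, a roughly equivalent pipeline on the reworked window API discussed above (keyBy / timeWindow / apply) might look like the sketch below. This is only an illustration, not the original program: the source, the key selector, and the window length of 5 seconds are placeholders.

    // Sketch of the groupBy/window/mapWindow pipeline on the reworked window API.
    // Source, key selection, and window size are placeholders.
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class WindowedToStringSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder source; in the real job the events come from Kafka.
            DataStream<String> events = env.socketTextStream("localhost", 9999);

            events
                .keyBy(new KeySelector<String, String>() {
                    @Override
                    public String getKey(String event) {
                        return event;  // placeholder key extraction
                    }
                })
                .timeWindow(Time.seconds(5))  // "x" seconds in the original snippet
                .apply(new WindowFunction<String, String, String, TimeWindow>() {
                    @Override
                    public void apply(String key, TimeWindow window, Iterable<String> values, Collector<String> out) {
                        // roughly the mapWindow(toString) step: emit one string per key and window
                        long count = 0;
                        for (String ignored : values) {
                            count++;
                        }
                        out.collect(key + ": " + count + " events in " + window);
                    }
                })
                .print();

            env.execute("windowed toString sketch");
        }
    }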