Yes, the keys are constantly changing. In fact, each unique event has its own key 
(the event itself). The purpose was to do event deduplication ...
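Keying a stream by the event itself means every distinct event becomes its own group, so the runtime must keep window state per unique key. As a minimal plain-Java sketch of that deduplication pattern (this is not the original program and not the Flink API; the class and method names are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

/** Time-windowed deduplication keyed by the event itself (illustrative sketch). */
public class WindowedDeduplicator {
    private final long windowMillis;
    // One entry per unique event key -> state grows with key cardinality.
    private final Map<String, Long> lastSeen = new HashMap<>();

    public WindowedDeduplicator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if the event has not been seen within the current window. */
    public boolean accept(Object event, long nowMillis) {
        String key = event.toString(); // same role as the KeySelector's toString()
        // Evict expired keys; with mostly-unique keys the map keeps growing
        // between evictions, which is what stresses memory and the GC.
        lastSeen.entrySet().removeIf(e -> nowMillis - e.getValue() > windowMillis);
        return lastSeen.putIfAbsent(key, nowMillis) == null;
    }
}
```

With mostly unique keys, the map holds roughly one entry per event seen inside the window, which matches the kind of memory and GC pressure discussed in the thread below.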



> On 08.09.2015, at 20:05, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> Hi Rico,
> I have a suspicion. What is the distribution of your keys? That is, are there 
> many unique keys, and do the keys keep evolving, i.e. are there always new and 
> different keys?
> 
> Cheers,
> Aljoscha
> 
>> On Tue, 8 Sep 2015 at 13:44 Rico Bergmann <i...@ricobergmann.de> wrote:
>> I also see in the TM overview that the CPU load is still around 25%, although 
>> the program has received no input for minutes. The CPU load is degrading 
>> very slowly. 
>> 
>> The memory consumption is still fluctuating at a high level. It does not 
>> degrade. 
>> 
>> In my test I generated test input for 1 minute. Now 10 minutes have passed ... 
>> 
>> I think there must be something wrong with Flink ...
>> 
>> 
>> 
>>> On 08.09.2015, at 13:32, Rico Bergmann <i...@ricobergmann.de> wrote:
>>> 
>>> The MarkSweep value is very high, the Scavenge value very low. In case this helps ;-)
>>> 
>>> 
>>> 
>>> 
>>>> On 08.09.2015, at 11:27, Robert Metzger <rmetz...@apache.org> wrote:
>>>> 
>>>> It is in the "Information" column: http://i.imgur.com/rzxxURR.png
>>>> In the screenshot, the two GCs only spent 84 and 25 ms.
>>>> 
>>>>> On Tue, Sep 8, 2015 at 10:34 AM, Rico Bergmann <i...@ricobergmann.de> 
>>>>> wrote:
>>>>> Where can I find this information? I can see the memory usage and CPU 
>>>>> load. But where is the information on the GC?
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 08.09.2015, at 09:34, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>> 
>>>>>> The web interface of Flink has a tab for the TaskManagers. There, you can 
>>>>>> also see how much time the JVM spent on garbage collection.
>>>>>> Can you check whether the number of GC calls and the time spent go up 
>>>>>> after 30 minutes?
>>>>>> 
>>>>>>> On Tue, Sep 8, 2015 at 8:37 AM, Rico Bergmann <i...@ricobergmann.de> 
>>>>>>> wrote:
>>>>>>> Hi!
>>>>>>> 
>>>>>>> I also think it's a GC problem. In the KeySelector I don't instantiate 
>>>>>>> any objects; it's a simple toString method call. 
>>>>>>> In the mapWindow I do create new objects, but I'm doing the same in other 
>>>>>>> map operators, too, and they don't slow down the execution. Only with this 
>>>>>>> construct is the execution slowed down. 
>>>>>>> 
>>>>>>> I watched the memory footprint of my program, once with the code 
>>>>>>> construct I wrote and once without. The memory characteristics were the 
>>>>>>> same, and so was the CPU usage ... 
>>>>>>> 
>>>>>>> I don't have an explanation. But I don't think it comes from my 
>>>>>>> operator functions ...
>>>>>>> 
>>>>>>> Cheers Rico. 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 07.09.2015, at 22:43, Martin Neumann <mneum...@sics.se> wrote:
>>>>>>>> 
>>>>>>>> Hej,
>>>>>>>> 
>>>>>>>> This sounds like it could be a garbage collection problem. Do you 
>>>>>>>> instantiate any classes inside any of the operators (e.g. in the 
>>>>>>>> KeySelector)? You can also try to run it locally and use something 
>>>>>>>> like jstat to rule this out.
>>>>>>>> 
>>>>>>>> cheers Martin
>>>>>>>> 
>>>>>>>>> On Mon, Sep 7, 2015 at 12:00 PM, Rico Bergmann <i...@ricobergmann.de> 
>>>>>>>>> wrote:
>>>>>>>>> Hi!
>>>>>>>>> 
>>>>>>>>> While working with grouping and windowing I encountered a strange 
>>>>>>>>> behavior. I'm doing:
>>>>>>>>>> dataStream.groupBy(KeySelector).window(Time.of(x, 
>>>>>>>>>> TimeUnit.SECONDS)).mapWindow(toString).flatten()
>>>>>>>>> 
>>>>>>>>> When I run the program containing this snippet, it initially outputs 
>>>>>>>>> data at a rate of around 150 events per second (that is roughly the 
>>>>>>>>> input rate of the program). After about 10-30 minutes the rate drops 
>>>>>>>>> below 5 events per second. This leads to event delivery offsets getting 
>>>>>>>>> bigger and bigger ... 
>>>>>>>>> 
>>>>>>>>> Any explanation for this? I know you are reworking the streaming API, 
>>>>>>>>> but it would be useful to know why this happens ...
>>>>>>>>> 
>>>>>>>>> Cheers. Rico. 
