Yes, the keys are constantly changing. In fact, each unique event has its own key (the event itself). The purpose was to perform event deduplication ...
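To illustrate why this key distribution matters, here is a minimal sketch in plain Java (not the Flink API; class and method names are illustrative): when every event is its own key, any dedup structure gains one entry per event, so state grows linearly with the input and no key is ever reused. That is one way such a job can build up memory and GC pressure over time.

```java
import java.util.HashMap;
import java.util.Map;

public class DedupStateGrowth {
    // Feed n unique events through a "seen" map and report how many keys it holds.
    static int stateSizeAfter(int n) {
        Map<String, Boolean> seen = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String event = "event-" + i;   // every event is unique
            seen.putIfAbsent(event, true); // dedup check: the key is the event itself
        }
        return seen.size();                // one entry per event, never reclaimed
    }

    public static void main(String[] args) {
        System.out.println(stateSizeAfter(100_000)); // → 100000
    }
}
```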
> On 08.09.2015 at 20:05, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> Hi Rico,
> I have a suspicion. What is the distribution of your keys? That is, are there
> many unique keys, and do the keys keep evolving, i.e. is it always new and
> different keys?
>
> Cheers,
> Aljoscha
>
>> On Tue, 8 Sep 2015 at 13:44, Rico Bergmann <i...@ricobergmann.de> wrote:
>> I also see in the TM overview that the CPU load is still around 25%, although
>> there has been no input to the program for minutes. The CPU load is degrading
>> very slowly.
>>
>> The memory consumption is still fluctuating at a high level. It does not
>> degrade.
>>
>> In my test I generated test input for 1 minute. Now 10 minutes are over ...
>>
>> I think there must be something with Flink ...
>>
>>> On 08.09.2015 at 13:32, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>
>>> The marksweep value is very high, the scavenge value very low. If this helps ;-)
>>>
>>>> On 08.09.2015 at 11:27, Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>> It is in the "Information" column: http://i.imgur.com/rzxxURR.png
>>>> In the screenshot, the two GCs only spent 84 and 25 ms.
>>>>
>>>>> On Tue, Sep 8, 2015 at 10:34 AM, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>>> Where can I find this information? I can see the memory usage and CPU
>>>>> load, but where is the information on the GC?
>>>>>
>>>>>> On 08.09.2015 at 09:34, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>
>>>>>> The web interface of Flink has a tab for the TaskManagers. There you can
>>>>>> also see how much time the JVM spent on garbage collection.
>>>>>> Can you check whether the number of GC calls and the time spent go up
>>>>>> after 30 minutes?
>>>>>>
>>>>>>> On Tue, Sep 8, 2015 at 8:37 AM, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I also think it's a GC problem. In the KeySelector I don't instantiate
>>>>>>> any objects. It's a simple toString method call.
>>>>>>> In the mapWindow I create new objects. But I'm doing the same in other
>>>>>>> map operators, too, and they don't slow down the execution. Only with this
>>>>>>> construct is the execution slowed down.
>>>>>>>
>>>>>>> I watched the memory footprint of my program, once with the code
>>>>>>> construct I described and once without. The memory characteristics were the
>>>>>>> same, and so was the CPU usage ...
>>>>>>>
>>>>>>> I don't have an explanation, but I don't think it comes from my
>>>>>>> operator functions ...
>>>>>>>
>>>>>>> Cheers, Rico.
>>>>>>>
>>>>>>>> On 07.09.2015 at 22:43, Martin Neumann <mneum...@sics.se> wrote:
>>>>>>>>
>>>>>>>> Hej,
>>>>>>>>
>>>>>>>> This sounds like it could be a garbage collection problem. Do you
>>>>>>>> instantiate any classes inside any of the operators (e.g. in the
>>>>>>>> KeySelector)? You can also try to run it locally and use something
>>>>>>>> like jstat to rule this out.
>>>>>>>>
>>>>>>>> Cheers, Martin
>>>>>>>>
>>>>>>>>> On Mon, Sep 7, 2015 at 12:00 PM, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> While working with grouping and windowing I encountered strange
>>>>>>>>> behavior. I'm doing:
>>>>>>>>>
>>>>>>>>>> dataStream.groupBy(KeySelector).window(Time.of(x, TimeUnit.SECONDS)).mapWindow(toString).flatten()
>>>>>>>>>
>>>>>>>>> When I run the program containing this snippet, it initially outputs
>>>>>>>>> data at a rate of around 150 events per second (which is roughly the input
>>>>>>>>> rate of the program). After about 10-30 minutes the rate drops
>>>>>>>>> below 5 events per second. This leads to event delivery offsets getting
>>>>>>>>> bigger and bigger ...
>>>>>>>>>
>>>>>>>>> Any explanation for this? I know you are reworking the streaming API,
>>>>>>>>> but it would be useful to know why this happens ...
>>>>>>>>>
>>>>>>>>> Cheers, Rico.
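The intent of the quoted groupBy/window/mapWindow construct can be sketched in plain Java (not the Flink API; the batch-per-window model and all names here are illustrative assumptions): dedup each time window independently, emitting one record per distinct key per window, so the per-window key set becomes garbage as soon as the window closes and state is bounded by the events in one window rather than the whole stream.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WindowDedupSketch {
    // Deduplicate one window's worth of events; the key is the event itself.
    static List<String> dedupWindow(List<String> window) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String event : window) {
            if (seen.add(event)) {   // first occurrence within this window
                out.add(event);      // analogue of mapWindow(toString)
            }
        }
        return out;                  // "seen" is discarded when the window closes
    }

    public static void main(String[] args) {
        List<String> window = List.of("a", "b", "a", "c", "b");
        System.out.println(dedupWindow(window)); // → [a, b, c]
    }
}
```

Duplicates are only suppressed within a window here; an event repeated across two windows is emitted twice, which matches the windowed-dedup semantics of the quoted snippet.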