Re: Optimize exact deduplication for tens of billions data per day

2024-04-11 Thread Péter Váry
Hi Lei, There is an additional overhead when adding new keys to an operator, since Flink needs to maintain the state, timers etc for the individual keys. If you are interested in more details, I suggest to use the FlinkUI and compare the flamegraph for the stages. There you can see the difference

Why RocksDB metrics cache-usage is larger than cache-capacity

2024-04-11 Thread Lei Wang
I enable RocksDB native metrics and do some performance tuning. state.backend.rocksdb.block.cache-size is set to 128m,4 slots for each TaskManager. The observed result for one specific parallel slot: state.backend.rocksdb.metrics.block-cache-capacity is about 14.5M state.backend.rocksdb.metric

Re: How to enable RocksDB native metrics?

2024-04-11 Thread Lei Wang
Thanks very much, it finally works On Thu, Apr 11, 2024 at 8:27 PM Zhanghao Chen wrote: > Add a space between -yD and the param should do the trick. > > Best, > Zhanghao Chen > -- > *From:* Lei Wang > *Sent:* Thursday, April 11, 2024 19:40 > *To:* Zhanghao Chen > *C

Re: How to enable RocksDB native metrics?

2024-04-11 Thread Zhanghao Chen
Add a space between -yD and the param should do the trick. Best, Zhanghao Chen From: Lei Wang Sent: Thursday, April 11, 2024 19:40 To: Zhanghao Chen Cc: Biao Geng ; user Subject: Re: How to enable RocksDB native metrics? Hi Zhanghao, flink run -m yarn-cluster

Re: How to enable RocksDB native metrics?

2024-04-11 Thread Lei Wang
Hi Zhanghao, flink run -m yarn-cluster -ys 4 -ynm EventCleaning_wl -yjm 2G -ytm 16G -yqu default -p 8 -yDstate.backend.latency-track.keyed-state-enabled=true -c com.zkj.task.EventCleaningTask SourceDataCleaning-wl_0410.jar --sourceTopic dwd_audio_record --groupId clean_wl_ --sourceServers x.x.x

Re: How to enable RocksDB native metrics?

2024-04-11 Thread Zhanghao Chen
Hi Lei, You are using an old-styled CLI for YARN jobs where "-yD" instead of "-D" should be used. From: Lei Wang Sent: Thursday, April 11, 2024 12:39 To: Biao Geng Cc: user Subject: Re: How to enable RocksDB native metrics? Hi Biao, I tried, it doesn't work

Understanding event time wrt watermarking strategy in flink

2024-04-11 Thread Sachin Mittal
Hello folks, I have few questions: Say I have a source like this: final DataStream data = env.fromSource( source, WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(60)) .withTimestampAssigner((event, timestamp) -> event.timestamp)); My pipeline after