Hi Nick,

Sorry for jumping in late.

Just wondering why you call putAll on RocksDBMapState and then immediately call RocksDBMapState#clear(). It seems the state will always be empty after processElement returns.

Best,
Congxian


Yun Tang <[email protected]> wrote on Tue, Jun 16, 2020 at 7:42 PM:

> Hi Nick
>
> From my experience, it's not easy to tune this without code to reproduce it.
> Could you please share code with a fake source that reproduces the problem,
> so that we can help you?
>
> If CPU usage is 100% in RocksDB-related methods, it might be because we
> access RocksDB too often. If CPU usage is not 100% while disk util is
> 100%, we have probably hit the performance limit of the disk.
>
> BTW, if you have a 16GB memory TM with 32 slots, it would only give about
> 150MB of managed memory [1][2] to RocksDB, which looks a bit small.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/memory/mem_setup.html#managed-memory
> [2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/memory/mem_tuning.html#rocksdb-state-backend
>
> Best
> Yun Tang
>
> ------------------------------
> *From:* nick toker <[email protected]>
> *Sent:* Tuesday, June 16, 2020 18:36
> *To:* Yun Tang <[email protected]>
> *Cc:* [email protected] <[email protected]>
> *Subject:* Re: MapState bad performance
>
> Hi,
>
> We are using Flink version 1.10.1.
> The task manager memory is 16GB.
> The number of slots is 32, but the job parallelism is 1.
> We used the default configuration for RocksDB.
> We checked the disk speed on the machine running the task manager: write
> 300MB and read 1GB.
>
> BR,
> Nick
>
> On Tue, Jun 16, 2020 at 12:12, Yun Tang <[email protected]> wrote:
>
> Hi Nick
>
> As you might know, RocksDB does not perform well for iterator-like
> operations, because it needs to merge-sort across multiple levels. [1]
>
> Unfortunately, rocksDBMapState.isEmpty() needs to call iterator and seek
> operations on RocksDB [2], and rocksDBMapState.clear() needs to iterate
> over the state and remove each entry [3].
> However, even though these operations do not perform well, I don't think they
> would behave extremely badly in the general case. From our experience on SSD,
> the latency of a seek should usually be less than 100us, and could go up to
> hundreds of us. Did you use an SSD?
>
> 1. What are the Flink version, taskmanager memory, number of slots and
> RocksDB-related configurations?
> 2. Have you checked the IOPS and disk util for the machines that host
> the task manager running RocksDB?
>
> [1] https://github.com/facebook/rocksdb/wiki/Iterator-Implementation
> [2] https://github.com/apache/flink/blob/efd497410ced3386b955a92b731a8e758223045f/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java#L241
> [3] https://github.com/apache/flink/blob/efd497410ced3386b955a92b731a8e758223045f/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java#L254
>
> Best
> Yun Tang
>
> ------------------------------
> *From:* nick toker <[email protected]>
> *Sent:* Tuesday, June 16, 2020 15:35
> *To:* [email protected] <[email protected]>
> *Subject:* MapState bad performance
>
> Hello,
>
> We wrote a very simple streaming pipeline containing:
> 1. Kafka consumer
> 2. Process function
> 3. Kafka producer
>
> The code of the process function is listed below:
>
> private transient MapState<String, Object> testMapState;
>
> @Override
> public void processElement(Map<String, Object> value, Context ctx,
>         Collector<Map<String, Object>> out) throws Exception {
>
>     if (testMapState.isEmpty()) {
>
>         testMapState.putAll(value);
>
>         out.collect(value);
>
>         testMapState.clear();
>     }
> }
>
> We saw very bad performance, so we ran some tests with jprofiler.
> Using jprofiler, we saw that the hot spots are 2 functions of the MapState:
> 1. isEmpty() - around 7 ms
> 2. clear() - around 4 ms
>
> We had to change the code to use ValueState instead.
>
> Are we using MapState correctly, or are we doing something
> wrong?
> Is this behaviour expected, given that the Flink recommendation is to use
> MapState and NOT ValueState?
>
> BR,
> Nick
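
To make the putAll/clear question at the top of this thread concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: a HashMap stands in for the MapState of a single key, no Flink or RocksDB is involved, and the class and method names (MapStatePatternSketch, processWithState, processWithoutState) are hypothetical. It shows that the quoted processElement logic leaves the state empty after every element, so, as far as this model goes, it emits the same output as a function that keeps no state at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapStatePatternSketch {

    // Assumption: a plain HashMap stands in for MapState<String, Object>
    // of one key. It models the semantics only, not RocksDB's cost.
    private final Map<String, Object> testMapState = new HashMap<>();

    // Mirrors the processElement logic quoted in the thread:
    // isEmpty() check, putAll, emit, clear.
    public void processWithState(Map<String, Object> value,
                                 List<Map<String, Object>> out) {
        if (testMapState.isEmpty()) {
            testMapState.putAll(value);
            out.add(value);
            testMapState.clear();
        }
    }

    // Same observable behaviour in this model, with no state access.
    public void processWithoutState(Map<String, Object> value,
                                    List<Map<String, Object>> out) {
        out.add(value);
    }

    public boolean stateIsEmpty() {
        return testMapState.isEmpty();
    }

    public static void main(String[] args) {
        MapStatePatternSketch sketch = new MapStatePatternSketch();
        List<Map<String, Object>> withState = new ArrayList<>();
        List<Map<String, Object>> withoutState = new ArrayList<>();

        for (int i = 0; i < 3; i++) {
            Map<String, Object> value = new HashMap<>();
            value.put("id", i);
            sketch.processWithState(value, withState);
            sketch.processWithoutState(value, withoutState);
        }

        // Compare the two outputs and check that nothing was retained.
        System.out.println(withState.equals(withoutState));
        System.out.println(sketch.stateIsEmpty());
    }
}
```

If the real job needs nothing beyond this, the isEmpty() and clear() calls that showed up in the profiler could be dropped together with the state. If the snippet was simplified for the mailing list and the real logic does keep data between elements, this sketch does not apply.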
