Hi Tovi,

This might seem a really naive question (and it is neither a solution nor an answer to your question), but I am trying to understand how latency is viewed. You said you achieved less than 5 ms latency, and that at the 99th percentile you measured 0.3 ms and 9 ms respectively. What kind of latency is this? Is it the latency of a specific operator? I ask because the end-to-end latency is around 50 ms and 370 ms.
I was just curious how latency is seen from a different perspective; it would really help my understanding.

Thanks a lot,
Biplob

Thanks & Regards
Biplob Biswas

On Mon, Oct 30, 2017 at 8:53 AM, Sofer, Tovi <tovi.so...@citi.com> wrote:

> Thank you Joshi.
>
> We are currently using FsStateBackend, since it supports async snapshots
> as of version 1.3, and not RocksDB.
>
> Does anyone else have feedback on this issue?
>
> *From:* Narendra Joshi [mailto:narendr...@gmail.com]
> *Sent:* Sunday, 29 October 2017 12:13
> *To:* Sofer, Tovi [ICG-IT] <ts72...@imceu.eu.ssmb.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* Re: state size effects latency
>
> We have also faced similar issues. The only thing that happens
> synchronously when using async snapshots is taking a persistent
> point-in-time picture, which in the case of the RocksDB backend means
> creating symlinks. That cost increases linearly with the number of files
> to symlink, but it should be negligible. We could not find a satisfying
> reason for the increase in latency with state size.
>
> Best,
> Narendra
>
> Narendra Joshi
>
> On 29 Oct 2017 15:04, "Sofer, Tovi" <tovi.so...@citi.com> wrote:
>
> Hi all,
>
> In our application we have a requirement for very low latency,
> preferably less than 5 ms.
>
> We have been able to achieve this so far, but when we start increasing
> the state size, we see a distinct degradation in latency.
>
> We have added MinPauseBetweenCheckpoints, and are using async snapshots.
>
> · Why does state size have such a distinct effect on latency? How can
>   this effect be minimized?
>
> · Can the state snapshot be done using separate threads and resources,
>   so that it has less effect on stream data handling?
>
> Details:
>
> Application configuration:
>
> env.enableCheckpointing(1000);
> env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);
> env.setStateBackend(new FsStateBackend(checkpointDirURI, true)); // use async snapshots
> env.setParallelism(16); // running on a machine with 40 cores
>
> Results:
>
> A. *When state size is ~20MB, latency was 0.3 ms at the 99th percentile*
>
> *Latency info:* (in nanos)
>
> 2017-10-26 07:26:55,030 INFO com.citi.artemis.flink.reporters.Log4JReporter
> - [Flink-MetricRegistry-1]
> localhost.taskmanager.6afd21aeb9b9bef41a4912b023469497.Flink Streaming Job.AverageE2ELatencyChecker.0.LatencyHistogram:
> count:10000 min:31919 max:13481166 mean:89492.0644
> stddev:265876.0259763816 p50:68140.5 p75:82152.5
> p95:146654.0499999999 p98:204671.74 p99:308958.73999999993
> p999:3844154.002999794
>
> *State\checkpoint info:*
>
> [image: cid:image001.png@01D350DC.40449520]
>
> B. *When state size is ~200MB, latency rose significantly, to 9 ms at
> the 99th percentile*
>
> *Latency info:* (in nanos)
>
> 2017-10-26 07:17:35,289 INFO com.citi.artemis.flink.reporters.Log4JReporter
> - [Flink-MetricRegistry-1]
> localhost.taskmanager.05431e7ecab1888b2792265cdc0ddf84.Flink Streaming Job.AverageE2ELatencyChecker.0.LatencyHistogram:
> count:10000 min:30186 max:46236470 mean:322105.7072
> stddev:2060373.4782505725 p50:68979.5 p75:85780.25
> p95:219882.69999999914 p98:2360171.4399999934
> p99:9251766.559999945 p999:3.956163987499886E7
>
> *State\checkpoint info:*
>
> [image: cid:image002.png@01D350DC.40449520]
>
> Thanks and regards,
> Tovi
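
For reference, the LatencyHistogram values quoted above are reported in nanoseconds, while the thread discusses them in milliseconds. The following is only a minimal sketch of that unit conversion, applied to the two p99 values copied from the log lines; the class name LatencyUnits and the method nanosToMillis are hypothetical helpers, not part of Flink or of the original job.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Flink or the original job.
final class LatencyUnits {
    // Number of nanoseconds in one millisecond.
    static final double NANOS_PER_MILLI = 1_000_000.0;

    // Converts a latency value from nanoseconds to milliseconds.
    static double nanosToMillis(double nanos) {
        return nanos / NANOS_PER_MILLI;
    }

    public static void main(String[] args) {
        // p99 values copied verbatim from the two LatencyHistogram log lines.
        Map<String, Double> p99Nanos = new LinkedHashMap<>();
        p99Nanos.put("~20MB state", 308958.73999999993);
        p99Nanos.put("~200MB state", 9251766.559999945);

        for (Map.Entry<String, Double> entry : p99Nanos.entrySet()) {
            System.out.printf(Locale.ROOT, "%s: p99 = %.3f ms%n",
                    entry.getKey(), nanosToMillis(entry.getValue()));
        }
        // Prints:
        // ~20MB state: p99 = 0.309 ms
        // ~200MB state: p99 = 9.252 ms
    }
}
```

These figures match the roughly 0.3 ms and 9 ms quoted in the thread. The sketch says nothing about where in the pipeline the latency is measured, which is the open question above.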