My thought would be to compare the data rate with the buffer sizes, which gives a refresh interval. For example, if you are transmitting 1 GB/s through 128 MiB of network buffers, then the buffers turn over roughly every 1/8 second at most. The same consideration applies to spill files if the system does not have sufficient free memory for a large number of readahead buffers. Another set of buffers is the kernel socket buffers, which you can increase from the Linux default of 4 MiB by setting "taskmanager.net.sendReceiveBufferSize" (documentation is in progress; see org.apache.flink.runtime.io.network.netty.NettyConfig).
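As a rough sketch of that arithmetic, assuming the 32 KiB segment size used elsewhere in this thread (the function names here are made up for illustration and are not part of Flink):

```python
# Back-of-the-envelope sizing for network buffers.
# Assumption: each buffer is one 32 KiB memory segment, as in this thread.
SEGMENT_SIZE_BYTES = 32 * 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def buffer_pool_bytes(num_buffers, segment_size=SEGMENT_SIZE_BYTES):
    """Total memory taken by num_buffers segments."""
    return num_buffers * segment_size


def refresh_interval_seconds(pool_bytes, data_rate_bytes_per_s):
    """Time for the data rate to cycle through the whole pool once."""
    return pool_bytes / data_rate_bytes_per_s


def buffers_for_pool(pool_bytes, segment_size=SEGMENT_SIZE_BYTES):
    """Number of whole segments that fit in a pool of pool_bytes."""
    return pool_bytes // segment_size


if __name__ == "__main__":
    pool = buffer_pool_bytes(4096)
    print(pool / MIB, "MiB in the pool")
    print(refresh_interval_seconds(pool, 1e9), "seconds per refresh at 1 GB/s")
    print(buffers_for_pool(GIB), "buffers for a 1 GiB pool")
```

With 4096 buffers this gives 128 MiB and a refresh interval of about 0.13 seconds at 1 GB/s; a 1 GiB pool corresponds to 32768 buffers of this size.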
Your nodes have 100+ GB of memory, so a conservative assignment might be a gigabyte of network buffers. Then add the following to the conf, restart the cluster, start jconsole on a TaskManager, connect to the TaskManager process, and on the MBeans tab look under org.apache.flink.metrics for Network.AvailableMemorySegments.

metrics.reporters: my_jmx_reporter
metrics.reporter.my_jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.my_jmx_reporter.port: 9020-9040

On Mon, Sep 19, 2016 at 3:54 PM, amir bahmanyari <amirto...@yahoo.com.invalid> wrote:

> Thanks Greg. "Your setting of 4096 is only 128 MiB."... Correct. Because I
> followed that formula :-))) I can bump it up to twice as much, like what the
> example is doing, to for instance 300 MiB. Is this reasonable? What do you
> suggest as a reasonable range? Thanks Greg
>
> From: Greg Hogan <c...@greghogan.com>
> To: dev@flink.apache.org; amir bahmanyari <amirto...@yahoo.com>
> Sent: Monday, September 19, 2016 12:43 PM
> Subject: Re: Performance and Latency Chart for Flink
>
> You will need to add the configuration parameters to your flink-conf.yaml.
> I believe the intent is that all configuration parameters should be listed
> at
>
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#full-reference
>
> My understanding is that the Flink buffers are currently copied to Netty
> buffers, although I don't understand the stated memory doubling.
>
>
> On Mon, Sep 19, 2016 at 3:08 PM, amir bahmanyari <
> amirto...@yahoo.com.invalid> wrote:
>
> > Hi Greg, In the same Flink config link below, there are parameters that
> > don't even exist in flink-conf.yaml. Are they defined somewhere else? I
> > grepped the following & none existed in any of the files under the conf
> > folder: "taskmanager.memory.fraction", "taskmanager.memory.off-heap",
> > "taskmanager.memory.segment-size" & many more.
> > Also, isn't the example calculating the network buffers wrong? Based on the
> > example, roughly 5000 buffers x 32 KiB = 160000 KiB should be
> > allocated. 160000 KiB divided by 1024 = 156.25 MiB. Why is the example
> > saying "the system would allocate roughly 300 MiBytes for network buffers."?
> > That's roughly twice as much. Am I missing something here? I still need your
> > help to set the accurate number for my
> > - taskmanager.network.numberOfBuffers = 4096.
> >
> > Thanks for your response Greg. Amir-
> >
> > From: amir bahmanyari <amirto...@yahoo.com>
> > To: "dev@flink.apache.org" <dev@flink.apache.org>
> > Sent: Monday, September 19, 2016 10:34 AM
> > Subject: Re: Performance and Latency Chart for Flink
> >
> > Hi Greg, I used this guideline to calculate
> > "taskmanager.network.numberOfBuffers": Apache Flink 1.2-SNAPSHOT
> > Documentation: Configuration
> >
> > 4096 = (16x16)x4x4 where 16 is number of tasks per TM, 4 is # of TMs & 4
> > is there in the formula. What would you set it to? Once I have that number,
> > I will set "taskmanager.memory.preallocate" to true & will give it
> > another shot. Thanks Greg
> >
> > From: Greg Hogan <c...@greghogan.com>
> > To: dev@flink.apache.org; amir bahmanyari <amirto...@yahoo.com>
> > Sent: Monday, September 19, 2016 8:29 AM
> > Subject: Re: Performance and Latency Chart for Flink
> >
> > Hi Amir,
> >
> > You may see improved performance setting "taskmanager.memory.preallocate:
> > true" in order to use off-heap memory.
> >
> > Also, your number of buffers looks quite low and you may want to increase
> > "taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128
> > MiB.
> >
> > As this is only a benchmark, are you able to post the code to github to
> > solicit feedback?
> >
> > Greg
> >
> > On Sun, Sep 18, 2016 at 9:00 PM, amir bahmanyari <
> > amirto...@yahoo.com.invalid> wrote:
> >
> > > I have new findings & subsequently relative improvements. Am testing as we
> > > speak. 4 Beam server nodes, Azure A11 & 2 Kafka nodes same config. I had
> > > to keep state somewhere. I went with Redis. I found it to be a major
> > > bottleneck as Beam nodes constantly are going across NW to update its
> > > repository. So I replaced Redis with Java ConcurrentHashMaps. Much faster.
> > > Then Kafka went out of disk space and the replication manager
> > > complained. So I clustered the two Kafka nodes hoping for sharing space.
> > > As of this second I am typing this email, it's sustaining but only 1/2 of
> > > the 201401969 tuples have been processed after 3.5 hours. According to the
> > > Linear Road benchmarking expectations, if your system is working well, this
> > > whole 201401969 tuples must be done in 3.5 hrs max. So this means there is
> > > still room for tuning Flink nodes. I have already shared with you all more
> > > details about my config. It ran perfectly yesterday with almost 1/10th of
> > > this load. Perfect real-time send/processed streaming behavior. If that's
> > > the case & I cannot get better performance with FlinkRunner, my next stop is
> > > SparkRunner and repeat of the whole thing for final benchmarking of the two
> > > under Beam APIs. Which was the initial intent anyways. If you have
> > > suggestions to make improvements in the above case, I am all ears & greatly
> > > appreciate it. Cheers, Amir-
> > >
> > > From: "Chawla,Sumit" <sumitkcha...@gmail.com>
> > > To: dev@flink.apache.org; amir bahmanyari <amirto...@yahoo.com>
> > > Sent: Sunday, September 18, 2016 2:07 PM
> > > Subject: Re: Performance and Latency Chart for Flink
> > >
> > > Has anyone else run these kinds of benchmarks? Would love to hear more
> > > people's experience and details about those benchmarks.
> > >
> > > Regards
> > > Sumit Chawla
> > >
> > >
> > > On Sun, Sep 18, 2016 at 2:01 PM, Chawla,Sumit <sumitkcha...@gmail.com>
> > > wrote:
> > >
> > > > Hi Amir
> > > >
> > > > Would it be possible for you to share the numbers? Also share if possible
> > > > your configuration details.
> > > >
> > > > Regards
> > > > Sumit Chawla
> > > >
> > > >
> > > > On Fri, Sep 16, 2016 at 12:18 PM, amir bahmanyari <
> > > > amirto...@yahoo.com.invalid> wrote:
> > > >
> > > >> Hi Fabian, FYI. This is a report on other engines for which we did the
> > > >> same type of benchmarking. It also explains what Linear Road benchmarking
> > > >> is. Thanks for your help.
> > > >> http://www.slideshare.net/RedisLabs/walmart-ibm-revisit-the-linear-road-benchmark
> > > >> https://github.com/IBMStreams/benchmarks
> > > >> https://www.datatorrent.com/blog/blog-implementing-linear-road-benchmark-in-apex/
> > > >>
> > > >>
> > > >> From: Fabian Hueske <fhue...@gmail.com>
> > > >> To: "dev@flink.apache.org" <dev@flink.apache.org>
> > > >> Sent: Friday, September 16, 2016 12:31 AM
> > > >> Subject: Re: Performance and Latency Chart for Flink
> > > >>
> > > >> Hi,
> > > >>
> > > >> I am not aware of periodic performance runs for the Flink releases.
> > > >> I know a few benchmarks which have been published at different points in
> > > >> time like [1], [2], and [3] (you'll probably find more).
> > > >>
> > > >> In general, fair benchmarks that compare different systems (if there is
> > > >> such a thing) are very difficult and the results often depend on the use
> > > >> case.
> > > >> IMO the best option is to run your own benchmarks, if you have a concrete
> > > >> use case.
> > > >>
> > > >> Best, Fabian
> > > >>
> > > >> [1] 08/2015:
> > > >> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
> > > >> [2] 12/2015:
> > > >> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
> > > >> [3] 02/2016:
> > > >> http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
> > > >>
> > > >>
> > > >> 2016-09-16 5:54 GMT+02:00 Chawla,Sumit <sumitkcha...@gmail.com>:
> > > >>
> > > >> > Hi
> > > >> >
> > > >> > Is there any performance run that is done for each Flink release? Or
> > > >> > are you aware of any third party evaluation of performance metrics for
> > > >> > Flink? I am interested in seeing how performance has improved from
> > > >> > release to release, and performance vs other competitors.
> > > >> >
> > > >> > Regards
> > > >> > Sumit Chawla