Answer inline:

> I just wanted to understand: say in a single poll request, if it fetches n
> records, do the above values indicate the time computed for all n records or
> just a single record?

In 0.10.2, the process latency is that of a single record, not the sum of the n records. The commit latency is the latency for several requests. So your second statement is true:

> or is it the total average time to process these records = n * process
> latency + commit latency before making another poll request.

Correct.
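For a rough feel of that number, one could read the averages straight off KafkaStreams#metrics() and plug them into the formula above. A minimal, untested sketch (metric names as listed for the 0.10.2 Streams thread metrics; the class and method names are just illustrative, and avgRecordsPerPoll is an estimate you have to supply yourself, e.g. from max.poll.records or your own logging):

import java.util.Map;

import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class PollGapEstimate {

    // Applies the formula above (n * process-latency-avg + commit-latency-avg)
    // using the thread-level metrics exposed by KafkaStreams#metrics().
    // Assumes a single StreamThread; with more threads you would also filter
    // on the client-id tag of each MetricName instead of just the name.
    public static double estimateMs(final KafkaStreams streams, final double avgRecordsPerPoll) {
        double processAvg = 0.0;
        double commitAvg = 0.0;
        for (final Map.Entry<MetricName, ? extends Metric> e : streams.metrics().entrySet()) {
            final String name = e.getKey().name();
            if ("process-latency-avg".equals(name)) {
                processAvg = e.getValue().value();   // per-record average, in ms
            } else if ("commit-latency-avg".equals(name)) {
                commitAvg = e.getValue().value();
            }
        }
        return avgRecordsPerPoll * processAvg + commitAvg;
    }
}

Comparing that estimate against max.poll.interval.ms should give a rough sense of how close you are to triggering a rebalance.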
Thanks
Eno

> Basically we just want to know how often poll is getting called, just to see
> how close it is to MAX_POLL_INTERVAL_MS_CONFIG.
>
> Thanks
> Sachin
>
> On Sun, Mar 5, 2017 at 11:42 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
>> That is right, since the client-id is used as the metrics name, which should be
>> distinguishable.
>>
>> https://kafka.apache.org/documentation/#streamsconfigs (I think we can
>> improve on the explanation of the client.id config)
>>
>> A common client-id could contain the machine's host-port; of course, if you
>> have more than one Streams instance running on the same machine that won't
>> work and you need to consider using more information.
>>
>> Again, the client-id config is not required, and when not specified Streams
>> will use a UUID suffix to achieve uniqueness, but as you observed it is
>> less human-readable for monitoring.
>>
>> Guozhang
>>
>> On Fri, Mar 3, 2017 at 5:18 PM, Sachin Mittal <sjmit...@gmail.com> wrote:
>>
>>> So if I am running my stream across a cluster of different machines,
>>> each machine should have a different client id.
>>>
>>> On 4 Mar 2017 12:36 a.m., "Guozhang Wang" <wangg...@gmail.com> wrote:
>>>
>>>> Sachin,
>>>>
>>>> The reason that you got a metrics name like
>>>>
>>>> new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1
>>>>
>>>> is that you did not set the "CLIENT_ID_CONFIG" in your app, and
>>>> KafkaStreams has to use a default combo of "appID:
>>>> new-part-advice"-"processID: a UUID to guarantee uniqueness across
>>>> machines" as its clientId.
>>>>
>>>> As for the metricsName, it is always set as clientId + "-" + threadName,
>>>> where "StreamThread-1" is your threadName, which is unique WITHIN the JVM;
>>>> that is why we still need the globally unique clientId to distinguish instances.
>>>>
>>>> I just checked the source code and this logic was not changed from 0.10.1
>>>> to 0.10.2, so I guess you set your clientId as "new-advice-1" as well in
>>>> 0.10.1?
>>>>
>>>> Guozhang
>>>>
>>>> On Fri, Mar 3, 2017 at 4:02 AM, Eno Thereska <eno.there...@gmail.com> wrote:
>>>>
>>>>> Hi Sachin,
>>>>>
>>>>> Now that Confluent Platform 3.2 is out, we also have some more
>>>>> documentation on this here: http://docs.confluent.io/3.2.0/streams/monitoring.html.
>>>>> We added a note on how to add other metrics.
>>>>>
>>>>> Yeah, your calculation on poll time makes sense. The important metrics are
>>>>> the “info” ones that are on by default. However, for stateful applications,
>>>>> if you suspect that state stores might be bottlenecking, you might want to
>>>>> collect those metrics too.
>>>>>
>>>>> On the benchmarks, the ones called “processstreamwithstatestore” and
>>>>> “count” are the closest to benchmarking RocksDb with the default
>>>>> configs. The first writes each record to RocksDb, while the second performs
>>>>> simple aggregates (reads and writes from/to RocksDb).
>>>>>
>>>>> We might need to add more benchmarks here; it would be great to get some
>>>>> ideas and help from the community. E.g., a pure RocksDb benchmark that
>>>>> doesn’t go through streams at all.
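On that last point, a pure-RocksDB micro-benchmark outside of Streams can be as small as the rough sketch below. It assumes the rocksdbjni dependency (a reasonably recent version, where the handles are AutoCloseable) is on the classpath; the record count, value size, key format and DB path are arbitrary choices:

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class PureRocksDbBench {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-bench")) {
            final int n = 1_000_000;             // arbitrary number of records
            final byte[] value = new byte[100];  // arbitrary 100-byte value
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                db.put(Integer.toString(i).getBytes(), value);
            }
            double putSecs = (System.nanoTime() - start) / 1e9;
            start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                db.get(Integer.toString(i).getBytes());
            }
            double getSecs = (System.nanoTime() - start) / 1e9;
            System.out.printf("put: %.0f ops/s, get: %.0f ops/s%n",
                    n / putSecs, n / getSecs);
        }
    }
}

Numbers from a sketch like this only give a local baseline for raw put/get throughput; they do not include the serialization, changelogging and windowing work Streams adds on top.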
>>>>>
>>>>> Could you open a JIRA on the name issue please? As an “improvement”.
>>>>>
>>>>> Thanks
>>>>> Eno
>>>>>
>>>>>> On Mar 2, 2017, at 6:00 PM, Sachin Mittal <sjmit...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I had checked the monitoring docs, but could not figure out which metrics
>>>>>> are the important ones.
>>>>>>
>>>>>> Also, mainly I am looking at the average time spent between 2 successive
>>>>>> poll requests.
>>>>>> Can I say that the average time between 2 poll requests is the sum of
>>>>>>
>>>>>> commit + poll + process + punctuate (latency-avg)?
>>>>>>
>>>>>> Also, I checked the benchmark test results but could not find any
>>>>>> information on rocksdb metrics for fetch and put operations.
>>>>>> Is there any benchmark for these, or can something be said about their
>>>>>> performance based on the values in my previous mail?
>>>>>>
>>>>>> Lastly, can we get some help on names like
>>>>>> new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1
>>>>>> and have a more standard thread name like new-advice-1-StreamThread-1
>>>>>> (as in version 10.1.1), so we can log these metrics as part of our cron jobs?
>>>>>>
>>>>>> Thanks
>>>>>> Sachin
>>>>>>
>>>>>> On Thu, Mar 2, 2017 at 9:31 PM, Eno Thereska <eno.there...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sachin,
>>>>>>>
>>>>>>> The new streams metrics are now documented at
>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_monitoring.
>>>>>>> Note that not all of them are turned on by default.
>>>>>>>
>>>>>>> We have several benchmarks that run nightly to monitor streams
>>>>>>> performance. They all stem from the SimpleBenchmark.java benchmark. In
>>>>>>> addition, their results are published nightly here:
>>>>>>> http://testing.confluent.io (e.g., under the trunk results). E.g., looking at today's results:
>>>>>>> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-03-02--001.1488449554--apache--trunk--ef92bb4/report.html
>>>>>>> (if you search for "benchmarks.streams") you'll see results from a series
>>>>>>> of benchmarks, ranging from simply consuming, to simple topologies with a
>>>>>>> source and sink, to joins and count aggregates. These run on AWS nightly,
>>>>>>> but you can also run them manually on your setup.
>>>>>>>
>>>>>>> In addition, the code can programmatically check KafkaStreams.state()
>>>>>>> and register listeners for when the state changes. For example, the state
>>>>>>> can change from "running" to "rebalancing".
>>>>>>>
>>>>>>> It is likely we'll need more metrics moving forward and it would be great
>>>>>>> to get feedback from the community.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Eno
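On the state-listener point just above, a minimal sketch of hooking it up (the class and method names are illustrative, and logging via System.out is just a placeholder for whatever your monitoring uses):

import org.apache.kafka.streams.KafkaStreams;

public class StateChangeLogging {

    // Register before calling streams.start(). Frequent RUNNING -> REBALANCING
    // transitions under load are a hint that threads are being kicked out of
    // the consumer group, e.g. after exceeding max.poll.interval.ms.
    public static void logStateTransitions(final KafkaStreams streams) {
        streams.setStateListener(new KafkaStreams.StateListener() {
            @Override
            public void onChange(final KafkaStreams.State newState,
                                 final KafkaStreams.State oldState) {
                System.out.println("Streams state change: " + oldState + " -> " + newState);
            }
        });
    }
}

Counting those transitions over time is a cheap way to see how often the rebalances described below are actually happening.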
>>>>>>>> On 2 Mar 2017, at 11:54, Sachin Mittal <sjmit...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello All,
>>>>>>>> I had a few questions regarding monitoring of a kafka streams application
>>>>>>>> and what are some important metrics we should collect in our case.
>>>>>>>>
>>>>>>>> Just a brief overview: we have a single-threaded application (0.10.1.1)
>>>>>>>> reading from a single-partition topic, and it is working all fine.
>>>>>>>> Then we have the same application (using 0.10.2.0) multi-threaded with 4
>>>>>>>> threads per machine and a 3-machine cluster setup, reading from the same
>>>>>>>> but partitioned topic (12 partitions).
>>>>>>>> Thus we have each thread processing a single partition, the same case as
>>>>>>>> the earlier one.
>>>>>>>>
>>>>>>>> The new setup also works fine in steady state, but under load somehow it
>>>>>>>> triggers frequent re-balances and then we run into all sorts of issues,
>>>>>>>> like a stream thread dying due to CommitFailedException or entering a
>>>>>>>> deadlock state.
>>>>>>>> After a while we restart all the instances, then it works fine for a
>>>>>>>> while, and again we get the same problem, and it goes on.
>>>>>>>>
>>>>>>>> 1. So, just to monitor, when the first thread fails, what would be some
>>>>>>>> important metrics we should be collecting to get some sense of what's
>>>>>>>> going on?
>>>>>>>>
>>>>>>>> 2. Is there any metric that tells the time elapsed between successive
>>>>>>>> poll requests, so we can monitor that?
>>>>>>>>
>>>>>>>> Also I did monitor rocksdb put and fetch times for these 2 instances and
>>>>>>>> here is the output I get:
>>>>>>>> 0.10.1.1
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-put-avg-latency-ms
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
>>>>>>>> 206431.7497615029
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-fetch-avg-latency-ms
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
>>>>>>>> 2595394.2746129474
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-put-qps
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
>>>>>>>> 232.86299499317252
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-fetch-qps
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
>>>>>>>> 373.61071016166284
>>>>>>>>
>>>>>>>> The same values for 0.10.2.0 I get:
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-put-latency-avg
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
>>>>>>>> 1199859.5535022356
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-fetch-latency-avg
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
>>>>>>>> 3679340.80748852
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-put-rate
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
>>>>>>>> 56.134778706069184
>>>>>>>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-fetch-rate
>>>>>>>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
>>>>>>>> 136.10721427931827
>>>>>>>>
>>>>>>>> I notice that the results in 10.2.0 are much worse than the same for 10.1.1.
>>>>>>>>
>>>>>>>> I would like to know:
>>>>>>>> 1. Is there any benchmark on rocksdb as to what rate/latency it should be
>>>>>>>> doing put/fetch operations at?
>>>>>>>>
>>>>>>>> 2. What could be the cause of the inferior numbers in 10.2.0? Is it
>>>>>>>> because this application is also running three other threads doing the
>>>>>>>> same thing?
>>>>>>>>
>>>>>>>> 3. Also, what's with the name
>>>>>>>> new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1?
>>>>>>>> I wanted to put this as a part of my cronjob, so why can't we have a
>>>>>>>> simpler name like we have in 10.1.1, so it is easy to write the script?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Sachin
>>>>
>>>> --
>>>> -- Guozhang
>>
>> --
>> -- Guozhang
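Picking up question 3 and Guozhang's client.id explanation near the top of the thread: setting the client id explicitly should give stable, script-friendly metric names of the form clientId-StreamThread-N. A rough 0.10.2-style sketch; the application id matches the thread above, while the bootstrap server, the "host1" suffix and the topic names are just placeholders:

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class ClientIdExample {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "new-part-advice");
        // With an explicit client.id, the metric names become
        // "<client.id>-StreamThread-<n>" instead of carrying a random process
        // UUID, so a cron/monitoring script can predict them.
        props.put(StreamsConfig.CLIENT_ID_CONFIG, "new-part-advice-host1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        final KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");   // placeholder topology

        new KafkaStreams(builder, props).start();
    }
}

As noted earlier in the thread, the value just has to be unique per instance, so something like host name plus an instance number works when several instances share a machine.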