*sigh*. Oh, how I wish that various e-mail clients would quit re-formatting stuff. My previous response here was so nice & neat & tidy before I hit 'Send'. Reading that response back via IBM-MAIN makes me look like a complete illiterate...

Sean

On Thu, 27 Aug 2020 at 09:01, Sean Gleann <sean.gle...@gmail.com> wrote:

> Hi Attila - thanks for the pointers, but I'm not sure how to act upon them.
>
> The start-up for Cadvisor that I'm using doesn't feature any pointer to a parameter list, and despite much googling I don't see any mention of such a thing. Everything keeps referring back to Prometheus and then on to Grafana. My Cadvisor start-up (taken directly from the IBM Red Book and slightly modified to comply with local restrictions):
>
> docker network create monitoring
> docker run --name cadvisor -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro -v /dev/disk:/dev/disk:ro -d --network monitoring ibmcom/cadvisor-s390x:0.33.0
>
> Perhaps I'm looking at things the wrong way, but my current understanding is:
> Cadvisor (and also Nodeexporter) collect various usage stats;
> Prometheus then gathers that data and does some sort of pre-processing of it (it doesn't tell Cadvisor to 'do something' - it just passively makes use of the data that Cadvisor collects);
> Grafana takes the data from Prometheus and uses it to generate various graphs/tables/reports.
>
> My situation is that when I run Cadvisor on its own - no other containers at all - then it floods as many processors as I define in the zcx start.json file.
>
> Whilst Cadvisor is running, I can go to the relevant web page and see that it is producing meters/charts, etc. all on its own. Since that is the case, what is the point of Grafana?
>
> I have a Prometheus.yml file that features the term 'scrape_interval' (but not 'housekeeping'), but that file is for use by Prometheus, isn't it? How does it affect the amount of work that Cadvisor is doing, since I haven't even started that container yet?
>
> Regards
> Sean
>
> On Wed, 26 Aug 2020 at 23:05, Attila Fogarasi <fogar...@gmail.com> wrote:
>
>> Check your values for the housekeeping interval and scrape_interval. Recommended is 15s and 30s (which makes for a 60-second rate window). A small value for the housekeeping interval will cause cAdvisor CPU usage to be high, while scrape_interval affects Prometheus CPU usage. It is entirely possible to cause data collection to use 100% of the z/OS CPU -- remember that on Unix systems the rule of thumb is 40% overhead for uncaptured CPU time, while z/OS is far more efficient and runs well under 10%. You will see this behaviour in zCX containers; it isn't going to measure the same as z/OS workload. The optimizations in Unix have the premise that CPU time is low cost (as is memory), while z/OS considers CPU to be high cost and path length worth saving. Same for the subsystems in z/OS and performance monitors.
>>
>> On Wed, Aug 26, 2020 at 11:43 PM Sean Gleann <sean.gle...@gmail.com> wrote:
>>
>>> Allan - "...count the beans differently..." Yes, I'm beginning to get used to that concept. For instance, with the CPU Utilisation data that I *have* been able to retrieve, the metric given is not 'CPU%', but 'Number of cores'. I'm having to do some rapid re-orienting of my way of thinking.
>>> As for the memory size, I've got "mem-gb" : 2 defined in my start.json file, but I've not seen any indication of paging load at all in my testing.
>>>
>>> Michael - 5 zIIPs? I wish! Nope - these are all general-purpose processors.
>>> The z/OS system I'm using is a z/VM guest on a system run by an external supplier, so I'm not sure if defining zIIPs would actually achieve anything. (Is it possible to dedicate a zIIP engine to a specific z/VM guest? That's a road I've not yet gone down.)
>>> With regard to the WLM definitions, I followed the advice in the red book and I'm reasonably certain I've got it right. Having said that, cross-refer to a thread that I started earlier this week, titled "WLM Query". The response to that led to me defining a resource group to cap the started task to 10 MSU, which resulted in a CPU% Util value of roughly 5% - something I could be happy with.
>>> Under that cap, the started task ran, yes, but it ran like a three-legged dog (my apologies to limb-count-challenged canines). Start-up of the task, from the START command to the "server is listening..." message, took over an hour, and STOP-command-to-task-termination took approx. 30 minutes. (SSH-ing to the task was a bit of a joke, too. Responses to simple commands like 'docker ps -a' could be seen 'painting' across the screen, character-by-character...)
>>> As a result, I've moved away from trying to limit the task for the time being. I'm concentrating on attempting to get cadvisor to be a bit less greedy.
>>>
>>> Regards
>>> Sean
>>>
>>> On Wed, 26 Aug 2020 at 13:49, Michael Babcock <bigironp...@gmail.com> wrote:
>>>
>>>> I can't check my zCX out right now since my internet is down.
>>>>
>>>> You are running these on zIIP engines, correct? Must be nice to have 5 zIIPs! And have the WLM parts in place? Although it probably wouldn't make much difference during startup/shutdown.
>>>>
>>>> On Wed, Aug 26, 2020 at 3:40 AM Sean Gleann <sean.gle...@gmail.com> wrote:
>>>>
>>>>> Can anyone offer advice, please, with regard to monitoring the system resource consumption of a zcx Container task?
>>>>>
>>>>> I've got a zcx Container task running on a 'sandbox' system where - as yet - I'm not collecting any RMF/SMF data. Because of that, my only source of system usage is the SDSF DA panel. I feel that the numbers I see there are... 'questionable' is the best word I can think of.
>>>>>
>>>>> Firstly, the EXCP-count for the task goes up to about 15360 during the initial start-up phase, but then it stays there until the STOP command is issued. At that point, EXCP-count starts rising again, until the task finally terminates. The explanation for that is probably because all the I/O is being handled internally at the 'Linux' level - the task must be doing *some* I/O, right? - but the data isn't getting back to SDSF for some reason. Without the benefit of SMF data to examine, I'm wondering if this is part of a larger problem.
>>>>>
>>>>> The other thing that troubles me is the CPU% busy value. My sandbox system has 5 engines defined, and in the 'start.json' file that controls the zcx Container task, I've specified a 'cpu' value of 4.
>>>>> During the start-up phase for the Container started task, SDSF shows CPU% values of approx. 80%, but when the task is finally initialised, this drops to 'tickover' rates of about 1%. I'm happy with that - the initial start-up of *any* task as complex as a zcx Container is likely to cause high CPU usage, and the subsequent drop to the 1% levels is fine by me.
>>>>>
>>>>> But... once the Container task is started and I've ssh'd into it, I then want to monitor its 'internal' system consumption. I've been using the 'Getting Started...' redbook as my guide throughout this project, and it talks about using "Nodeexporter", "Cadvisor", "Prometheus" and "Grafana" as tools for this. I've got all those things installed and I can start and stop them quite happily, but I've found that using Cadvisor on its own can drive CPU% levels back up to 80% for the entire time it is running. If a system is running flat-out when all it is doing is monitoring itself, well, there's something wrong somewhere... I'm trying to find an idiot's guide to controlling what Cadvisor does, but as yet I've been unsuccessful.
>>>>>
>>>>> Regards
>>>>> Sean
>>>>
>>>> --
>>>> Michael Babcock
>>>> OneMain Financial
>>>> z/OS Systems Programmer, Lead
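
A sketch to go with Attila's housekeeping-interval point: upstream cAdvisor is tuned with command-line flags rather than a config file, and anything placed after the image name on the docker run command is passed straight through to the cAdvisor binary. Assuming the ibmcom/cadvisor-s390x:0.33.0 image accepts the same flags as the upstream 0.33.0 build (worth verifying against the flag list for that build before relying on it), a less greedy start-up could look something like this:

  # Same docker run as in the thread, with cAdvisor's own tuning flags
  # appended after the image name (they go to the cAdvisor binary, not to docker)
  docker run --name cadvisor \
    -v /sys:/sys:ro \
    -v /var/lib/docker/:/var/lib/docker:ro \
    -v /dev/disk:/dev/disk:ro \
    -d --network monitoring \
    ibmcom/cadvisor-s390x:0.33.0 \
    --housekeeping_interval=15s \
    --global_housekeeping_interval=1m \
    --docker_only=true \
    --disable_metrics=disk,network,tcp,udp,percpu,sched

--housekeeping_interval sets how often cAdvisor samples each container, which is what drives its own CPU cost (15s matches the recommendation above); --docker_only and --disable_metrics simply shrink the amount of work done on each housekeeping pass. The metric names accepted by --disable_metrics vary between cAdvisor releases, so trim that list to whatever your build reports.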
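
On the Prometheus side, Sean's reading is right: scrape_interval lives in prometheus.yml and only controls how often Prometheus polls its targets, so it does nothing to cAdvisor's own housekeeping work - and nothing at all until the Prometheus container is started. A minimal sketch using the 30s value suggested above; the job name and the cadvisor:8080 target are assumptions based on cAdvisor's default port and the 'monitoring' docker network, not something taken from the Redbook:

  # prometheus.yml - controls Prometheus's polling, not cAdvisor's sampling
  global:
    scrape_interval: 30s          # how often Prometheus pulls from each target
    evaluation_interval: 30s

  scrape_configs:
    - job_name: 'cadvisor'        # hypothetical job name
      static_configs:
        - targets: ['cadvisor:8080']   # cAdvisor's default listening port

As for "what is the point of Grafana?": cAdvisor's built-in web page only keeps a couple of minutes of history in memory, so Prometheus is there to retain the data over time and Grafana to chart it - neither of them changes how hard cAdvisor itself works.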
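
Finally, the ceiling cAdvisor keeps driving the instance to is simply whatever the zCX start.json allows it. Only two fields are quoted in the thread, so this fragment just restates them with the spellings used above; the actual start.json for an instance has other entries as well, so treat it as a fragment rather than a template:

  {
    "cpu" : 4,
    "mem-gb" : 2
  }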