Answering to myself, just for the records: 1. The culprit was me. As lazy as I am, I have used f-strings all over the place in calls to `logging.logger.debug()` and friends, evaluating all arguments regardless of whether the logger was enabled or not. Replacing these f-strings by regular printf-like format strings solved the issue. Now the application bowls happily along, consistently below 0.02 seconds per second application time. 2. Valgrind + callgrind is an awesome toolchain to spot performance issues, even on VMs.
Am Di., 4. Okt. 2022 um 11:05 Uhr schrieb Andreas Ames < andreas.0815.qwe...@gmail.com>: > Hi all, > > I am wrapping an embedded application (, which does not use any dynamic > memory management,) using Cython to call it from CPython. The wrapped > application uses a cyclic executive, i.e. everything is done in the > input-logic-output design, typical for some real-time related domains. > Consequentially, every application cycle executes more or less the very > same code. As I am still in a prototyping stadium, the wrapped process is > completely CPU-bound, i.e. except of some logging output there is no I/O > whatsoever. > > During one second of "application time", I am doing exactly 120 calls into > the application through three Cython-wrapped API functions. The > application uses some platform-dependent APIs, which I have also wrapped > with Cython, so that there are numerous callbacks into the Python realm per > call into the application. What I am observing now, is that the performance > per "application second" decreases (remember: executing code that does the > same thing on every cycle) and extending the number of loop iterations does > not seem to cause any bound to this decrease. In the log ouput below, you > can see the GC counts, which look innocent to me. The "real time" is > measured using "time.time()". The "top" utility does not suggest any memory > leak either. I am developing on WSL2, but I have checked that this > performance effect also happens on physical machines. Right now, I am > staring at "kcachegrind", but I have no idea, how to determine time series > for the performance of functions (I am not looking for those functions, > which need the most time, but for those, which consume more and more > execution time). > > One more thing to look for could be memory fragmentation, but before that > I would like to ask the experts here for their ideas and experiences and/or > for tools, which could help to find the culprit. > > 2022-10-04 10:40:50|INFO |__main__ |Execution loop 0 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.06862711906433105 seconds real time. >> 2022-10-04 10:40:51|INFO |__main__ |Execution loop 100 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.08224177360534668 seconds real time. >> 2022-10-04 10:40:52|INFO |__main__ |Execution loop 200 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.08225250244140625 seconds real time. >> 2022-10-04 10:40:53|INFO |__main__ |Execution loop 300 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.10176873207092285 seconds real time. >> 2022-10-04 10:40:54|INFO |__main__ |Execution loop 400 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.10900592803955078 seconds real time. >> 2022-10-04 10:40:55|INFO |__main__ |Execution loop 500 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.12233948707580566 seconds real time. >> 2022-10-04 10:40:56|INFO |__main__ |Execution loop 600 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.14058256149291992 seconds real time. >> 2022-10-04 10:40:58|INFO |__main__ |Execution loop 700 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.14777183532714844 seconds real time. >> 2022-10-04 10:40:59|INFO |__main__ |Execution loop 800 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.15729451179504395 seconds real time. >> 2022-10-04 10:41:01|INFO |__main__ |Execution loop 900 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.17365813255310059 seconds real time. >> 2022-10-04 10:41:03|INFO |__main__ |Execution loop 1000 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.17772984504699707 seconds real time. >> 2022-10-04 10:41:05|INFO |__main__ |Execution loop 1100 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.1955263614654541 seconds real time. >> 2022-10-04 10:41:07|INFO |__main__ |Execution loop 1200 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.20046710968017578 seconds real time. >> 2022-10-04 10:41:09|INFO |__main__ |Execution loop 1300 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.22513842582702637 seconds real time. >> 2022-10-04 10:41:11|INFO |__main__ |Execution loop 1400 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.23578548431396484 seconds real time. >> 2022-10-04 10:41:13|INFO |__main__ |Execution loop 1500 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.24581527709960938 seconds real time. >> 2022-10-04 10:41:16|INFO |__main__ |Execution loop 1600 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.2541334629058838 seconds real time. >> 2022-10-04 10:41:19|INFO |__main__ |Execution loop 1700 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.26812195777893066 seconds real time. >> 2022-10-04 10:41:21|INFO |__main__ |Execution loop 1800 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.28777456283569336 seconds real time. >> 2022-10-04 10:41:24|INFO |__main__ |Execution loop 1900 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.3005337715148926 seconds real time. >> 2022-10-04 10:41:28|INFO |__main__ |Execution loop 2000 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.30992960929870605 seconds real time. >> 2022-10-04 10:41:31|INFO |__main__ |Execution loop 2100 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.32058119773864746 seconds real time. >> 2022-10-04 10:41:34|INFO |__main__ |Execution loop 2200 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.33020949363708496 seconds real time. >> 2022-10-04 10:41:37|INFO |__main__ |Execution loop 2300 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.34426307678222656 seconds real time. >> 2022-10-04 10:41:41|INFO |__main__ |Execution loop 2400 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.35851263999938965 seconds real time. >> 2022-10-04 10:41:45|INFO |__main__ |Execution loop 2500 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.3664553165435791 seconds real time. >> 2022-10-04 10:41:48|INFO |__main__ |Execution loop 2600 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.3767662048339844 seconds real time. >> 2022-10-04 10:41:52|INFO |__main__ |Execution loop 2700 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.39760732650756836 seconds real time. >> 2022-10-04 10:41:56|INFO |__main__ |Execution loop 2800 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.42221736907958984 seconds real time. >> 2022-10-04 10:42:01|INFO |__main__ |Execution loop 2900 done. GC >> counts = (381, 9, 3); 1 second of application time corresponds to >> 0.4237234592437744 seconds real time. > > > Thanks in advance, > > Andreas > > -- https://mail.python.org/mailman/listinfo/python-list