> On 16 Jul 2019, at 20:48, Dan Stromberg <drsali...@gmail.com> wrote:
>
>
>
> On Tue, Jul 16, 2019 at 11:13 AM Barry Scott <ba...@barrys-emacs.org> wrote:
> > I'm going to assume you are on Linux.
> Yes, I am. Ubuntu 16.04.6 LTS sometimes, Mint 19.1 other times.
>
> On 16 Jul 2019, at 18:35, Dan Stromberg <drsali...@gmail.com> wrote:
> >
> > I'm looking at a performance problem in a large CPython 2.x/3.x codebase
> > with quite a few dependencies.
> >
> > I'm not sure what's causing the slowness yet. The CPU isn't getting hit
> > hard, and I/O on the system appears to be low - but throughput is poor.
> > I'm wondering if it could be CPU-bound Python threads causing the problem
> > (because of the threading+GIL thing).
>
> > Does top show the process using 100% CPU?
> Nope. CPU utilization and disk use are both low.
Then your problem is latency. You need to find the slow operation.
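A minimal sketch of one way to find the slow operation, assuming Python 3.7+ (for time.thread_time) and that you can wrap suspect calls in the code; the helper name timed and the 0.1 s threshold are hypothetical. It records wall-clock time and the calling thread's CPU time: a large wall time with little CPU time points at waiting (I/O, locks, the GIL, a remote service) rather than computation.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("latency")


@contextmanager
def timed(label, threshold=0.1):
    """Hypothetical helper: time a block and log it if it is slow.

    Yields a dict that is filled with "wall" and "cpu" seconds on exit.
    """
    result = {}
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()  # CPU time of the current thread only
    try:
        yield result
    finally:
        result["wall"] = time.perf_counter() - wall_start
        result["cpu"] = time.thread_time() - cpu_start
        if result["wall"] >= threshold:
            # wall >> cpu means this thread was mostly waiting, not computing
            log.warning("%s took %.3fs wall, %.3fs CPU",
                        label, result["wall"], result["cpu"])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with timed("fake_request", threshold=0.05):
        time.sleep(0.2)  # stands in for a slow, mostly-waiting operation
```

Wrapping a handful of candidate operations this way (database calls, RPCs, queue gets) usually narrows things down faster than profiling everything at once.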
> We've been going into top, and then hitting '1' to see things broken down by
> CPU core (there are 32 of them, probably counting hyperthreads as different
> cores), but the CPU use is in the teens or so.
>
> I've also tried dstat and csysdig. The hardware isn't breaking a sweat, but
> throughput is poor.
> > The non-dependency Python portions don't appear to have much in the way of
> > threading going on based on a quick grep, but csysdig says a process
> > running the code has around 32 threads running - the actual thread count
> > varies, but that's the ballpark.
> >
> > I'm wondering if there's a good way to find two counts of those threads -
> > how many are from CPython code that could run afoul of the GIL, and how
> > many of them are from C/C++ extension modules that wouldn't be responsible
> > for a GIL issue.
>
> > From the docs on threading:
> >
> > threading.active_count()
> >
> > <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.active_count>
> > Return the number of Thread
> > <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.Thread>
> > objects currently alive. The returned count is equal to the length of the
> > list returned by enumerate()
> > <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.enumerate>.
>
> Are you on a Mac?
Oops, a file: link, sorry; I should have searched the online docs.
I use many operating systems: Fedora, macOS, Windows, NetBSD, CentOS, and
others in the past.
>
> https://docs.python.org/2/library/threading.html appears to have some good
> info. I'll probably try logging threading.active_count().
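To log threading.active_count() periodically, a small daemon thread can do it. A minimal sketch; start_thread_logger, the "threads" logger name and the 10 s default interval are hypothetical. It also logs thread names, which often hint at which library started a thread, and it should run on Python 2.7 as well as 3.

```python
import logging
import threading

log = logging.getLogger("threads")


def start_thread_logger(interval=10.0):
    """Hypothetical helper: log the live Python thread count every `interval` s.

    Returns an Event; call .set() on it to stop logging.
    """
    stop = threading.Event()

    def run():
        # Event.wait() returns True once stop is set, which ends the loop.
        while not stop.wait(interval):
            names = sorted(t.name for t in threading.enumerate())
            log.info("active_count=%d names=%s",
                     threading.active_count(), names)

    t = threading.Thread(target=run, name="thread-logger")
    t.daemon = True  # don't keep the process alive just for logging
    t.start()
    return stop
```

Since active_count() is defined as len(threading.enumerate()), the names list and the count describe the same set of threads (up to a thread starting or exiting between the two calls).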
>
> A question arises though: Does threading.active_count() only show Python
> threads created with the threading module? What about threads created with
> the thread module?
Only Python threads. If you think about it, why would Python keep track of
threads it does not control?
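On Linux you can get the other half of the count from procfs: each entry under /proc/self/task is a kernel thread of the process, whether it was started by threading or by a C/C++ extension. A rough sketch, assuming Linux; thread_counts is a hypothetical helper. The gap between the two numbers estimates how many threads the threading module does not know about.

```python
import os
import threading


def thread_counts():
    """Hypothetical helper: return (os_threads, threading_threads).

    Linux-only: /proc/self/task has one directory per kernel thread.
    threading.active_count() only counts Thread objects that the
    threading module knows about.
    """
    os_threads = len(os.listdir("/proc/self/task"))
    return os_threads, threading.active_count()


if __name__ == "__main__":
    os_n, py_n = thread_counts()
    print("OS threads: %d, threading threads: %d, unknown to threading: %d"
          % (os_n, py_n, os_n - py_n))
```

For a process you can't modify, listing /proc/<pid>/task from outside gives the same OS-level number, though not the threading side.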
>
> > Try running strace on the process to see what system calls it's making.
> I've tried it, but thank you. It's a good suggestion.
>
> I often find that when strace'ing a program, there's a bunch of
> mostly-irrelevant stuff at Initial Program Load (IPL), but then the main loop
> falls into a small cycle of system calls.
And what are those system calls, and how long do they take?
If you are using select/poll, how long before the call returns?
If you are in a read, how long before it returns?
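strace -T appends the time spent in each system call to the end of the line as <seconds>, -f follows all threads and -p attaches to a running process. A rough sketch that summarises such a trace per system call, assuming it was saved with something like strace -f -T -p PID -o trace.txt and without timestamp options (-t/-tt/-r), which would change the line prefix; summarize and trace.txt are hypothetical names.

```python
import re
import sys
from collections import defaultdict

# `strace -T` appends the time spent in a completed call as "<0.000123>".
TIME_RE = re.compile(r"<(\d+\.\d+)>\s*$")
# Syscall name after an optional pid prefix: "[pid  123] " (stderr) or
# "123 " (-o with -f). "<... read resumed>" lines carry the timing for
# calls that were split by another thread's output.
NAME_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:<\.\.\.\s+)?(\w+)")


def summarize(lines):
    """Return {syscall: (calls, total_seconds, max_seconds)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in lines:
        t = TIME_RE.search(line)
        n = NAME_RE.match(line)
        if not (t and n):
            continue  # "<unfinished ...>", signals, exit lines, etc.
        secs = float(t.group(1))
        entry = stats[n.group(1)]
        entry[0] += 1
        entry[1] += secs
        entry[2] = max(entry[2], secs)
    return {name: tuple(v) for name, v in stats.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        stats = summarize(f)
    worst_first = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (calls, total, worst) in worst_first[:15]:
        print("%-16s calls=%-7d total=%9.3fs max=%8.3fs"
              % (name, calls, total, worst))
```

Keep in mind that an idle worker thread blocked in select or futex will rack up large totals too; the max column and the thread the call came from matter as much as the sum.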
>
> Not with this program. Its main loop is busy and large.
Does the code log any metrics or telemetry to help you?
I work on a product that produces time-series data to show key information
about the service: TPS, cache hit rates, etc.
I should have mentioned this before: you can run the code under Python's
cProfile. Do a test run against the process, then analyse the data that
cProfile produces to find out the elapsed times of the code (or its CPU
times, if you give cProfile a CPU-time timer).
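A minimal sketch of that workflow, assuming Python 3; slow_operation is a hypothetical stand-in for the real work, and the file name is arbitrary. The command-line equivalent is python -m cProfile -o run.prof yourscript.py (yourscript.py being your entry point). The default timer is wall-clock based, so time spent blocked shows up in cumtime; passing time.process_time to cProfile.Profile would measure CPU time instead.

```python
import cProfile
import io
import os
import pstats
import tempfile
import time


def slow_operation():
    # Hypothetical stand-in for the real work: mostly waiting, little CPU.
    time.sleep(0.1)
    return sum(range(10000))


def profile_to_file(func, path):
    """Run func under cProfile and save the raw stats to `path`."""
    prof = cProfile.Profile()  # default timer is wall-clock based
    prof.enable()
    try:
        func()
    finally:
        prof.disable()
    prof.dump_stats(path)


def top_functions(path, limit=10):
    """Load saved stats and return the report sorted by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "run.prof")
    profile_to_file(slow_operation, out)
    print(top_functions(out))
```

One caveat for a threaded program: as far as I know, in the interpreters discussed here cProfile only records the thread it was enabled in, so you may need to enable it inside the thread that does the slow work.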
>
> > You could also connect gdb to the process and find out what code the threads
> > are running.
>
> I used to use gdb, and wrappers for gdb, when I was doing C code, but I don't
> have much experience using it on a CPython interpreter.
>
> Would I be doing a "thread apply all bt" or what? I'm guessing those
> backtraces could facilitate identifying the origin of a thread.
Yes, thread apply all bt works great on a Python process. Recent gdb releases
know how to format the stack and show you the Python stack; I forget the
command, but it's easy to google for.
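If attaching gdb is awkward, a different, in-process technique gives similar information for the Python-level threads. A sketch, assuming you can add code to the program and are on Unix; dump_thread_stacks and install_dump_handler are hypothetical helpers, and sys._current_frames() is a CPython-specific debugging API. Threads that never execute Python code (for example pure C/C++ extension threads) will not appear, which is itself a clue to where a thread came from.

```python
import signal
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return the current Python stack of every thread as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps thread ident -> that thread's topmost frame.
    for ident, frame in sys._current_frames().items():
        chunks.append("--- thread %s (%s) ---\n"
                      % (ident, names.get(ident, "not a threading.Thread")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def install_dump_handler(signum=signal.SIGUSR1):
    """Unix only: after this, `kill -USR1 <pid>` writes the dump to stderr.

    Must be called from the main thread; the handler runs in the main
    thread the next time it executes Python bytecode.
    """
    def handler(received_signum, frame):
        sys.stderr.write(dump_thread_stacks())
        sys.stderr.flush()

    signal.signal(signum, handler)
```

Taking a few dumps a second or two apart and looking for threads that are always in the same place is a cheap way to spot where the program is waiting.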
Barry
>
> Thanks a bunch.
>
--
https://mail.python.org/mailman/listinfo/python-list