> On 16 Jul 2019, at 20:48, Dan Stromberg <drsali...@gmail.com> wrote:
> 
> 
> 
> On Tue, Jul 16, 2019 at 11:13 AM Barry Scott <ba...@barrys-emacs.org 
> <mailto:ba...@barrys-emacs.org>> wrote:
> I'm going to assume you are on linux.
> Yes, I am.  Ubuntu 16.04.6 LTS sometimes, Mint 19.1 other times.
> 
> On 16 Jul 2019, at 18:35, Dan Stromberg <drsali...@gmail.com 
> <mailto:drsali...@gmail.com>> wrote:
> > 
> > I'm looking at a performance problem in a large CPython 2.x/3.x codebase
> > with quite a few dependencies.
> > 
> > I'm not sure what's causing the slowness yet.  The CPU isn't getting hit
> > hard, and I/O on the system appears to be low - but throughput is poor.
> > I'm wondering if it could be CPU-bound Python threads causing the problem
> > (because of the threading+GIL thing).
> 
> Does top show the process using 100% CPU?
> Nope.  CPU utilization and disk use are both low.

Then your problem is latency. You need to find the slow operation.

> We've been going into top, and then hitting '1' to see things broken down by 
> CPU core (there are 32 of them, probably counting hyperthreads as different 
> cores), but the CPU use is in the teens or so.
> 
> I've also tried dstat and csysdig.  The hardware isn't breaking a sweat, but 
> throughput is poor.

> > The non-dependency Python portions don't Appear to have much in the way of
> > threading going on based on a quick grep, but csysdig says a process
> > running the code has around 32 threads running - the actual thread count
> > varies, but that's the ballpark.
> > 
> > I'm wondering if there's a good way to find two counts of those threads -
> > how many are from CPython code that could run afoul of the GIL, and how
> > many of them are from C/C++ extension modules that wouldn't be responsible
> > for a GIL issue.
> 
> >From the docs on threading:
> 
> threading.active_count()
>  
> <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.active_count>
> Return the number of Thread 
> <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.Thread>
>  objects currently alive. The returned count is equal to the length of the 
> list returned by enumerate() 
> <file:///Library/Frameworks/Python.framework/Versions/3.7/Resources/English.lproj/Documentation/library/threading.html?highlight=threading#threading.enumerate>.
> 
> Are you on a Mac?

Opss a file: link sorry should have search the online docs.

I use many operating systems: Fedora, macOS, Windows, NetBSD, CentOS and others 
in the past.

> 
> https://docs.python.org/2/library/threading.html 
> <https://docs.python.org/2/library/threading.html> appears to have some good 
> info. I'll probably try logging threading.active_count()
>  
> A question arises though: Does threading.active_count() only show Python 
> threads created with the threading module?  What about threads created with 
> the thread module?

Only pythons threads, if you think about it why would python care about threads 
it does not control?


> 
> Try running strace on the process to see what system calls its making.
> I've tried it, but thank you.  It's a good suggestion.
> 
> I often find that when strace'ing a program, there's a bunch of 
> mostly-irrelevant stuff at Initial Program Load (IPL), but then the main loop 
> fails into a small cycle of system calls.

And what are thoses sys calls and what is the timing of them?
If you are use select/poll how long before the call returns.
If you in a read how long before it returns.

> 
> Not with this program.  Its main loop is busy and large.

Does the code log any metrics or telemetry to help you?
I work on a product that produces time-series data to show key information 
about the service.
TPS, cache hit rates etc.

Should have mention before you can run the code under python's cprofile.

Do a test run against the process and then run analysis on the data that 
cprofile
produces to find out elapse times and cpu times of the code.

> 
> You could also connect gdb to the process and find out what code the threads 
> are running.
> 
> I used to use gdb, and wrappers for gdb, when I was doing C code, but I don't 
> have much experience using it on a CPython interrpreter.
> 
> Would I be doing a "thread apply all bt" or what?  I'm guessing those 
> backtraces could facilitate identifying the origin of a thread.

Yes  thread apply all bt works great on a python process. recent gdb releases 
knows how to format the stack and show you the python stack,
forgot the command, but its easy to google for.

Barry


> 
> Thanks a bunch.
> 

-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to