>> One with 50 threads; it is remote from the cluster but within the same
>> DC in both cases. I also run the test with multiple clients and saw
>> similar results when summing the reqs/sec.
>
> Multiple client processes, or multiple client machines?

In particular, note that the way CPython works, if you're CPU bound
across many threads, you're constantly hitting the worst possible
scenario with respect to wasting CPU cycles on multiple cores (due to
the extremely contended GIL). I'd still expect to see some increase in
throughput from running multiple separate processes on the same
(multi-core) machine, but I really wouldn't count on it. Even with
supposedly idle CPU you may still be bottlenecking on the client,
depending on scheduling decisions in the kernel.

--
/ Peter Schuller
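
To make the GIL point above concrete, here is a minimal sketch, not the benchmark client discussed in this thread. It runs the same pure-Python CPU-bound loop once on a thread pool and once on a process pool. The names `burn` and `timed_run`, the worker count and the loop size are all made up for illustration, and it assumes a standard GIL-enabled CPython build. On such a build the thread-pool run would typically take about as long as running the work serially, while the process-pool run can use several cores. Actual numbers depend on the machine and the kernel scheduler.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def burn(n):
    """Pure-Python CPU-bound loop (hypothetical stand-in for client-side work)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_cls, workers, n):
    """Run `workers` copies of burn(n); return (elapsed_seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(burn, [n] * workers))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    workers, n = 4, 2_000_000  # arbitrary illustrative values
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, _ = timed_run(cls, workers, n)
        print(f"{cls.__name__}: {elapsed:.2f}s")
```

If the two timings come out close to each other, the client machine itself is probably the limiting factor, and adding client machines rather than client processes is the more telling test.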