Hi glyph,

>>I get strange results.
>>Sluggish performance:

>Did you ever diagnose this further?  This seems like the sort of thing that we 
>should start having a performance test for.

Not yet. I didn't reply again since you gave me enough homework already:

- Run the producer/consumer variant on Linux (bisecting BSD/kqueue)
- Do the memory profiling with the non-producer/consumer variant (tracking down 
_where_ memory runs away)

Other stuff interrupted me again, and my impression is that it might take 
significant effort to really track this down. No surprise there: really pushing 
things often means "issues" pop up.

I absolutely agree: we should have repeatable, comparable, standard performance 
tests.

Like we have with trial/buildbot, but for performance rather than functional 
tests.

FWIW, here are my thoughts on this: 


1)
A simple Twisted-based "TCP echo server" (maybe in non-producer/consumer and 
producer/consumer variants) as a testee will already allow us to do a _lot_.
We can come up with more testees later (e.g. Twisted Web serving a static 
resource, ...).
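
To make the first testee concrete, here is a minimal sketch of the 
non-producer/consumer echo variant (the port number is arbitrary); the 
producer/consumer variant would add flow control (IPushProducer/registerProducer) 
on top of this:

    # Minimal non-producer/consumer echo testee; port 9000 is arbitrary.
    from twisted.internet import reactor
    from twisted.internet.protocol import Protocol, Factory

    class Echo(Protocol):
        def dataReceived(self, data):
            # No flow control at all: write everything straight back.
            self.transport.write(data)

    class EchoFactory(Factory):
        def buildProtocol(self, addr):
            return Echo()

    reactor.listenTCP(9000, EchoFactory())
    reactor.run()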

2)
It might be wise to use a non-Twisted, standard load generator like netperf 
instead of a Twisted-based one:
- having the load generator itself written in Twisted creates a circular 
dependency (e.g. regarding interpreting the results)
- a standard tool lets us compare results with non-Twisted setups and allows 
others to repeat the tests against their own stack
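
As for driving it: a thin wrapper that invokes the external generator and 
captures its raw output should be enough for automation. A sketch, assuming 
netperf with its netserver peer running on the testee box (the address and 
duration are placeholders):

    # Sketch: drive netperf from a small harness and keep its raw output.
    import subprocess

    TESTEE = "10.0.0.2"   # placeholder address of the testee box

    def run_netperf(duration=60):
        # TCP_STREAM measures bulk TCP throughput against the remote netserver.
        cmd = ["netperf", "-H", TESTEE, "-t", "TCP_STREAM", "-l", str(duration)]
        return subprocess.check_output(cmd)

    if __name__ == "__main__":
        print(run_netperf())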

3)
We should include at least 2 operating systems (FreeBSD / Linux).
This allows us to quickly bisect OS- or Twisted-reactor-specific issues.

4)
We should run this on real, physical, non-virtualized, dedicated hardware and 
networking gear.
I can't stress enough how important this is in my experience:
any form of virtualization brings a whole additional dimension of factors and 
variability into the game.
Testing in VMs on a shared hypervisor in a public cloud: you never really know 
what else is running, and you can never really repeat a run.
Repeatability is absolutely crucial.

5)
The load generator and the testee should run on 2 separate boxes, connected via 
a real network (e.g. switched Ethernet).
Testing over loopback is often misleading and, in practice, often irrelevant 
(too far from production deployments).

6)
We should test on both CPython and PyPy.
That is where the code will actually run in production later, and it lets us 
bisect Python-implementation-specific issues.

7)
It should be automated.
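
Roughly what I have in mind (also covering 6): start the testee under each 
interpreter, drive the load generator, tear down, archive the output. A sketch; 
interpreter paths, the script name and the load generator command line are 
placeholders, and in the real setup the testee would be started remotely 
(e.g. via ssh):

    # Sketch of one automated pass over both Python implementations.
    import subprocess
    import time

    INTERPRETERS = ["/usr/bin/python2.7", "/usr/local/bin/pypy"]
    LOADGEN_CMD = ["netperf", "-H", "10.0.0.2", "-t", "TCP_STREAM", "-l", "60"]

    def one_run(interpreter):
        # Start the echo testee under the given interpreter ...
        testee = subprocess.Popen([interpreter, "echo_server.py"])
        time.sleep(2)  # crude: give the reactor time to start listening
        try:
            # ... then drive the load generator and keep its raw output.
            return subprocess.check_output(LOADGEN_CMD)
        finally:
            testee.terminate()
            testee.wait()

    for interpreter in INTERPRETERS:
        print(interpreter)
        print(one_run(interpreter))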

8)
The results should be stored in a long-term archive (a database) so we can 
compare results over time and across setups.
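
A minimal sketch of what that archive could look like; the SQLite schema below 
is just a guess at the dimensions we would want to query on:

    # Sketch: long-term result archive in SQLite (schema is an assumption).
    import sqlite3

    db = sqlite3.connect("perf_results.db")
    db.execute("""
        CREATE TABLE IF NOT EXISTS runs (
            id          INTEGER PRIMARY KEY,
            timestamp   TEXT,   -- when the run happened (UTC, ISO 8601)
            testee      TEXT,   -- e.g. 'tcp-echo' or 'tcp-echo-producer-consumer'
            os          TEXT,   -- e.g. 'FreeBSD' or 'Linux'
            interpreter TEXT,   -- e.g. 'CPython 2.7' or 'PyPy'
            reactor     TEXT,   -- e.g. 'kqueue' or 'epoll'
            throughput  REAL,   -- as reported by the load generator
            cpu_testee  REAL,   -- avg CPU load on the testee box
            cpu_loadgen REAL    -- avg CPU load on the load generator box
        )
    """)
    db.commit()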

9)
We should collect monitoring data (CPU load, ...) on both the load generator 
and the testee boxes during test runs.
That lets us spot things like "same network performance, but one variant 
triggers double the CPU load".

===

Because of points 3/4/5, this requires 4 boxes to begin with. Those should be 
absolutely _identical_.

Currently, we (Tavendo) have a setup dedicated to performance tests consisting 
of 2 boxes with dual-port 10GbE NICs and an 8-port 10GbE switch.

Buying 2 more identical boxes and adding them would be technically possible. 
Points 7/8/9 and setting all of this up is real work, though.

I would need to somehow justify/book these investments. I have "ideas" about 
that, but step by step: what do you think about the above?

/Tobias

