On 19/08/16 11:32, Zhang Huangbin wrote:
Dear Bill,
Thanks very much for helping.
On Aug 19, 2016, at 4:17 AM, Bill Cole
<postfixlists-070...@billmail.scconsult.com> wrote:
What do you mean "run" the policy service? It's a python program.
Which must be running in order for it to be listening for connections.
Likely mechanisms would be via a SysV init script in /etc/init.d/ or via a
systemd service definition.
On some old Linux distributions, it's run with a SysV init script, but on
CentOS 7 and Ubuntu 16.04, it's run via systemd.
If your policy server is listening on 127.0.0.1:1234, you could try this:
for x in {1..100} ; do nc 127.0.0.1 1234 & done
That attempts to make 100 TCP connections to 127.0.0.1:1234 with 100 different
'nc' processes, all running in the background.
If your policy server is accepting the connections, running the "jobs" command after all
of those background processes have launched should show them all in "Stopped(SIGTTIN)"
state, meaning that they are connected and waiting for input.
I ran this test in the shell:
for i in $(seq 200); do
    nc 127.0.0.1 1234 &
done
The 'jobs' command shows 200 "Stopped" jobs.
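For anyone who would rather not juggle background jobs, the same connection test can be sketched in Python. This is only a rough equivalent of the nc loop above, not part of the policy server; open_connections is a made-up helper name, and the host, port and count are whatever you are testing against.

```python
import socket

def open_connections(host, port, count, timeout=5.0):
    """Open `count` TCP connections and hold them all open at once.

    Returns (connected, failed). Hypothetical helper for illustration.
    """
    socks = []
    failed = 0
    for _ in range(count):
        try:
            # Each successful connect is kept open, like an idle nc process.
            socks.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:
            # Covers refused, reset and timed-out connection attempts.
            failed += 1
    connected = len(socks)
    for s in socks:
        s.close()
    return connected, failed
```

Calling open_connections("127.0.0.1", 1234, 200) should report 200 connected and 0 failed if the server (or at least the kernel's listen queue) is taking the connections.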
If all 100 processes connect in a reasonable time, the next step would be to do
the same test, but with input piped into all of the nc commands simulating what
Postfix sends to a policy server.
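As a rough illustration of what that input looks like on the wire: a Postfix policy request is a series of name=value lines terminated by an empty line, and the server replies with an action=... line followed by an empty line. The sketch below assumes that framing; build_request and query_policy are made-up names, and the attributes you pass in are up to you.

```python
import socket

def build_request(attrs):
    """Encode attributes as name=value lines ending with an empty line."""
    lines = [f"{name}={value}" for name, value in attrs.items()]
    return ("\n".join(lines) + "\n\n").encode("utf-8")

def query_policy(host, port, attrs, timeout=10.0):
    """Send one policy request and return the server's reply as text.

    Hypothetical helper for illustration; reads until the empty line
    that ends the reply, or until the server closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(build_request(attrs))
        buf = b""
        while b"\n\n" not in buf:
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode("utf-8", "replace").strip()
```

A single call such as query_policy("127.0.0.1", 7777, {"request": "smtpd_access_policy", "protocol_state": "RCPT"}) is a quick sanity check before running many of them in parallel.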
I tested with the shell commands below:
for i in $(seq 1000); do
(cat <<EOF
request=smtpd_access_policy
protocol_state=RCPT
... [omit other attr=value here] ...
ccert_pubkey_fingerprint=
EOF
) | nc 127.0.0.1 7777 &
done
I get some "Ncat: Connection reset by peer." and "Ncat: Connection timed out."
errors.
Does this mean that my policy server is designed (programmed) improperly? Or is
it just slow?
It sounds like similar behaviour to what postfix is logging, so at least
you have a way to replicate it now. Try checking netstat -antp | grep
:7777 and see what state all the tcp sockets are in. If you're seeing a
lot in a SYN state (SYN_SENT or SYN_RECV) it means that your python
process has been too busy to accept the connections the kernel has
queued for it. If you're seeing a lot in TIME_WAIT it might be that the
rate of connections is too high and you're running out of
127.0.0.1:source port -> 127.0.0.1:7777 combinations. This obviously
won't solve the problem but will give you an idea of what's happening.
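If eyeballing the netstat output gets tedious with hundreds of sockets, a small tally per state helps. The sketch below assumes the usual Linux "netstat -ant" column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); count_states is a made-up name and the function only parses text you hand it.

```python
from collections import Counter

def count_states(netstat_output, port):
    """Count TCP socket states for connections to or from `port`.

    Expects text in the Linux `netstat -ant` layout (an assumption);
    header lines and non-TCP lines are skipped.
    """
    suffix = f":{port}"
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states
```

Feed it the captured output, for example from subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout, and compare the SYN_* and TIME_WAIT counts while the test loop is running.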