On 19/08/16 11:32, Zhang Huangbin wrote:

Dear Bill,

Thanks very much for helping.

On Aug 19, 2016, at 4:17 AM, Bill Cole 
<postfixlists-070...@billmail.scconsult.com> wrote:

What do you mean "run" the policy service? It's a Python program.
Which must be running in order for it to be listening for connections.
Likely mechanisms would be via a SysV init script in /etc/init.d/ or via a 
systemd service definition.

On some old Linux distributions it is run from a SysV init script, but on 
CentOS 7 and Ubuntu 16.04 it is run via systemd.

If your policy server is listening on 127.0.0.1:1234, you could try this:

for x in {1..100} ; do nc 127.0.0.1 1234 & done

That attempts to make 100 TCP connections to 127.0.0.1:1234 with 100 different 
'nc' processes, all running in the background.

If your policy server is accepting the connections, running the "jobs" command after all 
of those background processes have launched should show them all in "Stopped(SIGTTIN)" 
state, meaning that they are connected and waiting for input.

I ran this test in the shell:

for i in $(seq 200); do
     nc 127.0.0.1 1234 &
done

The 'jobs' command shows 200 "Stopped" jobs.

If all 100 processes connect in a reasonable time, the next step would be to do 
the same test, but with input piped into all of the nc commands simulating what 
Postfix sends to a policy server.

I tested with the shell commands below:

for i in $(seq 1000); do
     (cat <<EOF
request=smtpd_access_policy
protocol_state=RCPT
... [omit other attr=value here] ...
ccert_pubkey_fingerprint=

EOF
) | nc 127.0.0.1 7777 &
done

I get some "Ncat: Connection reset by peer." and "Ncat: Connection timed out." 
errors.
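
Rather than reading the error messages off the terminal, the same test can be made to count how many requests actually received an answer. The following is only a minimal sketch under stated assumptions: REQUEST_FILE is a hypothetical file holding the full attr=value request followed by an empty line, the function names are made up for illustration, and the '-w 5' timeout behaves slightly differently between nc implementations. It relies only on the fact that a policy reply contains an "action=" line.

```shell
#!/usr/bin/env bash
# Sketch: fire N policy requests in parallel and count the answered ones.
# HOST, PORT, REQUEST_FILE and all function names are illustrative assumptions.

HOST=127.0.0.1
PORT=7777
REQUEST_FILE=${REQUEST_FILE:-request.txt}   # full request, ending in an empty line

# Send the request once and store whatever the server returns in file $1.
send_request() {
    nc -w 5 "$HOST" "$PORT" < "$REQUEST_FILE" > "$1" 2>/dev/null
}

# Count the reply files in directory $1 that contain an "action=" line.
count_answered() {
    local dir=$1 n=0 f
    for f in "$dir"/*.out; do
        [ -e "$f" ] || continue
        if grep -q '^action=' "$f"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Launch $1 requests in the background, wait for all of them, print a summary.
run_load_test() {
    local total=$1 dir i
    dir=$(mktemp -d)
    for i in $(seq "$total"); do
        send_request "$dir/$i.out" &
    done
    wait
    echo "answered: $(count_answered "$dir") of $total"
    rm -rf "$dir"
}
```

Calling 'run_load_test 1000' repeats the test above. Comparing the answered count across different totals shows roughly where the server starts dropping connections.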

Does this mean that my policy server's design (programming) is flawed, or is 
it just slow?

That sounds like the same behaviour Postfix is logging, so at least you now 
have a way to reproduce it. Try checking which state the TCP sockets are in:

     netstat -antp | grep :7777

If you see a lot of sockets in a SYN state (SYN_SENT or SYN_RECV), your Python 
process has been too busy to accept the connections waiting in the kernel. If 
you see a lot in TIME_WAIT, the connection rate may be too high and you may be 
running out of 127.0.0.1:<source port> -> 127.0.0.1:7777 combinations. This 
won't solve the problem, but it will give you an idea of what's happening.
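
With hundreds of sockets the netstat output is easier to read as a count per state. The helper below is a rough sketch, not a definitive tool: the function name tally_states is made up, and it assumes the Linux 'netstat -ant' column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
#!/usr/bin/env bash
# Sketch: count TCP states for one port from netstat-style input on stdin.
# tally_states is a hypothetical helper name; the column layout is assumed.

tally_states() {
    local port=$1
    awk -v p=":$port" '
        # Keep TCP lines whose local or foreign address ends in ":<port>".
        $1 ~ /^tcp/ && (substr($4, length($4) - length(p) + 1) == p ||
                        substr($5, length($5) - length(p) + 1) == p) {
            count[$6]++
        }
        # Print one "<count> <state>" line per state seen.
        END { for (s in count) print count[s], s }
    ' | sort -rn
}
```

Usage would be 'netstat -ant | tally_states 7777', run while the test loop is active. On Linux, the ephemeral port range in /proc/sys/net/ipv4/ip_local_port_range can then be compared with the TIME_WAIT count.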
