Re: Relayd 'check script' performance issue?

Gregory Edigarov Mon, 30 Jul 2012 08:33:10 -0700

On 07/30/2012 06:02 PM, Bennett Samowich wrote:

The problem exists even if I use the system's "/usr/bin/false" and
"/usr/bin/true" commands.
The problem exists even when PF is disabled or the only rule is "pass in".


That being said, the script itself is a simple host lookup against the
IP addresses to ensure the DNS server is actually resolving. Again,
just using "/usr/bin/false" or "/usr/bin/true" produces the same drop
in throughput.
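The thread does not include the actual script. The Python code below is a hypothetical sketch of a "simple host lookup" check. It sends one A-record query straight to the server address that relayd passes as its argument, then exits using relayd's convention from relayd.conf(5): a positive status means up and zero means down. Several details are assumptions: the query name `www.example.com`, the 2-second timeout, and IPv4-only UDP. The helpers `build_query`, `answered_ok` and `check` are illustrative names.

```python
#!/usr/bin/env python3
"""Hypothetical relayd check script: query the DNS server in argv[1].

relayd.conf(5) documents that a positive exit status means "host up" and
zero means "host down" (the reverse of the usual shell convention).
"""
import os
import socket
import struct
import sys

QNAME = "www.example.com"  # assumption: a name your DNS servers can resolve
TIMEOUT_S = 2.0            # assumption: kept below relayd's "timeout 4000" (ms)


def build_query(qid: int, name: str) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.split(".") if part
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def answered_ok(qid: int, reply: bytes) -> bool:
    """True if the reply matches our ID, is a response, has RCODE 0 and answers."""
    if len(reply) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return rid == qid and is_response and rcode == 0 and ancount > 0


def check(server: str) -> bool:
    """Send one query to server:53 over UDP and report whether it resolved."""
    qid = struct.unpack("!H", os.urandom(2))[0]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(TIMEOUT_S)
        try:
            sock.sendto(build_query(qid, QNAME), (server, 53))
            reply, _addr = sock.recvfrom(512)
        except OSError:  # covers socket.timeout and unreachable hosts
            return False
    return answered_ok(qid, reply)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Only act when relayd passes the host address; exit 1 = up, 0 = down
    sys.exit(1 if check(sys.argv[1]) else 0)
```

Querying the server directly means the check does not depend on the firewall's own resolv.conf.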

An example of the drop looks like this:
9913Mbps
9913Mbps
7253Mbps <--- script interval point
9913Mbps
9913Mbps
...etc...

When the script is an actual shell script rather than /usr/bin/false,
the throughput drop spans the three seconds surrounding the time the
script runs:
9913Mbps
9913Mbps
4321Mbps
7253Mbps <--- script interval point
5162Mbps
9913Mbps
9913Mbps

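The samples above also show how the size of a single dip relates to the overall cost. The thread does not state nuttcp's reporting interval, so this assumes one sample per second, with the check firing every 15 seconds. Under that assumption, the loss per check interval can be worked out as below. `interval_loss` is a hypothetical helper name.

```python
LINE_RATE_MBPS = 9913  # steady-state samples quoted above
INTERVAL_S = 15        # "interval 15" in relayd.conf


def interval_loss(dips, line_rate=LINE_RATE_MBPS, interval=INTERVAL_S):
    """Return (per-sample drop fractions, average drop over one check interval).

    Assumes one sample per second: each dip sample loses (line_rate - sample)
    Mbit during its second, and every other second of the interval runs at
    line_rate.
    """
    per_sample = [(line_rate - s) / line_rate for s in dips]
    lost_mbit = sum(line_rate - s for s in dips)
    return per_sample, lost_mbit / (line_rate * interval)


# /usr/bin/false: one 7253Mbps sample per interval
print(interval_loss([7253]))              # per-sample ~0.268, interval average ~0.018
# actual shell script: three depressed samples per interval
print(interval_loss([4321, 7253, 5162]))  # interval average ~0.087
```

A single dip is therefore about 27%, in line with the "around 30%" reported further down. Averaged over a 15-second interval, the cost is roughly 1.8% of throughput with /usr/bin/false and roughly 8.7% with the shell script. Both figures depend on the one-sample-per-second assumption.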
# relayd.conf (somewhat sanitized):
table <dns-servers> { 192.168.1.1, 192.168.1.2 }
redirect dns-udp {
   listen on 192.168.100.1 udp port 53
   forward to <dns-servers> port 53 \
   check script "/usr/bin/false"    \
   timeout 4000                     \
   interval 15                      \
   mode roundrobin
}
redirect dns-tcp {
   listen on 192.168.100.1 port 53
   forward to <dns-servers> port 53 \
   check script "/usr/bin/false"    \
   timeout 4000                     \
   interval 15                      \
   mode roundrobin
}
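Some context on what this config asks relayd to do. relayd.conf(5) documents three things about "check script": relayd runs the script once per host, with the host's name (here, its IP address) on the command line; a positive exit status means the host is up and zero means it is down; and the script is terminated after `timeout` milliseconds. So `/usr/bin/false` (exit 1) marks both servers up, while `/usr/bin/true` (exit 0) marks them down. The Python sketch below mimics that mapping. `host_state` is a hypothetical name. Reporting a timeout or a signal death as "unknown" is an assumption, not documented relayd behaviour.

```python
import subprocess
import sys


def host_state(cmd, host, timeout_ms=4000):
    """Run cmd + [host] and map the exit status as relayd.conf(5) describes
    for "check script": positive = up, zero = down.

    cmd is an argv list such as ["/usr/bin/false"]; timeout_ms mirrors the
    "timeout 4000" option. Returning "unknown" on timeout or signal is an
    assumption made for this sketch.
    """
    try:
        proc = subprocess.run(
            [*cmd, host],
            timeout=timeout_ms / 1000,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:  # run() kills the child before raising
        return "unknown"
    if proc.returncode > 0:
        return "up"
    if proc.returncode == 0:
        return "down"
    return "unknown"  # negative return code: child was killed by a signal


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: host_state.py /usr/bin/false 192.168.1.1
    print(host_state(sys.argv[1:-1], sys.argv[-1]))
```

For example, `host_state(["/usr/bin/false"], "192.168.1.1")` returns `"up"`. Each check pass spawns one process per host, so this config runs the script several times every 15 seconds.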

Looks reasonable to me.
What does 'top' show during the problem?
What does 'top' show when there is no problem?
What does 'vmstat -i' show?
What's in your dmesg?

My guess is that you have an SSD or something like it, maybe a flash drive?

That is, I suspect you have something really slow as your disk device, which perhaps gives you a high interrupt rate.
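One way to test the interrupt theory is to compare two `vmstat -i` snapshots taken a second apart, once between checks and once across a check interval. The rate column on its own is a since-boot average, so it would hide a one-second spike. The sketch below assumes the OpenBSD output layout: one header line, rows ending in total and rate columns, and a closing "Total" row. The function names are hypothetical.

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Map interrupt name -> cumulative count from `vmstat -i` output.

    Assumed layout: one header line, then rows whose last two columns are
    the total count and the since-boot rate. The "Total" summary row and
    any unparsable lines are skipped.
    """
    counts = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Total":
            continue
        try:
            counts[" ".join(fields[:-2])] = int(fields[-2])
        except ValueError:
            continue
    return counts


def interrupt_rates(before, after, seconds):
    """Per-second interrupt rate for each source between two snapshots."""
    old, new = parse_vmstat_i(before), parse_vmstat_i(after)
    return {name: (count - old.get(name, 0)) / seconds
            for name, count in new.items()}


def sample(seconds=1.0):
    """Take two `vmstat -i` snapshots `seconds` apart and return the rates."""
    cmd = ["vmstat", "-i"]
    first = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    time.sleep(seconds)
    second = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return interrupt_rates(first, second, seconds)
```

Calling `sample()` in a loop on the firewall while nuttcp runs would show whether any source, such as the disk controller or the NIC, spikes at the script interval.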

On Mon, Jul 30, 2012 at 8:59 AM, Gregory Edigarov <ediga...@cupid.com> wrote:

On 07/30/2012 03:25 PM, Bennett Samowich wrote:

I've uncovered a troubling performance symptom that I believe is
related to relayd's "check script" functionality.

The system is a Dell R710 with 12GB RAM and 10Gb interfaces. The
problem is that when relayd is running with redirects that use the
check script functionality, interface throughput drops by around
30% while the check script is running.

I ran the tests in an offline configuration so no other traffic could
be a factor (test1 <--> OpenBSD <--> test2). Tests were performed
using the nuttcp tool, and both servers (test1 and test2) achieve the
9.912Gbps line rate when connected back-to-back. When run through the
OpenBSD firewall, regardless of PF rules, the rate drops to 7.25Gbps
when the script runs.

At first I thought it was my script, but I replaced it with
'true' and 'false' and the problem remained. I've verified that
this occurs in versions 4.8 through 5.1. I've also tried looking at
the relayd code, but it seemed like a reasonable exec call. I can't
understand why a running script would cause a network
performance drop. I would also bet that this is only noticeable over
10Gb interfaces. Nevertheless, with the check script running every 15
seconds, we've suffered an overall drop in network performance.

Sorry, you have not given full information. What's in your script? What's in
your relayd.conf?
What are your pf rules? Your dmesg is also welcome.

Any insight or direction would be greatly appreciated.

Bennett
