I hope this is the right place to post this.
Background
-----------------
I have a Debian box out in a datacenter that (amongst other things) is used
as a mail server. One particular office (behind a NAT firewall) accesses user
email relatively often (about 20 users over imapd-ssl).
Every so often these users from that site stop being able to access their
email (and web services hosted on the same box sharing the same IP address).
The frequency of failures ranges from a couple of days to a couple of weeks.
It happened again this morning. On a previous occasion I was able to
determine that the problem ONLY occurs when accessing from this office (i.e.
from behind the office's NAT router); access to the box from other IP
addresses (even from the same ISP and same subnet) continued fine. The
problem was also NOT the office NAT router (confirmed by rebooting the NAT
router). On that occasion I resolved the problem by rebooting our box in the
datacenter.
With today's problem I had a little more time to investigate and was able
to pin it down to the firewall on the datacenter box (shorewall running on
Debian etch, kernel 2.6.18-4-amd64). Restarting shorewall caused the problem
to go away.
Hypothesis
-----------------
My gut feeling is that there is a problem with shorewall / netfilter,
specifically to do with multiple simultaneous sessions FROM a given IP
address (i.e. the NAT firewall at the office in question, which by the way
is another Debian box). I suspect the problem is caused by too many open
connections from a given IP (perhaps to a specific port)?
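
To make the hypothesis concrete, here is a rough sketch of how the tracked
connections per source IP and destination port could be counted the next
time it happens. It assumes the connection tracking table is readable at
/proc/net/ip_conntrack (my assumption for this kernel; newer kernels expose
/proc/net/nf_conntrack instead) and that each entry carries src= and dport=
fields. The function names are my own and purely illustrative.

```python
import re
from collections import Counter

# Assumed location of the conntrack table on a 2.6.18-era kernel.
CONNTRACK_PATH = "/proc/net/ip_conntrack"

# The first src=/dport= pair on a line describes the original direction,
# i.e. the client that opened the connection and the port it targeted.
SRC_RE = re.compile(r"\bsrc=(\S+)")
DPORT_RE = re.compile(r"\bdport=(\d+)")


def count_by_source(lines):
    """Count tracked connections per (source IP, destination port)."""
    counts = Counter()
    for line in lines:
        src = SRC_RE.search(line)
        dport = DPORT_RE.search(line)
        # Entries without a port (e.g. ICMP) are skipped.
        if src and dport:
            counts[(src.group(1), int(dport.group(1)))] += 1
    return counts


def report(path=CONNTRACK_PATH, top=10):
    """Print the busiest (source IP, port) pairs from the conntrack table."""
    with open(path) as fh:
        counts = count_by_source(fh)
    for (src, dport), n in counts.most_common(top):
        print(f"{n:6d}  {src} -> port {dport}")
```

Calling report() as root while the office is locked out, and again after
restarting shorewall, should show whether the office's NAT address holds an
unusually large number of entries (for example to port 993).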
Questions
--------------
1) What logging information should I be looking at to test this
hypothesis?
2) Has anyone come across a similar problem, and if so how did you overcome
it?
Kind regards
Andy