Hi all.

I've run into a situation that leaves me scratching my head.  I'm looking for 
any thoughts that might help identify the cause of the problem.

A contact of mine is having odd network problems.  The network runs fine for a 
while, then goes through periods of slow response and/or domain name 
resolution failures.  Then the network returns to normal on its own.

The network is a mixed environment, with IPCop for the firewall, and a Linux 
server doing web gateway duties.  My contact has spoken with Telus (their 
ISP), who pretty much said the problem was an internal issue.  I'm not so 
sure of this, but haven't ruled out the possibility.

We have tried changing the name servers on the IPCop firewall, and changing 
the internal DHCP config to use the firewall as the primary name server 
(instead of the W2K Active Directory box with DNS).  The problems persist.  
We have done some simple analysis of the network traffic using EtherApe, and 
identified a couple of workstations that were using excessive bandwidth 
(LimeWire).  We had these workstations shut down the process responsible for 
the excessive bandwidth, but the problem persists.  A quick scan of traffic 
using Ethereal showed nothing unusual.
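One way to tell which name server is misbehaving during an episode is to 
query each candidate directly, bypassing whatever the workstation's resolver 
is configured to use.  The Python sketch below hand-builds a minimal DNS query 
for an A record and sends it over UDP to one server; the addresses in the 
example comment are hypothetical placeholders, not the real ones on this 
network.

```python
import random
import socket
import struct
import time


def build_query(name, qid):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (only RD = recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack(">HH", 1, 1)


def parse_header(resp):
    """Return (id, rcode, answer_count) from the 12-byte DNS response header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack(">HHHHHH", resp[:12])
    return qid, flags & 0x000F, ancount


def query_server(server, name, timeout=2.0):
    """Send one A query straight to `server` on UDP port 53; return a status string."""
    qid = random.randint(0, 0xFFFF)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        start = time.monotonic()
        sock.sendto(build_query(name, qid), (server, 53))
        resp, _ = sock.recvfrom(512)
        elapsed_ms = (time.monotonic() - start) * 1000
    except socket.timeout:
        return "TIMEOUT"
    except OSError as exc:
        return f"ERROR {exc}"
    finally:
        sock.close()
    rid, rcode, ancount = parse_header(resp)
    if rid != qid:
        return "ID MISMATCH"
    return f"rcode={rcode} answers={ancount} {elapsed_ms:.0f} ms"


# Example (addresses are hypothetical placeholders for the firewall and W2K box):
#   for ip in ("192.168.0.1", "192.168.0.10"):
#       print(ip, query_server(ip, "www.google.ca"))
```

Running this against the firewall, the W2K box and Telus's servers side by 
side during a slow spell should show whether only one of them stops answering 
(TIMEOUT) or returns an error code (rcode 2 is SERVFAIL, 3 is NXDOMAIN).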

During the troubleshooting process, the issue cropped up and www.google.ca 
could not be resolved.  I immediately opened a command shell and ran 
nslookup on www.google.ca.  nslookup reported that it could not look up the 
name server's own name, which was odd; since it already had the server's IP 
address, it established the connection anyway.  Sure enough, the query for 
www.google.ca returned no results.  A few minutes later, the same lookup 
worked perfectly.
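Since the failures come and go, a small watcher that retries the lookup on a 
timer and records when failures start and stop could provide timestamps to 
line up against other events (backups, user activity, the DHCP lease cycle).  
A sketch in Python, assuming the system resolver (`socket.getaddrinfo`) is 
what the workstations effectively see; the 30-second interval is an arbitrary 
choice, and the `resolve`/`now`/`sleep` parameters exist only so the logic can 
be tested without a network.

```python
import socket
import time
from datetime import datetime


def check(name):
    """Return True if the system resolver can resolve `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def watch(name, interval=30.0, rounds=None, resolve=check,
          now=datetime.now, sleep=time.sleep):
    """Poll `resolve(name)` and yield (start, end) for each run of failures.

    `rounds=None` polls forever; `end` is None if polling stops mid-outage.
    """
    outage_start = None
    n = 0
    while rounds is None or n < rounds:
        ok = resolve(name)
        stamp = now()
        if not ok and outage_start is None:
            outage_start = stamp            # first failed lookup of an outage
        elif ok and outage_start is not None:
            yield outage_start, stamp       # lookup recovered: report the window
            outage_start = None
        n += 1
        sleep(interval)
    if outage_start is not None:
        yield outage_start, None
```

For example, `for start, end in watch("www.google.ca"): print(start, end)` runs 
until interrupted and prints one line per outage.  One caveat: the OS 
resolver may cache answers (the Windows DNS Client service does), so a cached 
success can hide a short outage.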

So, my gut tells me that we have a name resolution problem, but the steps 
we've taken to fix it don't seem to make any difference.  Since everything 
works fine for stretches of time, I'm assuming the switches involved are 
good.  (The switches are a combination of 10/100 and Gigabit Ethernet: 3Com 
switches with gigabit modules, and a fiber module for connecting a second 
building.)  As near as I can tell, the network architecture follows common 
practice.

My next step is to run an extended capture session using tcpdump or Ethereal 
and track DNS traffic in detail.  We may also try setting up an internal BIND 
server to see if that helps.  Any other suggestions for what we can try?
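For the extended capture, something like `tcpdump -i eth0 -w dns.pcap udp 
port 53` on the gateway would record only DNS traffic (the interface name is 
an assumption).  The Python sketch below reads such a file and pairs each 
query with its reply by client address, client port, server and DNS ID, then 
reports queries that never got an answer and replies slower than a threshold 
(1 second here, an arbitrary choice).  It assumes the classic libpcap format 
that `tcpdump -w` writes, Ethernet framing and IPv4; pcapng files, VLAN-tagged 
frames and DNS over TCP are not handled.

```python
import socket
import struct


def dns_packets(f):
    """Yield (ts, src, sport, dst, dport, dns_payload) for IPv4 UDP/53 frames in a classic pcap file."""
    ghdr = f.read(24)
    if ghdr[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"                      # written on a little-endian machine
    elif ghdr[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", ghdr[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")
    while True:
        rhdr = f.read(16)
        if len(rhdr) < 16:
            break                         # end of file
        sec, usec, incl_len, _orig = struct.unpack(endian + "IIII", rhdr)
        frame = f.read(incl_len)
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue                      # too short, or not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4          # IPv4 header length in bytes
        if ip[9] != 17 or len(ip) < ihl + 12:
            continue                      # not UDP, or cut off before the DNS header
        sport, dport = struct.unpack(">HH", ip[ihl:ihl + 4])
        if 53 not in (sport, dport):
            continue
        src = socket.inet_ntoa(ip[12:16])
        dst = socket.inet_ntoa(ip[16:20])
        yield sec + usec / 1e6, src, sport, dst, dport, ip[ihl + 8:]


def match_dns(packets, slow=1.0):
    """Return (unanswered, slow_replies) after pairing queries with responses."""
    pending = {}
    slow_replies = []
    for ts, src, sport, dst, dport, dns in packets:
        qid, flags = struct.unpack(">HH", dns[:4])
        is_response = flags & 0x8000      # QR bit
        if not is_response and dport == 53:
            pending[(src, sport, dst, qid)] = ts
        elif is_response and sport == 53:
            sent = pending.pop((dst, dport, src, qid), None)
            if sent is not None and ts - sent > slow:
                slow_replies.append((sent, dst, src, ts - sent))
    unanswered = [(ts, client, server)
                  for (client, _port, server, _id), ts in pending.items()]
    return unanswered, slow_replies
```

Usage would be along the lines of `with open("dns.pcap", "rb") as f: 
print(match_dns(dns_packets(f)))`.  If the unanswered queries cluster on one 
server and in the same time windows as the slowdowns, that points at that 
server (or the path to it) rather than at the workstations.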

Because the problem is intermittent and doesn't seem to follow any pattern, 
it is proving difficult to nail down.  Hmmm...  This does sound like it could 
be a hardware issue, but that shouldn't be the case....

Thanks for any tips.  I can offer a little more about the architecture if 
needed....

Shawn

_______________________________________________
clug-talk mailing list
[email protected]
http://clug.ca/mailman/listinfo/clug-talk_clug.ca
Mailing List Guidelines (http://clug.ca/ml_guidelines.php)
**Please remove these lines when replying
