On Thursday 04 October 2007 13:39, Scott Ruckh wrote:
> This is what you said C M Reinehr
>
> > Scott,
> >
> > I'm rather late to the party, but I have an idea that this might be a RAM
> > problem or, to be specific, that you are running out of RAM. I
> > encountered something similar a year or two ago. The good news is that
> > the solution is simply a matter of adjusting a kernel parameter. The
> > kernel parameter that I had to adjust was vm.min_free_kbytes, setting it
> > to 16224 instead of the default 4060.
> >
> > Take a look at your kernel logs while the backup job is running and/or
> > run vmstat. If this is the problem you will quickly see it. I'm afraid
> > that I don't remember the details of why this worked or how I chose that
> > particular value. I stumbled upon some emails discussing it after
> > googling the particular kernel error that I was receiving; made the
> > change & then forgot about it.
>
> Thanks for the feedback.  This is something else to look out for, although
> from a high level this would not appear to be the problem because there is
> plenty of free physical memory and when backup does not crash the system
> the swap file is never used.  On the other hand, I have run out of ideas,
> so I am willing to take a look at anything.
>
> Thanks for the reply.

Likewise, my system would hang without ever going to swap. Here's a link with 
a better explanation than I can manage: 
https://twiki.cern.ch/twiki/bin/view/LCG/ServiceChallengeTechnicalFAQ

Host Tuning
 Network hangs due to memory starvation (Mark van de Sanden,
[EMAIL PROTECTED]) During one of our first service challenges we suffered
from network hangs when transferring large amounts of data. We saw this
problem not during our iperf tests but during our globus-url-copy tests.
With top we could see that free memory was dropping to about 10MB from the
about 3GB which is available on the system. I assume that this memory is
used for the file buffer cache. After a while (about 10 - 20 minutes) the
network on the node is in a hung state. From the dmesg command we could see
that the kernel tries to swap but is not able to. After some time, or after
a reboot, everything works again. The solution in our case was to limit the
file buffer cache and keep memory free for tcp buffers and user space
memory. Our rule of thumb was to leave at least 10% of physical memory
free. In the linux kernel 2.6 you have vm.min_free_kbytes, which forces the
kernel to keep a minimum number of kilobytes free. How this is forced on a
2.4 kernel I do not know.
      Our current setting is:
      # sysctl vm.min_free_kbytes
      vm.min_free_kbytes = 409600
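
The 10% rule of thumb in the FAQ excerpt can be turned into a concrete number. What follows is only a rough Python sketch, not something taken from the FAQ: the helper names parse_meminfo and suggested_min_free_kbytes are made up for illustration, the sample text is invented, and the code assumes the usual "MemTotal: N kB" layout of /proc/meminfo.

```python
# Sketch: derive a candidate vm.min_free_kbytes from total RAM.
# Helper names and the sample text are illustrative assumptions.

def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def suggested_min_free_kbytes(meminfo_text, percent=10):
    """Return `percent` of MemTotal in kB, using integer arithmetic."""
    total_kb = parse_meminfo(meminfo_text)["MemTotal"]
    return total_kb * percent // 100


# Invented sample in /proc/meminfo format, for demonstration only.
SAMPLE_MEMINFO = """\
MemTotal:        4096000 kB
MemFree:           10240 kB
"""

value = suggested_min_free_kbytes(SAMPLE_MEMINFO)
print(f"vm.min_free_kbytes = {value}")
```

On a live system the contents of /proc/meminfo could be passed in place of the sample text, and the result applied as root with `sysctl -w vm.min_free_kbytes=<value>`. Note that 10% is far above the 16224 mentioned earlier in this thread, so treat it as a starting point to test; reserving too much memory can itself starve the system.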

They are talking there about network problems. In my case I'm not sure
whether the network was involved, but by following the vmstat report I could
see that the longer Bacula ran during my nightly backups, the lower my free
memory became until, eventually, everything just stopped.
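
Watching the vmstat report by eye can also be automated. The sketch below is a hedged illustration only: it assumes the standard vmstat layout with a header row that names a "free" column, the function names free_kb_samples and first_below are hypothetical, and the sample report is invented rather than captured from a real backup run. The 16224 threshold is just the value mentioned earlier in this thread.

```python
# Sketch: pull the "free" column out of captured vmstat output and
# find the first sample that falls below a chosen floor (in kB).

def free_kb_samples(vmstat_output):
    """Return the values of the 'free' column as a list of ints."""
    samples = []
    free_idx = None
    for line in vmstat_output.splitlines():
        cols = line.split()
        if "free" in cols:
            free_idx = cols.index("free")  # header row naming the columns
            continue
        if (free_idx is not None and len(cols) > free_idx
                and cols[free_idx].isdigit()):
            samples.append(int(cols[free_idx]))
    return samples


def first_below(samples, floor_kb):
    """Return the index of the first sample under floor_kb, or None."""
    for i, kb in enumerate(samples):
        if kb < floor_kb:
            return i
    return None


# Invented sample in vmstat format, for demonstration only.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 3012340  40120 512000    0    0    10    20  100  200  5  2 92  1
 0  1      0 1204560  40200 2319000   0    0  9000   120  800  900  8 10 60 22
 0  2      0  10240  40210 3513000    0    0  9500   110  850  950  7 12 40 41
"""

samples = free_kb_samples(SAMPLE_VMSTAT)
hit = first_below(samples, 16224)
print("free kB samples:", samples)
print("first sample below floor:", hit)
```

To use it on real data, the output of something like `vmstat 5` collected during the backup window could be saved to a file and fed to free_kb_samples.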

Interestingly, I recently upgraded to the current Debian Etch from Sarge and, 
having forgotten all about this adjustment, didn't make it and haven't (yet) 
had any problems. (Actually, I did a full install, not an upgrade.)

Cheers!

cmr

-- 
Debian 'Etch' - Registered Linux User #241964
--------
"More laws, less justice." -- Marcus Tullius Cicero, ca. 42 BC

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users