Hi,

05.09.2007 20:06, Tod Hagan wrote:
> All,
>
> The slowdown on RHEL has returned, even after upgrading from 1.38.8 to
> 2.0.3:
>
> JobID  FileSet  St  T  L  EndTime       Bytes     Rate          Elapsed
> 2838   raid1    Cn  B  F  08-Aug 13:25  332.1 GB   2476.8 KB/s  1 day 13 hours 14 mins 59 secs
> 2843   system   OK  B  F  16-Aug 12:57  3.992 GB  11036.4 KB/s  6 mins 1 sec
> 2844   raid1    OK  B  F  16-Aug 21:27  1.495 TB  54848.7 KB/s  7 hours 34 mins 18 secs
> 2897   system   OK  B  F  02-Sep 23:41  3.583 GB   2274.0 KB/s  26 mins 12 secs
> 2900   raid1    Cn  B  F  05-Sep 13:34  341.7 GB   2477.1 KB/s  1 day 14 hours 19 mins 20 secs
>
> I don't know what has changed since job 2844 ran normally, but this
> month's full backup (job 2900) is back to running very slowly (2477.1
> KB/s is 4.5% of 54848.7 KB/s).
>
> This never happened with RHEL 3 and its version of Postgresql. Current
> system information:
>
> O/S: Red Hat Enterprise Linux Server release 5 (Tikanga)
> Bacula: both 1.38.8 and 2.0.3
> Postgresql: postgresql-server-8.1.9-1.el5
>
> I would really appreciate suggestions for diagnosing this problem,
> particularly how to get additional information regarding what Bacula is
> doing when it's running slowly. While I suspect the slowdown is probably
> due to the Bacula/Postgresql interaction, I'm not sure how to pinpoint
> that as the cause of the problem. Can Bacula be compiled with debugging
> flags to produce additional logging information?

No need to compile, at least for now. Use the 'setdebug' command, e.g.
'setdebug dir level=200 trace=1' and 'setdebug sd=<your_SD> level=200
trace=1', and read the resulting (large!) trace files in the working
directories. Unfortunately, there are no time stamps in the trace
files, so it's hard to determine which step actually takes so much
time.

Also, check what your systems are actually doing. vmstat, top, and
perhaps strace on the DIR machine might reveal where all that time
goes. On the catalog database server you should also observe
PostgreSQL, but since I'm not a PostgreSQL guy, you'd better ask others
for advice :-)

> The Postgresql server is running on another computer, so using tcpdump
> on the network traffic is an option as well.

tcpdump could show the traffic, but I guess it would not help in
actually finding out why the catalog is so slow (assuming the catalog
_is_ the bottleneck here).

Arno

> Thanks.
>
> Tod
>
>
> On Thu, 2007-08-16 at 13:34 -0400, Tod Hagan wrote:
>> All,
>>
>> Rather than try to figure out why 1.38.8 was running slowly after
>> upgrading RHEL 3 to RHEL 5 and its newer version of Postgresql, I
>> upgraded Bacula to 2.0.3. Once I got grant_postgresql_privileges running
>> with help from the list, this message was reported:
>>
>> psql:<stdin>:62: NOTICE: number of page slots needed (29760)
>> exceeds max_fsm_pages (20000)
>> HINT: Consider increasing the configuration parameter
>> "max_fsm_pages" to a value over 29760.
>>
>> I edited /var/lib/pgsql/data/postgresql.conf to set
>>
>> max_fsm_pages = 40000
>>
>> and restarted Postgresql.
>>
>> A test backup shows that speeds are now comparable to the old
>> configuration of 1.38.8 on RHEL 3.
>>
>> Even better, bconsole commands such as getting the director status or
>> doing queries using sqlquery are now appreciably faster.
>>
>> Thanks all for your help.
>>
>> Tod
>>

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
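
Since the trace files carry no time stamps, one possible workaround is to follow a trace file while the job runs and stamp each line as it arrives. What follows is only a rough Python sketch of that idea, not anything Bacula ships: the names `stamp_lines` and `follow` are made up for illustration, and the trace file path is an assumption you pass on the command line. The stamps record when a line was read, not when Bacula wrote it, so they are only as accurate as the polling interval.

```python
import os
import sys
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator, TextIO


def stamp_lines(lines: Iterable[str],
                now: Callable[[], datetime] = datetime.now) -> Iterator[str]:
    """Prefix each line with the wall-clock time at which it was read."""
    for line in lines:
        text = line.rstrip("\n")
        yield f"{now().isoformat(timespec='milliseconds')} {text}"


def follow(handle: TextIO, poll_seconds: float = 0.2) -> Iterator[str]:
    """Yield complete lines appended to an open file, similar to `tail -f`."""
    handle.seek(0, os.SEEK_END)  # skip what is already in the file
    pending = ""
    while True:
        chunk = handle.readline()
        if not chunk:
            time.sleep(poll_seconds)  # nothing new yet, wait and retry
            continue
        pending += chunk
        if pending.endswith("\n"):  # only emit once the line is complete
            yield pending
            pending = ""


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python stamp_trace.py <path-to-trace-file>
    with open(sys.argv[1], "r", errors="replace") as trace:
        for stamped in stamp_lines(follow(trace)):
            print(stamped, flush=True)
```

Redirect the output to a file and look for large gaps between consecutive stamps; those point at the steps where the time goes. Run it on the machine that holds the working directory, and stop it with Ctrl-C once the job has finished.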