Hi Alan, On Fri, Feb 01, 2013 at 02:48:29PM -0500, Alan McKay wrote: > I've been googling to find MySQL tuning instructions for bacula but > not much is coming up. Though I did find a 2 or 3 year old > discussion from this list which suggested I'd get better performance > from pgsql anyway. I certainly don't know pgsql as well as MySQL, > but I did used to know a fair bit about it and switching would not be > that big a deal to me.
I am used to do a certain tuning on databases, MySQL mostly and to a way smaller extend PostgreSQL. Because I have way more to do with MySQL and like the easy database replication there I started with MySQL in my biggest Bacula installation. Back then with a 4GB RAM machine running sd sd, dir and db. Soon my backups completed very very slow. The database had been adjusted to the host and the task. So I migrated the db to some other box with 16GB ram, at first MySQL got 8GB assigned, then 1 month later 12GB. With both values the backups became slower and slower again, despite attribute spooling. As I personally have a tendency to like postgres more than mysql as I have the feeling current postgres always does what I need right now, while only the next version mysql will do so as well, I decided to switch to postgres. postgres is set up to use 8GB of 16GB and it is now running for 15 months without issues, I have a retention time of 13 months and among the backups had been an imap server with 1 file/mail, ~350GB mails. I think the reason postgres performcs so much better in this setup is the way it handles indices. If you have colums a,b,c and do selects based on a, a,b and a,b,c with mysql there are 3 indices. With postgres you only need 1 on a,b,c as it can ignore the unneded parts. So when storing a single file info in the database, you basically need 4 writes with mysql (data, a, a,b, a,b,c) and only 2 with pgsql (data, a,b,c). With both databases I turned sync writes of, btw. My tests had been with mysql 5.0 and postgres 9.0, the database server is an old leftover machine with 16GB RAM and an unfortunate mix of storage located on an old FC-SAN as well as a local software raid-5, most of the san-storage in use by a file storage. In other words, it is not exactly IO-optimized hardware I have available. Current stats with postgres: bacula=# select count(*) from file ; count ----------- 221128055 cula=# select count(*) from path ; count -------- 818016 bacula=# select count(*) from job ; count ------- 22623 # du -s /var/lib/postgresql/9.0/main/base/ 64749160 /var/lib/postgresql/9.0/main/base/ Retention time is 13 months on tape, Full 1/month, Diff 1/week, Inc daily except an oracle DB server, there I use daily Diff on the backup areas. Daily backup is to File storage, then immidiatly copied two 2 pools of tapes, one to be kept offsite. File storage retention is 40 days for easy access to the most recent backups. > Anyone have any specific links for MySQL + Bacula tuning? No specific links, I use tuning-primer.sh and mysqltuner. With the output keep in mind you are optimizing for writes during your backup runs. > Anyone have any datapoints on MySQL vs pgsql for Bacula? In the initial setup, the 4GB machine had been to small to back up with mysql after 2 full runs, it ran each another full with 8GB and 16GB allocated for mysql. On the very same machine it runs fine with postgres and 8GB ram + iSCSI-exported disk for the single SD I use on the same server with the DIR for file/tape storage. 'To small' translates into 'all backup data is on storage, but 2 days later the attribute spooling is still running'. Yes, I use data and attribute spooling. > Will pgsql still have to be tuned? Every database has to be tuned. A database is a general purpose application, suitable for many different tasks on many different types of hardware. You always have to adjust it to your tasks. It might suffice to do very basic adjustments, e.g. open_tables matching your project and the buffer sizes matching your RAM in mysql. With postgres you can start right there with the memory adjustments, but if you really want to tune it, you can even go as far as telling postgres how much disk io costs compared to memory/cpu cycles. I have not yet done that in my postgres setup as I didn't feel the need. For my setup, here are the changed settings, postgres is 9.0 from 'backports' on debian squeeze, as said, software raid involved, 16GB host memory, 2*Dual Core Xeon 5160: data_directory = '/var/lib/postgresql/9.0/main' # use data in another directory hba_file = '/etc/postgresql/9.0/main/pg_hba.conf' # host-based authentication file ident_file = '/etc/postgresql/9.0/main/pg_ident.conf' # ident configuration file external_pid_file = '/var/run/postgresql/9.0-main.pid' # write an extra PID file listen_addresses = '*' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 60 # (change requires restart) superuser_reserved_connections = 3 # (change requires restart) unix_socket_directory = '/var/run/postgresql' # (change requires restart) shared_buffers = 8192MB # min 128kB temp_buffers = 512MB # min 800kB work_mem = 32MB # min 64kB max_stack_depth = 7680kB # min 100kB effective_io_concurrency = 10 # 1-1000. 0 disables prefetching wal_level = minimal # minimal, archive, or hot_standby fsync = off # turns forced synchronization on or off synchronous_commit = off # immediate fsync at commit wal_writer_delay = 5000ms # 1-10000 milliseconds commit_delay = 10000 # range 0-100000, in microseconds commit_siblings = 5 # range 1-1000 wal_sender_delay = 5000ms # walsender cycle time, 1-10000 milliseconds log_line_prefix = '%t ' # special values: datestyle = 'iso, dmy' lc_messages = 'de_DE.UTF-8' # locale for system error message lc_monetary = 'de_DE.UTF-8' # locale for monetary formatting lc_numeric = 'de_DE.UTF-8' # locale for number formatting lc_time = 'de_DE.UTF-8' # locale for time formatting default_text_search_config = 'pg_catalog.german' max_connections limits the amount of concurrent jobs you can run, if I recall correctly. I idented the not-that-performance-relevant settings by 1, so you won't get lost as easily by irrelevant things. As well I adjusted a few sysctl settings, relevant for postgres are shm*, the net* are more relevant for iSCSI and I like to avoid swapping. As well I changed the disk scheduler to 'deadline', if you have a hardware raid controller with memory cache, noop might be even better. In no case the default cfq is useful unless it is a desktop computer. I just assume you are on linux here, just because I am myself. ;) /etc/sysctl.conf: kernel.shmmax = 17179869184 kernel.shmall = 4194304 vm.swappiness = 0 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_syncookies = 1 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 65536 16777216 net.ipv4.tcp_no_metrics_save = 1 net.ipv4.tcp_moderate_rcvbuf = 1 net.core.netdev_max_backlog = 2500 Keep in mind, these settings are for the very box I am running, you need to adjust them, especially if you have a different amount of RAM. I am personally intrested in other people ways to adjust the database servers for bacula. I use bacula 5.0.2 and 5.0.3 here, the packages debian provides. Regards, Adrian -- LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart Fon: +49 (7 11) 78 28 50 90 - Fax: +49 (7 11) 78 28 50 91 Mail: li...@lihas.de - Web: http://lihas.de Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart ------------------------------------------------------------------------------ Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_jan _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users