Hi,

02.03.2009 13:17, Tim Bell wrote:
> What are the experiences of Bacula's scalability limits as the number of
> files per server increase ? We are looking at backing up 1000+ clients
> with millions of files in total.

Definitely a very interesting project :-)

> I would like to understand if this is
> feasible and how many servers we would need:

I'm pretty sure it's possible, as I know there are installations of
that size.

As a very rough estimate, I'd suggest planning for the following
servers:
- one really high-end database server cluster of at least two machines
- one moderately equipped DIR server
- a number of SD machines handling actual data storage

If you back up to tape, don't try to connect more tape drives to one
machine than it can keep saturated... LTO-4 is so fast that you can
expect to run into bottlenecks at the SCSI/SAS/FC bus, internal buses,
CPU throughput, and disk system even with a small number of drives.

Disk storage systems are not that critical here (as they don't suffer
from shoeshining), but they are obviously limited by the same factors.
With multi-linked 4G FC interconnects you can do a lot :-)

>
> Specifically,
>
> - What are the recommended largest number of files in the catalog for
> each bacula instance ?

With version 3 (which will be released in March or April, hopefully),
the catalog will get bigger fields for the IDs of some critical data.
The number of files you can keep in one catalog instance will probably
be sufficient then.

> - What database choice is the best for large numbers of files in the
> catalog ?

MySQL or PostgreSQL - I'd choose whichever you're more comfortable
with. I believe PostgreSQL performs a bit better, but with a catalog of
the size you can expect to end up with, you definitely want someone
able to handle a big database. So, if you've got good MySQL DBAs,
choose that, even if performance alone would suggest using PostgreSQL.

You'll definitely need a good database server... either integrated
into the Bacula main server, or a separate machine. For a project of
your size, I would suggest comparing the speed of a database on the
Bacula server with that of a dedicated database server connected by
10GE or some other high-speed, low-latency interconnect.
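
To get a feeling for how big the catalog might become, here is a rough
Python sketch. It assumes that the catalog holds roughly one file record
per file per backup job in which that file was saved, and that the
current ID fields are 32-bit unsigned integers; I only said above that
the IDs get "bigger" in version 3, so that width is an assumption. The
client count, files per client, retention numbers and change rate are
made-up illustration values, and all names in the code are hypothetical.

```python
from dataclasses import dataclass

# Assumption: ID columns are 32-bit unsigned before the schema change.
MAX_UINT32 = 2**32 - 1


@dataclass
class RetentionPlan:
    clients: int              # number of backup clients
    files_per_client: int     # average files on one client
    full_backups_kept: int    # full backups retained per client
    incrementals_kept: int    # incremental backups retained per client
    change_rate: float        # fraction of files changed per incremental


def estimated_file_rows(plan: RetentionPlan) -> int:
    """Estimate catalog file records: one per file per job that saved it."""
    jobs_weight = plan.full_backups_kept + plan.incrementals_kept * plan.change_rate
    return round(plan.clients * plan.files_per_client * jobs_weight)


def id_space_used(rows: int, max_id: int = MAX_UINT32) -> float:
    """Fraction of the assumed ID range consumed by that many records."""
    return rows / max_id


if __name__ == "__main__":
    # Illustration values only, not measurements from any real site.
    plan = RetentionPlan(
        clients=1000,
        files_per_client=5000,
        full_backups_kept=12,
        incrementals_kept=30,
        change_rate=0.02,
    )
    rows = estimated_file_rows(plan)
    print(f"estimated file records: {rows:,}")
    print(f"share of 32-bit ID range: {id_space_used(rows):.2%}")
```

IDs that are never reused after pruning would eat into the range faster
than the live record count suggests, so treat the result as a lower
bound on ID consumption rather than a prediction.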

> - Do multiple instances of bacula on a single server make sense to
> improve scalability ?

No... scalability is better achieved by having several separate SD
machines and a separate database machine. You should be fine with one
DIR. Several SDs, preferably one per network segment, allow you to run
faster data transfers to the final storage.

Separate DIR, SD and catalog machines furthermore improve reliability
a bit, because if one machine fails you won't have to go through the
complete disaster recovery procedure... running the catalog database on
a cluster of at least two machines should make it highly unlikely that
you ever have to recover Bacula's catalog from tape or disk volumes.

> Tim Bell
> CERN

I guess I want to visit you next time I'm in Switzerland... Bacula
Systems' office is in Yverdon, not very far from CERN :-)

Arno

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users