On 03/08/10, Marc Cousin (cousinm...@gmail.com) wrote:
> > > > 3. Why is Bacula using a batch file at all? Why not simply do a
> > > > straight insert?
> > >
> > > Because 7,643,966 inserts would be much slower.
> >
> > Really? I've logged Bacula's performance on the server and the
> > inserts run at around 0.35 ms and updates at around 0.5 ms.
> What is traced, usually, is execution time. You won't easily get:
>
> - Parse time of the query. It is basically 0 with batch insert, where
>   it is very measurable with insert.
>
> - Round-trip duration and overhead. This one, even if everything is
>   running on the same machine, is where the cost savings of batch
>   insert are high: if you do everything with inserts, the inserting
>   process has to wait for the database to acknowledge each operation
>   before submitting the next one. And inserting records in Bacula
>   isn't all about inserts; there are some selects too, to look up
>   pathid and filenameid. You also pay a penalty because data is sent
>   back to the caller (how many records were inserted, and the like).
>
> To give you a very simplified simulation, I tried inserting 1 million
> integer values the way the batch insert works (COPY). It takes 3.5
> seconds, mostly IO bound.
>
> With inserts: 77 seconds, mostly CPU bound.
>
> The gains are lower with Bacula, because the data inserted is more
> complex, Bacula itself is more complex, and there are indexes to
> maintain, but it gives you an idea of why there is a batch mode.

Actually, this is what I don't get. PostgreSQL is a highly scalable,
robust database system, yet it is being used as a data dump rather than
as a working tool for maintaining a transaction-based catalogue. Yes, a
batch insert is faster than an individual insert, but the latter could
be done at "written-to-tape" transaction time, asynchronously but
within a transaction. It's pretty crazy for a >7TB tape backup to fail
because of a temporary file/table problem at the END of the backup
process rather than during it.

Also, the COPY writes to a temporary table, and then some rather
curious inserts are done into the Bacula tables. E.g.:

    INSERT INTO Path (Path)
      SELECT a.Path FROM (
        SELECT DISTINCT Path FROM batch
      ) AS a
      WHERE NOT EXISTS
        (SELECT Path FROM Path WHERE Path = a.Path)

This is a kludge (with an inefficient correlated subquery!)
that could easily miss paths which exist from previous, unrelated
backups. A continuous insert process against a job and mediaid simply
wouldn't need to do this.

More native support for PostgreSQL would also allow, for instance,
faster and more powerful searching of catalogues for retrieves, rather
than the strange restore procedure required through bconsole.

I'm delighted to be using Bacula (particularly after our tribulations
with Amanda), but it seems to me that Bacula could lean much more
heavily on PostgreSQL.

-- 
Rory Campbell-Lange
r...@campbell-lange.net

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
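
P.S. For what it's worth, the deduplicating insert quoted above can
also be written as an anti-join, which avoids the correlated subquery.
This is only a sketch, and it assumes Path.Path carries a unique index
(an assumption about the schema that would need checking):

    -- Anti-join sketch: insert only those batch paths not already
    -- present in Path. Assumes a unique index on Path(Path).
    INSERT INTO Path (Path)
      SELECT DISTINCT b.Path
        FROM batch b
        LEFT JOIN Path p ON p.Path = b.Path
       WHERE p.Path IS NULL;

Whether the planner actually treats the NOT EXISTS form any worse is
worth verifying with EXPLAIN before preferring either version.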