On Monday 04 April 2005 18:35, Matt White wrote:
> Kern Sibbald wrote:
> > Well, there are two obvious ways of speeding up inserting of attributes.
> >
> > 1. Cache the attributes, and combine them with the MD5/SHA1 signature
> > that follows each attribute, and then do a single insert rather than an
> > insert of the attribute followed by an update for the MD5/SHA1 signature.
> > This would benefit all DB versions (SQLite, MySQL and PostgreSQL). Note,
> > the user can turn off MD5 signatures so they may not always exist. Also,
> > they are not generated for directories, and other special files.
>
> I'll take a shot at that...
>
> I've had a look at the code, and just had a couple of questions before I
> started:
>
> It looks like I'll need to cache the file index, job id, path id,
> filename id, attributes in db_create_file_attributes_record(), then pull
> them out of the cache in db_add_SIG_to_file_record() and do the actual
> insert (or just do both parts in catalog_update() and
> get_attributes_and_put_in_catalog()).  It looks like fileindex is what I
> need to use for a key.  As each one gets inserted, it needs to come out
> of the cache (or be marked as used, at least), and then at the end,
> everything still in the cache can be inserted with a 0 sig to take care
> of files/dirs with no sig...does that sound like about the right
> process, or am I suffering from Monday morning delusions?
>
> Each cached record is going to be 16 bytes plus the size of the
> attributes, so I don't think we can cache them in memory :-)  Any
> objections to a dbm-based cache?  It would also allow a recovery
> of the attributes if something dies before they get inserted...

You are close, but it is probably a lot easier than you think (once you know 
more of the details). In src/dird/catreq.c, you need only cache one copy of 
the ATTR_DBR ar record that is generated in catalog_update() before
it is sent to db_create_file_attributes_record().  Then in the code that 
checks if the signature belongs to the last ar record, you simply need to 
replace the call to db_add_SIG_to_file_record() with a call to 
db_create_file_attributes_record(). There are a few more details:

1. You must expand the attributes record to include the signature --
   see cats.h (just copy char SIG[50] from the FILE_DBR).
2. You must stuff the signature into the record.
3. You must have a flag that indicates that the record has a signature.
4. You must modify db_create_file_record() in src/cats to use the
   value of the signature in the ar packet rather than 0 when it is
   doing the insert.
5. Then, instead of calling db_add_SIG_to_file_record(), you call
   db_create_file_attributes_record().
6. Finally, and very importantly, you must have some way to flush out any
   last cached attribute record that is not followed by a signature at the
   end of the job.

The same change could be made to src/dird/fd_cmds.c -- this code is used only 
for Verify jobs, whereas the code in catreq.c is used for Backup jobs (more 
important).

>
>
> For PostgreSQL, there is one other thing I can think of that might be
> done to speed things up...single sql statements are executed with an
> implicit BEGIN TRANSACTION/END TRANSACTION.  If it's doable without
> affecting the other databases, would you be receptive to a patch to
> wrap the inserts in a transaction (probably a new transaction every
> 1000 or 5000 records or so).  I just ran a very quick test:
>
> CREATE TABLE testing (
>    num1 int8 PRIMARY KEY NOT NULL,
>    num2 int8);
>
> Script 1: test bare insert and update:
>    loop from 0-1000, do an insert of num1 then update num2 with random #
>
> testins  0.33s user 0.20s system 1% cpu 47.331 total
>
> Script 2: wrap loop from above script with a begin/end transaction
> block:
>
> testtrans  0.07s user 0.03s system 5% cpu 1.845 total

As I mentioned in a previous email, this strategy will work only if one job is 
active at a time -- for the reason, please see Martin Simmons's email (if I 
remember the author right) on this subject.

It would be relatively easy to find out how many jobs are using the database 
at the same time, since there is a reference count in the database packet. The 
problem gets slightly more complicated if one job starts a transaction and 
another wants to start using the database -- it might not be too hard to turn 
transactions off and block the second job until the first job notices and 
terminates the transaction, but all that is a bit messy ...

-- 
Best regards,

Kern


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
