Well, there are two obvious ways of speeding up the insertion of attributes.
1. Cache the attributes, combine them with the MD5/SHA1 signature that follows each attribute, and then do a single insert rather than an insert of the attribute followed by an update for the MD5/SHA1 signature. This would benefit all DB backends (SQLite, MySQL and PostgreSQL). Note that the user can turn off MD5 signatures, so they may not always exist. Also, they are not generated for directories and other special files.
I'll take a shot at that...
I've had a look at the code, and just had a couple of questions before I started:
It looks like I'll need to cache the file index, job id, path id,
filename id, attributes in db_create_file_attributes_record(), then pull
them out of the cache in db_add_SIG_to_file_record() and do the actual
insert (or just do both parts in catalog_update() and
get_attributes_and_put_in_catalog()). It looks like fileindex is what I
need to use for a key. As each one gets inserted, it needs to come out
of the cache (or be marked as used, at least), and then at the end,
everything still in the cache can be inserted with a 0 sig to take care
of files/dirs with no sig...does that sound like about the right
process, or am I suffering from Monday morning delusions?
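
To make the process above concrete, here is a rough sketch of it in Python with an in-memory dict and SQLite, just to show the flow. The table layout, column names and the AttributeCache class are all made up for illustration; the real change would live in the C catalog code, and only the two function names in the comments come from the Bacula source.

```python
import sqlite3

# Hypothetical, cut-down table; the real File table differs.
SCHEMA = """CREATE TABLE File (
    FileIndex INTEGER, JobId INTEGER, PathId INTEGER,
    FilenameId INTEGER, LStat TEXT, MD5 TEXT)"""


class AttributeCache:
    """Holds attribute records until their signature arrives (illustrative)."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}  # FileIndex -> (JobId, PathId, FilenameId, LStat)

    def add_attributes(self, file_index, job_id, path_id, filename_id, lstat):
        # Stand-in for the caching step in db_create_file_attributes_record().
        self.pending[file_index] = (job_id, path_id, filename_id, lstat)

    def add_signature(self, file_index, sig):
        # Stand-in for db_add_SIG_to_file_record(): the record leaves the
        # cache and goes in with one INSERT instead of INSERT + UPDATE.
        job_id, path_id, filename_id, lstat = self.pending.pop(file_index)
        self._insert(file_index, job_id, path_id, filename_id, lstat, sig)

    def flush(self):
        # Anything still cached never got a signature (directories, special
        # files, or signatures turned off), so insert it with a "0" sig.
        for file_index in sorted(self.pending):
            job_id, path_id, filename_id, lstat = self.pending[file_index]
            self._insert(file_index, job_id, path_id, filename_id, lstat, "0")
        self.pending.clear()

    def _insert(self, file_index, job_id, path_id, filename_id, lstat, sig):
        self.conn.execute(
            "INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (file_index, job_id, path_id, filename_id, lstat, sig),
        )
```

Usage would be add_attributes() for every file, add_signature() whenever a signature shows up, and a single flush() at the end of the job.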
Each cached record is going to be 16 bytes plus the size of the
attributes, so I don't think we can cache them in memory :-) Any
objections to a dbm-based cache? It would also allow a recovery of the
attributes if something dies before they get inserted...
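
For what it's worth, a dbm-based cache along these lines could look roughly like the sketch below, using Python's standard dbm module as a stand-in. I'm assuming the 16 bytes are four 32-bit ids (file index, job id, path id, filename id); that layout, the key format and the helper names put/take/leftovers are all hypothetical.

```python
import dbm
import os
import struct
import tempfile

# Assumed layout: four unsigned 32-bit ids = 16 bytes, then the attributes.
HEADER = struct.Struct("<IIII")


def put(db, file_index, job_id, path_id, filename_id, attrs):
    """Cache one record on disk, keyed by file index."""
    key = str(file_index).encode()
    db[key] = HEADER.pack(file_index, job_id, path_id, filename_id) + attrs.encode()


def take(db, file_index):
    """Fetch a record and delete it from the cache, as it is being inserted."""
    key = str(file_index).encode()
    raw = db[key]
    del db[key]
    ids = HEADER.unpack(raw[:HEADER.size])
    return ids + (raw[HEADER.size:].decode(),)


def leftovers(path):
    """After a crash or at end of job: file indexes still waiting for insert."""
    with dbm.open(path, "r") as db:
        return sorted(int(key) for key in db.keys())
```

Because the records live in a file rather than in memory, reopening the same path after a failure gives back whatever had not yet been inserted.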
For PostgreSQL, there is one other thing I can think of that might be done to speed things up: single SQL statements are each executed with an implicit BEGIN TRANSACTION/END TRANSACTION. If it's doable without affecting the other databases, would you be receptive to a patch to wrap the inserts in a transaction (probably a new transaction every 1000 or 5000 records or so)? I just ran a very quick test:
CREATE TABLE testing ( num1 int8 PRIMARY KEY NOT NULL, num2 int8);
Script 1: test bare insert and update: loop from 0-1000, do an insert of num1 then update num2 with random #
testins 0.33s user 0.20s system 1% cpu 47.331 total
Script 2: wrap loop from above script with a begin/end transaction block:
testtrans 0.07s user 0.03s system 5% cpu 1.845 total
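
That is 47.3s versus 1.8s wall-clock, so roughly 25 times faster with the transaction block. The batching itself could be sketched as follows. This is not the script I timed; it uses Python's sqlite3 as a stand-in for PostgreSQL so it runs anywhere, and the function names and batch size are only for illustration.

```python
import random
import sqlite3


def open_test_db():
    """Open an in-memory database with the test table from above."""
    # isolation_level=None puts the module in autocommit mode, so the
    # explicit BEGIN/COMMIT below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE testing (num1 INTEGER PRIMARY KEY NOT NULL, num2 INTEGER)"
    )
    return conn


def insert_batched(conn, count, batch_size=1000):
    """Insert-then-update `count` rows, one transaction per `batch_size` rows.

    Returns the number of transactions that were committed.
    """
    cur = conn.cursor()
    commits = 0
    for start in range(0, count, batch_size):
        cur.execute("BEGIN")
        for num1 in range(start, min(start + batch_size, count)):
            cur.execute("INSERT INTO testing (num1) VALUES (?)", (num1,))
            cur.execute(
                "UPDATE testing SET num2 = ? WHERE num1 = ?",
                (random.getrandbits(32), num1),
            )
        cur.execute("COMMIT")
        commits += 1
    return commits
```

Committing every N records rather than once at the end keeps any single transaction from growing without bound on a large job, at the cost of a few extra commits.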
-- Matt White [EMAIL PROTECTED] Arts and Science Computer Labs University of Saskatchewan