I've had a chance to do some rough testing to see how expanding the `lstat' field might affect insert times.

I've gathered insert times for the original `file' table with its base64-ish encoded lstat, and for a new table with the lstat expanded into separate columns. Tests for all databases were done from the same SQL file, which contains INSERT statements without a column list, one statement per record. The whole batch is done in one transaction: BEGIN is issued as the first statement in the file, and COMMIT as the last.
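To make the input concrete, here's a minimal sketch of what the two dumps look like. The column layout and the values are invented for illustration; this isn't the real Bacula `file' schema:

  -- b64 dump: lstat travels as one pre-encoded text value
  BEGIN;
  INSERT INTO file VALUES (1, 3, 7, 'gB MEIx IHk B A A BAA BZXtn BZXto A A C');
  -- ... one such statement per record, a few million in all ...
  COMMIT;

  -- expanded dump: one numeric column per lstat field instead
  BEGIN;
  INSERT INTO file VALUES (1, 3, 7, 2049, 1132388, 33188, 1, 0, 0, 4096, 1244089105, 1244089105, 1244089105);
  -- ... likewise ...
  COMMIT;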
Times in seconds; lower is better. Note that the b64 form is pre-encoded in the dump, so these results DO NOT CONSIDER THE TIME COST OF BASE64-ENCODING THE ORIGINAL LSTAT STRUCTURE:

  DBMS            b64   expanded   diff   %diff
  PostgreSQL     1011       1290    279      27
  MySQL5-MyISAM   731        846    115      16
  MySQL5-InnoDB   967       1167    200      21
  SQLite3         294        371     77      26

(These are not formal benchmarks. They test only one particular workload and are not representative of general database performance.)

The base64-lstat SQL insert file is 1734 MB and the expanded-format one is 2189 MB. The expanded dump is about 25% larger due to the numeric representation, which roughly matches the performance hit on INSERT, so I wouldn't be too surprised if most of the extra cost is parsing time.

Still, I'm curious about what actually causes the slowdown. Is it parsing the SQL and converting the text representations of the numbers into binary? Is it something to do with the extra fields in the schema (even though the row width ends up being much the same)? I'm going to poke the PostgreSQL folks about this, but I'd be interested in thoughts from here too.

Also ... holy *crap*, SQLite3 is fast at simple bulk loads with no concurrency/contention.

Note that my MySQL and Pg instances are largely untuned, and they're running on a plain old 7200rpm disk. All I did for Pg was set checkpoint_segments to 9 and disable full_page_writes. In particular, the WAL is on the same drive as the main Pg database, which should not be the case for any Pg setup where you really care about performance. No tuning was done at all on the other DBs, since I just don't know them well enough; they're installed with Ubuntu defaults.

Given that the input SQL file alone is 2GB and my laptop has only 4GB of RAM, caching effects probably won't be too big a deal. Despite that, I pre-read the file ("cp file.sql /dev/null") before each test.

These tests aren't *that* rigorous; I haven't re-run them multiple times, run them on different machines, etc. However, given the extended run time of each test, minor random effects from other things happening on the machine should pretty much average out, and we don't _need_ things to be all that precise anyway.

Raw timings:

b64-Pg:
  real    16m51.416s
  user    2m23.077s
  sys     1m12.505s

expanded-Pg:
  real    21m30.639s
  user    2m54.755s
  sys     1m13.185s

b64-MySQL-MyISAM:
  real    12m11.791s
  user    1m50.959s
  sys     1m7.968s

expanded-MySQL-MyISAM:
  real    14m8.387s
  user    2m9.892s
  sys     1m11.008s

b64-MySQL-InnoDB:
  real    16m7.520s
  user    1m44.667s
  sys     1m2.768s

expanded-MySQL-InnoDB:
  real    19m27.605s
  user    2m16.465s
  sys     1m12.693s

b64-sqlite:
  real    4m54.635s
  user    4m27.153s
  sys     0m11.061s

expanded-sqlite:
  real    6m11.574s
  user    5m38.573s
  sys     0m11.101s

--
Craig Ringer
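PS: in postgresql.conf terms, the Pg tuning mentioned above amounts to just:

  checkpoint_segments = 9
  full_page_writes = off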