Hello Dan,
On 9/30/23 15:54, Dan Langille wrote:
Hello,
The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign keys
are not a new database concept; they've been around for decades. They are
reliable and robust.
Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is
bigger than the benefits IMHO. Of course, if someone can run tests on a
multi-billion record database and show us that the cost is around few %, we can
consider them.
Basically, each time you insert something, you need to query the other tables to
check the existence or not of records. The checks for the existence of the
records is already done at the bacula level one time per job, no real need to do
it one time per insertion/update/delete.
Wednesday, I started a dbcheck on a Bacula database. Granted, that database is
19 years old and this is the first time I've run dbcheck (as far as I know).
That dbcheck is still going. FYI, the dump to disk is about 140GB; lots of
cruft removal.
When PostgreSQL was first added to Bacula, there was resistance to FK, and I
did not pursue the issue. Thus, it persists to this day. I hope to change that.
I don't think they are necessary, we use transaction in the cleanup now days, so
orphan records should not exist anymore.
I would like to take that development work back up (pun intended), and start
adding foreign keys back into Bacula, at least for PostgresQL. That might
remove the need for dbcheck (again, at least for Bacula on PostgreSQL).
dbcheck is needed from time to time to purge the Path table mostly, other tables
should be ok with a recent version. Not sure the algorithms in dbcheck are
always the best ones to find a remove records.
For example, one index I have been using this index for years. I find it
referenced[1] in the the 5.x documentation, but it is not part of the catalog
creation.
"file_jobid_idx" btree (jobid)
This index vastly improves the construction of the files, often going from
hours to seconds. I don't recall when that index was added here, but building
trees has never been an issue here.
This index is great, but it uses GB of space and it has a cost during the
insertion of the attributes. PostgreSQL can use the composed index on JobId,
PathId instead to locate records, so you may add it back if you want, but it's
not mandatory.
I have no timeline for this work, but just posting the intent often gives me
the incentive to get started.
1 - https://www.bacula.org/5.2.x-manuals/en/main/main/Catalog_Maintenance.html
Best Regards,
Eric
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users