On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote: > Hello Dan, > > On 9/30/23 15:54, Dan Langille wrote: >> Hello, >> >> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign >> keys are not a new database concept; they've been around for decades. They >> are reliable and robust. > > Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is > bigger than the benefits IMHO. Of course, if someone can run tests on a > multi-billion record database and show us that the cost is around few %, we > can > consider them.
Here's hoping someone with that database can step up. > Basically, each time you insert something, you need to query the other tables > to > check the existence or not of records. The checks for the existence of the > records is already done at the bacula level one time per job, no real need to > do > it one time per insertion/update/delete. Is that one-time-per-job check done via a SELECT on the database? I should explain analyze that one too and see if we can optimize that. >> Wednesday, I started a dbcheck on a Bacula database. Granted, that database >> is 19 years old and this is the first time I've run dbcheck (as far as I >> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; >> lots of cruft removal. >> >> When PostgreSQL was first added to Bacula, there was resistance to FK, and I >> did not pursue the issue. Thus, it persists to this day. I hope to change >> that. > > I don't think they are necessary, we use transaction in the cleanup now days, > so > orphan records should not exist anymore. > >> I would like to take that development work back up (pun intended), and start >> adding foreign keys back into Bacula, at least for PostgresQL. That might >> remove the need for dbcheck (again, at least for Bacula on PostgreSQL). > > dbcheck is needed from time to time to purge the Path table mostly, other > tables > should be ok with a recent version. Not sure the algorithms in dbcheck are > always the best ones to find a remove records. > >> For example, one index I have been using this index for years. I find it >> referenced[1] in the the 5.x documentation, but it is not part of the >> catalog creation. >> >> "file_jobid_idx" btree (jobid) >> >> This index vastly improves the construction of the files, often going from >> hours to seconds. I don't recall when that index was added here, but >> building trees has never been an issue here. > > This index is great, but it uses GB of space and it has a cost during the > insertion of the attributes. PostgreSQL can use the composed index on JobId, > PathId instead to locate records, so you may add it back if you want, but it's > not mandatory. How many GB? Do we know? We can easily find out too. bacula=# select pg_table_size('file_jobid_idx'); pg_table_size --------------- 37937438720 (1 row) That's about 40GB. I'll check the explain analyse and see how things go and come up with cost. Right now, we have no data. Some might say: if the cost > 0 it's not worth it. However, it is worth it, IMO. Databases are much better at enforcing relational integrity than code is. It's what they do. Thank you -- Dan Langille d...@langille.org _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users