On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote:
> Hello Dan,
>
> On 9/30/23 15:54, Dan Langille wrote:
>> Hello,
>> 
>> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign 
>> keys are not a new database concept; they've been around for decades. They 
>> are reliable and robust.
>
> Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is
> bigger than the benefits IMHO. Of course, if someone can run tests on a 
> multi-billion record database and show us that the cost is around few %, we 
> can
> consider them.

Here's hoping someone with that database can step up.

> Basically, each time you insert something, you need to query the other tables 
> to
> check the existence or not of records. The checks for the existence of the
> records is already done at the bacula level one time per job, no real need to 
> do
> it one time per insertion/update/delete.

Is that one-time-per-job check done via a SELECT on the database? I should 
explain analyze that one too and see if we can optimize that.

>> Wednesday, I started a dbcheck on a Bacula database. Granted, that database 
>> is 19 years old and this is the first time I've run dbcheck (as far as I 
>> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; 
>> lots of cruft removal.
>> 
>> When PostgreSQL was first added to Bacula, there was resistance to FK, and I 
>> did not pursue the issue. Thus, it persists to this day. I hope to change 
>> that.
>
> I don't think they are necessary, we use transaction in the cleanup now days, 
> so
> orphan records should not exist anymore.
>
>> I would like to take that development work back up (pun intended), and start 
>> adding foreign keys back into Bacula, at least for PostgresQL. That might 
>> remove the need for dbcheck (again, at least for Bacula on PostgreSQL).
>
> dbcheck is needed from time to time to purge the Path table mostly, other 
> tables
> should be ok with a recent version. Not sure the algorithms in dbcheck are
> always the best ones to find a remove records.
>
>> For example, one index I have been using this index for years. I find it 
>> referenced[1] in the the 5.x documentation, but it is not part of the 
>> catalog creation.
>> 
>>      "file_jobid_idx" btree (jobid)
>> 
>> This index vastly improves the construction of the files, often going from 
>> hours to seconds. I don't recall when that index was added here, but 
>> building trees has never been an issue here.
>
> This index is great, but it uses GB of space and it has a cost during the 
> insertion of the attributes. PostgreSQL can use the composed index on JobId,
> PathId instead to locate records, so you may add it back if you want, but it's
> not mandatory.

How many GB? Do we know? We can easily find out too. 

bacula=# select pg_table_size('file_jobid_idx');
 pg_table_size 
---------------
   37937438720
(1 row)

That's about 40GB.  I'll check the explain analyse and see how things go and 
come up with cost. Right now, we have no data.

Some might say: if the cost > 0 it's not worth it. However, it is worth it, 
IMO. Databases are much better at enforcing relational integrity than code is. 
It's what they do.

Thank you

-- 
  Dan Langille
  d...@langille.org


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to