Hello Dan,
On 10/3/23 21:01, Dan Langille wrote:
On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote:
Hello Dan,
On 9/30/23 15:54, Dan Langille wrote:
Basically, each time you insert something, you need to query the other tables to
check the existence or not of records. The checks for the existence of the
records is already done at the bacula level one time per job, no real need to do
it one time per insertion/update/delete.
Is that one-time-per-job check done via a SELECT on the database? I should
explain analyze that one too and see if we can optimize that.
Look the code, we get/create the fileset record, pool record, client, device,
... records when we start a job or load the configuration.
For Path, the query itself ensures that we have the proper records, it's
builtin.
For example, one index I have been using this index for years. I find it
referenced[1] in the the 5.x documentation, but it is not part of the catalog
creation.
"file_jobid_idx" btree (jobid)
This index vastly improves the construction of the files, often going from
hours to seconds. I don't recall when that index was added here, but building
trees has never been an issue here.
This index is great, but it uses GB of space and it has a cost during the
insertion of the attributes. PostgreSQL can use the composed index on JobId,
PathId instead to locate records, so you may add it back if you want, but it's
not mandatory.
How many GB? Do we know? We can easily find out too.
bacula=# select pg_table_size('file_jobid_idx');
pg_table_size
---------------
37937438720
(1 row)
That's about 40GB. I'll check the explain analyse and see how things go and
come up with cost. Right now, we have no data.
Some might say: if the cost > 0 it's not worth it. However, it is worth it,
IMO. Databases are much better at enforcing relational integrity than code is.
It's what they do.
When Marc Cousin and I have worked on the Batch insert procedures, we have
tested all these scenario including the FKEY, it was back in 2006 maybe.
PostgreSQL has done a lot of progress since and SSD are replacing spinning disks
for example, so it might be different, hard to say without a deep analysis with
a large data set.
Our complete analysis and wide range of tests that we did for a couple of months
ended with the code we have today.
Best Regards,
Eric
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users