Martin Simmons wrote:
>>>>>> On Wed, 19 Sep 2007 18:58:13 +0200, Marc Cousin said:
>> On Wednesday 19 September 2007 16:59:10 Martin Simmons wrote:
>>>>>>>> On Wed, 19 Sep 2007 11:54:37 +0200, Cousin Marc said:
>>>> I think the problem is linked to the fact that dbcheck works more or less
>>>> row by row.
>>>>
>>>> If I understand correctly, the problem is that you have duplicates in the
>>>> path table as the error comes from
>>>> SELECT PathId FROM Path WHERE Path='%s' returning more than one row
>>>>
>>>> You could try this query; it would probably be much faster:
>>>>
>>>> delete from path
>>>> where pathid not in (
>>>>    select min(pathid) from path
>>>>    where path in
>>>>            (select path from path group by path having count(*) >1)
>>>>    group by path)
>>>> and path in (
>>>>    select path from path group by path having count(*) >1);
>>>>
>>>> I've just written it very quickly and haven't had time to double-check, so
>>>> make a backup first if you want to try it... :)
>>>> Or at least do it in a transaction so you can roll back if anything goes
>>>> wrong.
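
If anyone wants to try that, one fairly safe way (untested sketch,
PostgreSQL syntax) is to wrap it in a transaction and look at the numbers
before committing:

    begin;
    select count(*) from path;        -- row count before
    -- ... run the delete from the quoted message here ...
    select count(*) from path;        -- row count after
    commit;                           -- or rollback; if it looks wrong
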
>>> Deleting from path like that could leave the catalog in a worse state than
>>> before, with dangling references in the File table.  The dbcheck routine
>>> updates the File table to replace references to deleted pathids.
>>>
>>> Moreover, if deleting duplicate pathids is slow (i.e. there are many of
>>> them), then the catalog could be badly corrupted, so I don't see how you
>>> can be sure that the File records are accurate.  It might be better to wipe
>>> the catalog and start again, or at least prune all of the file records
>>> before running dbcheck.
>>>
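
If someone does want to go the pure-SQL route despite this warning, the
File rows would first have to be re-pointed at the pathid being kept,
something along these lines (untested sketch; table and column names are
what I see in my own PostgreSQL catalog, so check yours first):

    -- re-point File rows from each duplicate pathid to the smallest one
    -- (assumes pathid is unique and only the path text is duplicated)
    update file set pathid = keep.minid
    from (select path, min(pathid) as minid
          from path group by path having count(*) > 1) as keep,
         path p
    where p.path = keep.path
      and file.pathid = p.pathid
      and file.pathid <> keep.minid;

and only after that run the delete on path.
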
>> You're right, I hadn't thought of that problem... I just assumed the
>> duplicate records were only there because of two transactions doing the
>> same thing at the same time.
>>
>> Anyhow, I think we could improve dbcheck with global queries like the
>> previous one (we could clean up the file table beforehand too with a query
>> of the same kind).
> 
> Right.
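
For what it's worth, a set-based version of one of those cleanups might
look roughly like this (just a sketch; I'm assuming here that the File rows
to remove are the ones whose JobId no longer exists):

    -- assumption: "orphaned" File rows = rows whose job is gone
    delete from file
    where jobid not in (select jobid from job);
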
> 
> 
>> And even more obviously, I see that as a good reason to add integrity
>> constraints, as it seems bacula sometimes puts junk in the database...
>> In this example we should have, for path, a primary key on pathid and a
>> unique not-null constraint on path, plus a foreign key constraint on
>> pathid in file...
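
For reference, that would be something like this on PostgreSQL (only a
sketch; the constraint names are made up, duplicates have to be cleaned up
first, and some of these may already exist depending on the schema version):

    alter table path alter column path set not null;
    alter table path add primary key (pathid);
    -- constraint names below are just examples
    alter table path add constraint path_path_uniq unique (path);
    alter table file add constraint file_pathid_fk
        foreign key (pathid) references path (pathid);
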
> 
> I think the constraints were removed because of performance problems, but
> maybe that won't be so bad with the batch insert code?
> 
> __Martin
> 

I use bacula 1.38.2 and postgresql 7.4.8. Not really state of the art, but
there is no upgrade planned for now.

I don't know where the error comes from, but apparently it has been there
for months. There is no constraint on the path table.

Anyway, I've cleaned up all the duplicates, and so far backup and restore
seem to be fine (and no more endless warnings!).
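
A quick way to check that nothing was left behind is a query like this
(it should return no rows):

    select path, count(*) from path
    group by path having count(*) > 1;
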

It would be nice to have a more effective dbcheck (or maybe to use
constraints to ensure database consistency, as you suggest). I understand
it is not an everyday tool, but when you need it, it should not take a week
to finish its job; taking that long makes it useless on production
databases.

Thanks for your help.
