Hello,

David Marcin wrote:
Bacula exits unexpectedly, the only thing I can think of is that the
database has somehow become corrupted in such a way to kill the
director.

Yes, it is the output of running bacula-dir from the command line, in
the foreground, with debug level 99 (the manual said higher is better, and
I figured that was pretty high ;) )


You can get more detailed logs... afair level 400 is / was the highest useful level. But at that level you get so much output that finding the relevant parts alone is a task in itself.


You learn much about bacula, though.

Sorry, I should have provided more details about what was going on.  I
started a job, which then went along its merry way detecting that it
should run a full backup, beginning to back up the files (a query to the
fd shows that it is indeed processing files) then at some ambiguous
point in the future before actually finishing the job and marking it in
the database, the director crashes.  Upon restarting the director this
is in the "messages" log:

Ah... good or rather bad... I assumed the error happened while preparing the backup.


11-May 23:04 backups-dir: No prior Full backup Job record found.
11-May 23:04 backups-dir: No prior or suitable Full backup found. Doing
FULL backup.
11-May 23:04 backups-dir: Start Backup JobId 553,
Job=3jane_Backup.2005-05-11_23.04.52

Ah, there's the error... you don't back up a 3Jane, you only refrigerate or clone it ;-)


...

for simplicity, here is the input/output from bconsole:

*run
Using default Catalog name=MyCatalog DB=bacula
A job name must be specified.
The defined Job resources are:
     1: 3jane Backup
<snip - other options>
Select Job resource (1-16): 1
Run Backup job
JobName:  3jane Backup
FileSet:  3jane
Level:    Incremental
Client:   3jane-fd
Storage:  File
Pool:     Default
When:     2005-05-11 23:09:00
Priority: 10
OK to run? (yes/mod/no): yes
Job started. JobId=554
*

Quite normal, then.




- Second, the relevant configuration (client, fileset, pools, storage)

Nothing astonishing here, either... it looks quite similar to my own configuration, although I use tapes. Apart from that, I think everything's ok.




- What OS and version of bacula runs on the client?



The client being backed up is the same host that runs the director and
storage daemon, we have only a few computers :) Bacula does not back up
the storage volumes.

The database?



Sorry, I meant to say here that the computer is a Debian Linux box, running the 2.4.19 kernel.

So, this is something quite common, too.

I'm testing another Linux-to-Linux backup now and will let you know whether
it succeeds or fails.





- Can you run other jobs on the client?
- Can you run identical jobs on the client?



I don't understand what you mean. Do you mean with different directors? I can run an estimate job on that client successfully. I can also run
other jobs successfully from the director.

Sorry, I wasn't clear enough there.

I mean: Can you set up any other job, preferably with a very simple configuration, to run on the same client?

And, if you copy the job setup to create a new job with a different name, does that one run?

- What does the catalog have about Job 413?
There is no Job 413, perhaps it was purged?  In any case I tried
specifically doing a new full backup and it still fails.

Hm. That looks funny, because that job ID was referenced in your debug output, and to me it looked as though the assertion that the query has zero or one results failed...


>>>backups-dir: ua_prune.c:249 select sql=SELECT JobId from Job WHERE
>>>JobTDate<1113249193 AND ClientId=2 AND PurgedFiles=0
>>>backups-dir: ua_prune.c:279 Delete JobId=413
>>>bacula-dir: src/pager.c:570: pager_playback_one_page: Assertion
>>>`pPg->nRef==0 || pPg->pgno==1' failed.

OK, so I guess pager.c has nothing to do with database access.

Ok, I looked.
That file belongs to SQLite. Probably it has to do with purging, then.

Things are a little more complicated than I assumed. It seems as though that assertion takes place in code that is called during playback of a transaction journal.
This should obviously only happen when the database is in a seriously damaged state.


My advice now is to first check the database using SQLite's toolset (which hopefully exists...), then probably to re-create the database by dumping it and loading the dump into a new database, and to check your file systems and hard disk.
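
As a rough sketch of that check-and-reload step, here is how it could look with Python's standard sqlite3 module. This is only an illustration under assumptions: it assumes the catalog is an SQLite 3 file (Python's sqlite3 module cannot open SQLite 2 databases), that bacula-dir is stopped while it runs, and that the function name and both file paths are hypothetical. On a badly damaged file, the check or the dump may itself raise sqlite3.DatabaseError.

```python
import os
import sqlite3


def check_and_rebuild(src_path, dst_path):
    """Check an SQLite 3 file, then dump it and reload the dump into a new file.

    Returns the rows reported by PRAGMA integrity_check; a healthy
    database yields the single entry "ok".
    """
    # Refuse to load into an existing file, so the dump never collides
    # with tables that are already there.
    if os.path.exists(dst_path):
        raise FileExistsError(dst_path)

    src = sqlite3.connect(src_path)
    try:
        # integrity_check returns one row per problem found, or "ok".
        report = [row[0] for row in src.execute("PRAGMA integrity_check")]
        # iterdump() yields the SQL statements that re-create the database.
        dump_sql = "\n".join(src.iterdump())
    finally:
        src.close()

    dst = sqlite3.connect(dst_path)
    try:
        # Replay the dump into the fresh database file.
        dst.executescript(dump_sql)
    finally:
        dst.close()

    return report
```

Calling check_and_rebuild("bacula.db", "bacula-new.db") (both names are placeholders) returns the integrity report and leaves the original file untouched, so the old catalog can still be inspected if the reload turns out to be incomplete.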

In the long run, you might consider using an external database - I'm using MySQL on another machine, and I could recover from a database error without the director crashing. Of course, nothing useful could be done, but at least the director could do its work to keep the catalog consistent.



I hope that is enough information.  If we can't figure anything out I
suppose I can try purging everything and running backups from scratch again.

That should always be possible - even keeping the backup data. Stop bacula, re-create a catalog, scan the existing volumes, and voilà.


Arno

Thanks for your help

David





If something with the database is wrong you can try to repair it.
If something with Job 413 as a reference job is wrong, you can run a
new full backup.

Arno




It appears to be one particular backup that fails regularly.  When run
manually, others seem to complete, while this one fails.

I'd rather not dump the backups that have been made, but if it is
necessary it can be done.

David


-------------------------------------------------------
This SF.Net email is sponsored by Oracle Space Sweepstakes
Want to be the first software developer in space?
Enter now for the Oracle Space Sweepstakes!
http://ads.osdn.com/?ad_id=7393&alloc_id=16281&op=click
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users















-- IT-Service Lehmann [EMAIL PROTECTED] Arno Lehmann http://www.its-lehmann.de


