Arno Lehmann wrote:

> Hi.
>
> David Marcin wrote:
>
>> Bacula exits unexpectedly. The only thing I can think of is that the
>> database has somehow become corrupted in such a way as to kill the
>> director.
>>
>> As far as I know, the system has been running unchanged for about 2
>> months; however, I am not the only person with administrator rights
>> on the machine, so I cannot be certain.  I have upgraded to the latest
>> version of Bacula available via Debian's apt system.  Details follow.
>>
>> # bacula-dir -?
>> Copyright (C) 2000-2004 Kern Sibbald and John Walker
>>
>> Version: 1.36.2 (28 February 2005)
>>
>> And the log of the error:
>>
>> # sed 's/quateams/backups/g' file
>> # bacula-dir -f -d99
>> bacula-dir: dird.c:131 Debug level = 99
>> backups-dir: cram-md5.c:52 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backups-dir: cram-md5.c:70 Authenticate OK K4+L5x5dVisA4+Erjy4IeB
>> backups-dir: cram-md5.c:120 sending resp to challenge:
>> fUcX5UYxq5/44iNvf8/pMA
>> backups-dir: ua_run.c:481 JobType=B
>> backups-dir: job.c:108 Open database
>> backups-dir: job.c:121 DB opened
>> backups-dir: btimers.c:169 Start bsock timer 0x80d0488 tid=0x10005 for
>> 600 secs at 1115840968
>> backups-dir: cram-md5.c:120 sending resp to challenge:
>> YxpC6AB+3j/VhB04dV+H+A
>> backups-dir: cram-md5.c:52 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backups-dir: cram-md5.c:70 Authenticate OK L74xUAR2vHEaNiUN3CwJuC
>> backups-dir: btimers.c:183 Stop bsock timer 0x80d0488 tid=0x10005 at
>> 1115840969.
>> backups-dir: fd_cmds.c:87 Opened connection with File daemon
>> backups-dir: btimers.c:169 Start bsock timer 0x80d2508 tid=0x10005 for
>> 600 secs at 1115840969
>> backups-dir: cram-md5.c:120 sending resp to challenge:
>> 3kdFKAdWj6Ni2HRo10+9KA
>> backups-dir: cram-md5.c:52 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backups-dir: cram-md5.c:70 Authenticate OK CB/ligxuxF/1f/+EA4pYNC
>> backups-dir: btimers.c:183 Stop bsock timer 0x80d2508 tid=0x10005 at
>> 1115840969.
>> backups-dir: ua_status.c:104 status:status:
>> backups-dir: ua_status.c:137 do_prompt: select daemon
>> backups-dir: ua_status.c:141 item=0
>> backups-dir: ua_status.c:104 status:status:
>> backups-dir: ua_status.c:137 do_prompt: select daemon
>> backups-dir: ua_status.c:141 item=2
>> backups-dir: fd_cmds.c:87 Opened connection with File daemon
>> backups-dir: btimers.c:169 Start bsock timer 0x80d2518 tid=0xc004 for
>> 600 secs at 1115841057
>> backups-dir: cram-md5.c:120 sending resp to challenge:
>> Gy/J0W+QmSYGOy17X9dlXB
>> backups-dir: cram-md5.c:52 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backups-dir: cram-md5.c:70 Authenticate OK t9/PBGFe26Uark4WPxFeIB
>> backups-dir: btimers.c:183 Stop bsock timer 0x80d2518 tid=0xc004 at
>> 1115841058.
>> backups-dir: ua_status.c:336 Connected to file daemon
>> backups-dir: ua_status.c:104 status:status:
>> backups-dir: ua_status.c:137 do_prompt: select daemon
>> backups-dir: ua_status.c:141 item=2
>> backups-dir: fd_cmds.c:87 Opened connection with File daemon
>> backups-dir: btimers.c:169 Start bsock timer 0x80d1a10 tid=0xc004 for
>> 600 secs at 1115841069
>> backups-dir: cram-md5.c:120 sending resp to challenge:
>> oW+kAChWM5I3YUxuP//GYD
>> backups-dir: cram-md5.c:52 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backups-dir: cram-md5.c:70 Authenticate OK 0G+xN/RAFTpd7EhbDQQ/OA
>> backups-dir: btimers.c:183 Stop bsock timer 0x80d1a10 tid=0xc004 at
>> 1115841069.
>> backups-dir: ua_status.c:336 Connected to file daemon
>> backups-dir: ua_prune.c:249 select sql=SELECT JobId from Job WHERE
>> JobTDate<1113249193 AND ClientId=2 AND PurgedFiles=0
>> backups-dir: ua_prune.c:279 Delete JobId=413
>> bacula-dir: src/pager.c:570: pager_playback_one_page: Assertion
>> `pPg->nRef==0 || pPg->pgno==1' failed.
>> Aborted
>
>
> What you report is the director's log during user interaction, right?


Yes, it is the output of running bacula-dir from the command line, in
the foreground, with debug level 99 (the manual said higher is better; I
figured that was pretty high ;) )

>
> What I *think* I see is that you start a job manually, and after
> selecting the client the director crashes, probably where it chooses a
> job to base a differential or incremental backup upon.

Sorry, I should have provided more details about what was going on.  I
started a job, which went on its merry way: it detected that it should
run a full backup and began backing up files (a query to the fd shows
that it is indeed processing files).  Then, at some unpredictable point
before the job finishes and is recorded in the database, the director
crashes.  Upon restarting the director, this is in the "messages" log:

11-May 23:04 backups-dir: No prior Full backup Job record found.
11-May 23:04 backups-dir: No prior or suitable Full backup found. Doing FULL backup.
11-May 23:04 backups-dir: Start Backup JobId 553, Job=3jane_Backup.2005-05-11_23.04.52
11-May 23:04 backups-sd: Volume "Full-0007" previously written, moving to end of data.

>
> Now, I didn't read the source, esp. src/pager.c around line 570, but
> for me it would be helpful to have some other information:
> - First, screenshot of your interaction,

For simplicity, here is the input/output from bconsole:
*run
Using default Catalog name=MyCatalog DB=bacula
A job name must be specified.
The defined Job resources are:
     1: 3jane Backup
     <snip - other options>
Select Job resource (1-16): 1
Run Backup job
JobName:  3jane Backup
FileSet:  3jane
Level:    Incremental
Client:   3jane-fd
Storage:  File
Pool:     Default
When:     2005-05-11 23:09:00
Priority: 10
OK to run? (yes/mod/no): yes
Job started. JobId=554
*

> - Second, the relevant configuration (client, fileset, pools, storage)

bacula-dir.conf
JobDefs {
  Name = "3jane"
  Type = Backup
  Level = Incremental
  Client = 3jane-fd
  FileSet = "3jane"
  Schedule = "WeeklyCycle"
  Storage = File
  Messages = Standard
  Pool = Default
  Full Backup Pool = Full
  Incremental Backup Pool = Incr
  Differential Backup Pool = Diff
  Priority = 10
}

Job {
  Name = "3jane Backup"
  JobDefs = "3jane"
  Write Bootstrap = "/var/lib/bacula/3jane.bsr"
}

FileSet {
  Name = "3jane"
  Include {
    Options {
      signature = MD5
    }
    File = /root
    File = /etc
    File = /home
    File = /var
  }
}

Client {
  Name = 3jane-fd
  Address = 192.168.1.101
  FDPort = 9102
  Catalog = MyCatalog
  Password = "*******"          # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}

# Default pool definition
Pool {
  Name = Default
  Pool Type = Backup
  Purge Oldest Volume = yes
  Recycle = yes                       # Bacula can automatically recycle Volumes
  Recycle Oldest Volume = yes
  Label Format = "Volume-"
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 5 days
  Accept Any Volume = yes             # write on any volume in the pool
  Maximum Volumes = 5
  Maximum Volume Jobs = 1
}

Pool {
  Name = Full
  Pool Type = Backup
  Purge Oldest Volume = yes
  Recycle = yes                       # Bacula can automatically recycle Volumes
  Maximum Volume Jobs = 1
  Recycle Oldest Volume = yes
  Label Format = "Full-"
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 90 days
  Accept Any Volume = yes             # write on any volume in the pool
  Maximum Volumes = 10
}

Pool {
  Name = Diff
  Pool Type = Backup
  Purge Oldest Volume = yes
  Recycle = yes                       # Bacula can automatically recycle Volumes
  Recycle Oldest Volume = yes
  Label Format = "Diff-"
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 21 days
  Accept Any Volume = yes             # write on any volume in the pool
  Maximum Volumes = 40
  Maximum Volume Jobs = 1
}

Pool {
  Name = Incr
  Pool Type = Backup
  Purge Oldest Volume = yes
  Recycle = yes                       # Bacula can automatically recycle Volumes
  Recycle Oldest Volume = yes
  Label Format = "Incr-"
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 7 days
  Accept Any Volume = yes             # write on any volume in the pool
  Maximum Volumes = 10
  Maximum Volume Jobs = 10
}

bacula-sd.conf
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /mnt/backups/bacula/
  LabelMedia = yes;                   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}
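One detail in the output above can look contradictory. bconsole reports Level: Incremental and Pool: Default, yet the messages log says the job was upgraded to FULL and wrote to Full-0007. As I understand the documented behavior, an Incremental or Differential job with no prior Full is upgraded to Full, and a per-level directive such as Full Backup Pool then overrides the plain Pool. The Python below is a simplified model of that rule under those assumptions, not Bacula's code. The class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobDefaults:
    """Simplified, hypothetical stand-in for the JobDefs "3jane" directives."""
    level: str = "Incremental"
    pool: str = "Default"
    full_pool: Optional[str] = "Full"   # Full Backup Pool
    incr_pool: Optional[str] = "Incr"   # Incremental Backup Pool
    diff_pool: Optional[str] = "Diff"   # Differential Backup Pool


def effective_level(requested: str, has_prior_full: bool) -> str:
    """Upgrade Incremental/Differential to Full when no prior Full exists."""
    if requested in ("Incremental", "Differential") and not has_prior_full:
        return "Full"
    return requested


def effective_pool(jd: JobDefaults, level: str) -> str:
    """A per-level pool, when set, overrides the plain Pool directive."""
    per_level = {
        "Full": jd.full_pool,
        "Incremental": jd.incr_pool,
        "Differential": jd.diff_pool,
    }
    return per_level.get(level) or jd.pool


jd = JobDefaults()
level = effective_level(jd.level, has_prior_full=False)
print(level, effective_pool(jd, level))  # Full Full
```

With has_prior_full=True, the same defaults would resolve to Incremental in the Incr pool. A Volume named Full-0007 fits the Full pool's Label Format = "Full-".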

> - What OS and version of bacula runs on the client?

The client being backed up is the same host that runs the director and
storage daemon, we have only a few computers :)  Bacula does not back up
the storage volumes.

> - Can you run other jobs on the client?
> - Can you run identical jobs on the client?

I don't understand what you mean.  Do you mean with different directors?
I can run an estimate on that client successfully.  I can also run
other jobs successfully from the director.

> - What has the catalog about Job 413?

There is no Job 413; perhaps it was purged?  In any case, I tried
explicitly running a new full backup, and it still fails.
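For what it's worth, the numbers in the debug log can be checked against the Client resource. This is a minimal Python sketch. It assumes that the ua_prune.c select is a file-pruning query whose JobTDate bound is simply "now minus File Retention". The PurgedFiles=0 condition suggests this, but I have not confirmed it in the source. Under that assumption, the sketch solves for the implied "now" and compares it with the last bsock timer entry before the abort. The constant and function names are mine, not Bacula's.

```python
from datetime import datetime, timedelta, timezone

# Figures copied from the thread: the JobTDate bound in the ua_prune.c
# select, the last "Stop bsock timer" epoch second before the abort, and
# the Client resource's "File Retention = 30 days".
PRUNE_CUTOFF = 1113249193
LAST_TIMER_STOP = 1115841069
FILE_RETENTION = timedelta(days=30)


def implied_now(cutoff: int, retention: timedelta) -> int:
    """Assuming cutoff == now - retention, solve for 'now' in epoch seconds."""
    return cutoff + int(retention.total_seconds())


def as_utc(ts: int) -> str:
    """Render an epoch second as a UTC timestamp string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


if __name__ == "__main__":
    now = implied_now(PRUNE_CUTOFF, FILE_RETENTION)
    print("prune cutoff:", as_utc(PRUNE_CUTOFF))
    print("implied now: ", as_utc(now))
    print("seconds after last timer stop:", now - LAST_TIMER_STOP)
```

Run as a script, it prints a cutoff of 2005-04-11 19:53:13 UTC and an implied prune time of 2005-05-11 19:53:13 UTC, which is 124 seconds after the last timer stop. That fits the idea that the abort happened while AutoPrune was deleting records for JobId 413. It is only a consistency check, not proof.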


I hope that is enough information.  If we can't figure anything out, I
suppose I can try purging everything and running backups from scratch again.
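Before purging everything, it may be worth checking whether the catalog file itself is damaged. The abort message appears to come from the SQLite library's pager (src/pager.c, pager_playback_one_page) rather than from Bacula's own sources. This is a minimal Python sketch with three assumptions: the catalog is an SQLite 3 file (Python's sqlite3 module cannot open SQLite 2.x databases), the director is stopped, and you run it on a copy of the catalog that includes any "-journal" file next to it. The sketch runs SQLite's PRAGMA integrity_check and looks up one JobId in the Job table named in the prune query. The helper name catalog_report and the path in the usage line are hypothetical.

```python
import sqlite3


def catalog_report(db_path: str, job_id: int) -> dict:
    """Integrity-check an SQLite 3 catalog copy and look up one JobId.

    Hypothetical helper for illustration only; use it on a copy of the
    catalog, never the live file.
    """
    # mode=rw makes connect() fail instead of silently creating an empty
    # database when the path is wrong.
    conn = sqlite3.connect(f"file:{db_path}?mode=rw", uri=True)
    try:
        # integrity_check returns a single row "ok" when no problems are
        # found, otherwise one row per problem.
        problems = [row[0] for row in conn.execute("PRAGMA integrity_check")]
        found = conn.execute(
            "SELECT JobId FROM Job WHERE JobId = ?", (job_id,)
        ).fetchone()
        return {"integrity": problems, "job_present": found is not None}
    finally:
        conn.close()
```

For example: print(catalog_report("/var/lib/bacula/bacula.db.copy", 413)), where the path is a placeholder. An integrity result other than ['ok'] would point to real corruption. A result of ['ok'] with job_present False would simply match the observation that Job 413 is gone.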

Thanks for your help

David


>
> If something with the database is wrong you can try to repair it.
> If something with Job 413 as a reference job is wrong, you can run a
> new full backup.
>
> Arno
>
>> It appears to be one particular backup that fails regularly.  When run
>> manually, others seem to complete, while this one fails.
>>
>> I'd rather not dump the backups that have been made, but if it is
>> necessary it can be done.
>>
>> David
>>
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by Oracle Space Sweepstakes
>> Want to be the first software developer in space?
>> Enter now for the Oracle Space Sweepstakes!
>> http://ads.osdn.com/?ad_id=7393&alloc_id=16281&op=click
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
>


