Hello Kern, hello all,

> Volume Bacula will want on Monday, ... The major change is a total revamp of the inner loop of the device reservation code following the algorithm proposed in a recent email. This appears to correct the problems of getting multiple autochanger drives running simultaneously, as well as several other reported problems.

Perhaps I'll give it a try. But a little tale first.

We've got an HP 2/20 library with two DLT-8000 drives. Our backup box runs
Debian GNU/Linux 3.0 and Bacula 1.38.2 and backs up 11 nodes. The system went
into production on Wednesday (with one drive), with tremendous success. Bacula
is really great.

To speed things up, I tried to activate the second drive on Thursday. I
created a second pool and relabeled some tapes into that pool. Everything I've
found about using multiple drives says that several pools are needed. These
were the configuration changes:

bacula-dir.conf:
Director {
   Maximum Concurrent Jobs = 2 (was 1)
}

Storage {
   Maximum Concurrent Jobs = 2 (was unset)
}

Job {} and Client {} have Maximum Concurrent Jobs unset
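
For reference, the two pools themselves are plain Backup pools in
bacula-dir.conf, roughly like this (the pool names and retention values below
are only illustrative, not my exact settings):

Pool {
   Name = PoolA
   Pool Type = Backup
   Recycle = yes
   AutoPrune = yes
   Volume Retention = 30 days
}

Pool {
   Name = PoolB
   Pool Type = Backup
   Recycle = yes
   AutoPrune = yes
   Volume Retention = 30 days
}

The Job resources for the nodes that should use the second drive simply get
Pool = PoolB.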

bacula-sd.conf:
Storage {
   Maximum Concurrent Jobs = 20 (unchanged)
}
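
The two drives themselves hang off a single Autochanger resource in
bacula-sd.conf. As far as I understand it, a two-drive setup looks roughly
like this (resource names and device paths below are illustrative):

Autochanger {
   Name = "HP-Library"
   Device = Drive-0, Drive-1
   Changer Device = /dev/sg0
   Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
}

Device {
   Name = Drive-0
   Drive Index = 0
   Media Type = DLT8000
   Archive Device = /dev/nst0
   Autochanger = yes
   AutomaticMount = yes
   AlwaysOpen = yes
   RemovableMedia = yes
   Random Access = no
}

Device {
   Name = Drive-1
   Drive Index = 1
   Media Type = DLT8000
   Archive Device = /dev/nst1
   Autochanger = yes
   AutomaticMount = yes
   AlwaysOpen = yes
   RemovableMedia = yes
   Random Access = no
}

The Storage resource in bacula-dir.conf then points at the Autochanger name
(Device = "HP-Library", Autochanger = yes) rather than at an individual drive.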

I also put some nodes into that pool. This is what happened:

Job 1 (pool a) started to write on drive one. Job 2 (pool b) started to write
on drive two (the new one). Great. Then job 1 finished and job 3 (pool a)
was started. At that point I noticed that job 2 seemed to be stuck (the
written block count didn't increase any more). A little later job 3 got stuck
as well. After 20 minutes I tried to cancel the (still stuck) jobs, without
success. So I stopped bacula-dir and bacula-sd, which left two bacula-sd
processes behind in state D. They couldn't be killed, so I rebooted the box.
The reboot also failed: the freshly booted kernel complained that init
couldn't find the root partition. After a power cycle the box came up as usual.

My conclusion is that the second drive is faulty and blew up the SCSI bus
(see the kernel log at the end). Job 2 was stuck at 160 MB. In the meantime
job 1 finished after writing 450 MB and job 3 was started. If I remember
correctly, job 3 was able to write 2.6 GB to drive one before it also got
stuck. I don't know whether a faulty tape can cause such an incident.

On the other hand (which is what I hope), it could be a configuration error
(Job {} and Client {} didn't have Maximum Concurrent Jobs set), or the changes
in this beta might fix this behaviour.

I plan to add the second drive again tomorrow and use another tape.
Should I also upgrade to 1.38.3?

Volker

Dec  9 01:18:59 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message
Dec  9 01:18:59 backup kernel: CDB: 0xa 0x0 0x0 0xfc 0x0 0x0
Dec  9 01:18:59 backup kernel: scsi1: At time of recovery, card was not paused
Dec  9 01:18:59 backup kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Dec  9 01:18:59 backup kernel: scsi1: Dumping Card State while idle, at SEQADDR 0x8
Dec  9 01:18:59 backup kernel: Card was paused
Dec  9 01:18:59 backup kernel: ACCUM = 0x0, SINDEX = 0x3, DINDEX = 0xe4, ARG_2 = 0x0
Dec  9 01:18:59 backup kernel: HCNT = 0x0 SCBPTR = 0x0
Dec  9 01:18:59 backup kernel: SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]
Dec  9 01:18:59 backup kernel: LASTPHASE[0x1] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0]
Dec  9 01:18:59 backup kernel: SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x0] SSTAT1[0x8]
Dec  9 01:18:59 backup kernel: SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xa4]
Dec  9 01:18:59 backup kernel: SXFRCTL0[0x80] DFCNTRL[0x0] DFSTATUS[0x89]
Dec  9 01:18:59 backup kernel: STACK: 0x0 0x163 0x109 0x3
Dec  9 01:18:59 backup kernel: SCB count = 5
Dec  9 01:18:59 backup kernel: Kernel NEXTQSCB = 2
Dec  9 01:18:59 backup kernel: Card NEXTQSCB = 2
Dec  9 01:18:59 backup kernel: QINFIFO entries:
Dec  9 01:18:59 backup kernel: Waiting Queue entries:
Dec  9 01:18:59 backup kernel: Disconnected Queue entries: 1:4
Dec  9 01:18:59 backup kernel: QOUTFIFO entries:
Dec  9 01:18:59 backup kernel: Sequencer Free SCB List: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Dec  9 01:18:59 backup kernel: Sequencer SCB Info:
Dec  9 01:18:59 backup kernel: 0 SCB_CONTROL[0xc0] SCB_SCSIID[0x47] SCB_LUN[0x0] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 1 SCB_CONTROL[0x44] SCB_SCSIID[0x57] SCB_LUN[0x0] SCB_TAG[0x4]
Dec  9 01:18:59 backup kernel: 2 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 3 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 4 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 5 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 6 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 7 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 16 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 17 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 18 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 19 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 20 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 21 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 22 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 23 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 24 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 25 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 26 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 27 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 28 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 29 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 30 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: 31 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Dec  9 01:18:59 backup kernel: Pending list:
Dec  9 01:18:59 backup kernel: 4 SCB_CONTROL[0x40] SCB_SCSIID[0x57] SCB_LUN[0x0]
Dec  9 01:18:59 backup kernel: Kernel Free SCB list: 3 1 0
Dec  9 01:18:59 backup kernel: Untagged Q(5): 4
Dec  9 01:18:59 backup kernel: DevQ(0:3:0): 0 waiting
Dec  9 01:18:59 backup kernel: DevQ(0:3:63): 0 waiting
Dec  9 01:18:59 backup kernel: DevQ(0:4:0): 0 waiting
Dec  9 01:18:59 backup kernel: DevQ(0:5:0): 0 waiting
Dec  9 01:18:59 backup kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Dec  9 01:18:59 backup kernel: (scsi1:A:5:0): Device is disconnected, re-queuing SCB
Dec  9 01:18:59 backup kernel: (scsi1:A:5:0): Abort Message Sent
Dec  9 01:18:59 backup kernel: Recovery code sleeping
Dec  9 01:18:59 backup kernel: (scsi1:A:5:0): SCB 4 - Abort Completed.
Dec  9 01:18:59 backup kernel: Recovery SCB completes
Dec  9 01:18:59 backup kernel: Recovery code awake
Dec  9 01:18:59 backup kernel: aic7xxx_abort returns 0x2002

