FYI, I haven't had time to look into it much, but since 1.38.1 I have been
seeing autochanger errors that I never saw with 1.36.*, and they look a lot
like these. As Kern said, it looks as if something is missing from the log;
see:

04-Dec 03:34 bug-sd: End of Volume "NJO008D" at 80:11492 on device "Drive-1" (/dev/nst0). Write of 64512 bytes got -1.
04-Dec 03:35 bug-sd: spider.2005-12-04_03.05.04 Error: Re-read of last block failed. Last block=80530 Current block=14717.
04-Dec 03:35 bug-sd: End of medium on Volume "NJO008D" Bytes=45,428,287,520 Blocks=704,222 at 04-Dec-2005 03:35.
04-Dec 03:35 bug-sd: 3301 Issuing autochanger "loaded drive 0" command.
04-Dec 03:35 bug-sd: 3302 Autochanger "loaded drive 0", result is Slot 8.
04-Dec 03:35 bug-sd: 3307 Issuing autochanger "unload slot 8, drive 0" command.
04-Dec 03:35 bug-sd: 3995 Bad autochanger "unload slot 9, drive 0": ERR=Child exited with code 1.
04-Dec 03:35 bug-sd: Please mount Volume "NJO009D" on Storage Device "Drive-1" (/dev/nst0) for Job spider.2005-12-04_03.05.04

Rob
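
Both logs in this thread show the same pattern: the storage daemon issues an
unload for one slot (8 in the log above, 12 in Volker's), and the error that
follows names the next slot (9, 13). The bash function below is a
hypothetical helper, not part of Bacula, that scans a storage-daemon log for
this mismatch. It assumes one message per line (log lines wrapped by mail
would need to be rejoined first), and the message texts it matches are taken
only from the logs in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Bacula): flag cases where the slot in an
# "Issuing autochanger "unload slot N"" message differs from the slot named
# in the following "Bad autochanger "unload slot M"" error.
# Assumes one log message per line. Exits 1 if a mismatch was found.
check_unload_slots() {
  awk '
    # Remember the slot from the most recent "Issuing autochanger" unload line.
    /Issuing autochanger "unload slot [0-9]+/ {
      match($0, /unload slot [0-9]+/)
      issued = substr($0, RSTART + 12, RLENGTH - 12)
      next
    }
    # Compare it with the slot named in a following "Bad autochanger" error.
    /Bad autochanger "unload slot [0-9]+/ {
      match($0, /unload slot [0-9]+/)
      failed = substr($0, RSTART + 12, RLENGTH - 12)
      if (issued != "" && failed != issued) {
        printf "slot mismatch: issued unload of slot %s, error reports slot %s\n", issued, failed
        found = 1
      }
    }
    END { exit found ? 1 : 0 }
  ' "$@"
}
```

For example, `check_unload_slots /path/to/bacula.log` (the path is a
placeholder) prints one line per mismatch and exits with status 1 if any was
found.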

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kern Sibbald
Sent: Monday, December 12, 2005 9:20 AM
To: bacula-users@lists.sourceforge.net
Cc: Volker Dierks
Subject: Re: [Bacula-users] Bacula BETA 1.38.3

On Monday 12 December 2005 12:52, Volker Dierks wrote:
> Hello,
>
> Volker Dierks wrote:
> >> Usually, I'd see if the problem can be reproduced with the existing
> >> system setup. If that's possible, I'd first check if the actual cause
> >> might be purely SCSI device related.
> >
> > That's what I'm going to do first. I'll create the second pool again
> > (with the same tapes) and put all nodes into that pool ...
>
> I did this last night. In turn:
> - the backup started as planned on drive two with the same tape as
>   Thursday (the tape was already mounted, so no mtx operations took place)
> - after some minutes (and 500 MB of data written to that tape) everything
>   hung again, so I restarted everything and disabled that tape
> - I mounted the next tape and started the backup again. After 7 GB of
>   data had been written to that tape (and 5 nodes successfully backed up)
>   I went to bed.
>
> Up to this point, it looked like the problems were truly caused by the
> tape. But this morning I got the following mail:
> 12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: block.c:538 Write error at 12:5438 on device "Drive-2" (/dev/nst1). ERR=Input/output error.
> 12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: Error writing final EOF to tape. This Volume may not be readable. dev.c:1553 ioctl MTWEOF error on "Drive-2" (/dev/nst1). ERR=No such device or address.
> 12-Dec 03:24

Unless you have 7GB tapes, this looks like a hardware problem: bad media,
a dirty tape drive, a bad drive, bad or improperly installed SCSI cables,
a bad SCSI card, ...

These kinds of problems typically generate a number of kernel (SCSI)
messages in the log.
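
To look for those kernel messages, a small grep wrapper like the one below
may help. It is a hypothetical helper; the pattern is an assumption based on
the single kernel line quoted later in this thread
(`scsi1:0:5:0: Attempting to queue an ABORT message`). Other HBA drivers
word their messages differently, and the kernel log location varies by
distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kernel log lines that look like SCSI trouble.
# The pattern matches "scsiH:C:T:L: ... ABORT/RESET" (case-insensitive),
# as in the aic7xxx-style message quoted in this thread; adjust as needed.
# With no file argument, grep reads standard input.
scsi_trouble() {
  grep -iE 'scsi[0-9]+:[0-9]+:[0-9]+:[0-9]+: .*(abort|reset)' "$@"
}
```

For example: `dmesg | scsi_trouble`, or `scsi_trouble /var/log/messages`.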

> mw-mcs-sd: End of medium on Volume "MW-MCS-1-12" Bytes=7,078,064,979 Blocks=109,722 at 12-Dec-2005 03:24.
> 12-Dec 03:24 mw-mcs-sd: 3301 Issuing autochanger "loaded drive 1" command.
> 12-Dec 03:24 mw-mcs-sd: 3302 Autochanger "loaded drive 1", result is Slot 12.
> 12-Dec 04:10 mw-mcs-sd: 3307 Issuing autochanger "unload slot 12, drive 1" command.
> 12-Dec 04:14 mw-mcs-sd: 3995 Bad autochanger "unload slot 13, drive 1": ERR=Child died from signal 15: Termination.

This looks as if your autochanger script is not properly configured; as
one user pointed out, setting the sleep longer may help. However, I do not
understand why one message says "unload slot 12" and the next line says
"unload slot 13 ... ERR". Something seems to be missing, as Bacula will
normally issue a "loaded drive" command or load a drive before unloading it
a second time.
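
Rather than only lengthening a fixed sleep, an autochanger script could poll
the drive until it reports ready. This is a sketch under assumptions: the
function name and the 300-second default are hypothetical, and it relies on
Linux mt-st printing ONLINE in its `mt status` output once a tape is loaded,
which should be checked on the drive in question.

```shell
#!/usr/bin/env bash
# Hypothetical helper for an autochanger script: instead of a fixed
# "sleep N" after loading a tape, poll "mt status" until it reports ONLINE
# (assumed mt-st behaviour on Linux) or give up after a timeout in seconds.
# MT may be set to another command, e.g. for testing.
wait_for_drive() {
  local device=$1 timeout=${2:-300} waited=0
  until "${MT:-mt}" -f "$device" status 2>&1 | grep -q ONLINE; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "wait_for_drive: $device not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Used as, e.g., `wait_for_drive /dev/nst1 300 || exit 1` right after the load
step, the script waits only as long as the drive actually needs, up to the
limit.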

> 12-Dec 04:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device "Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
> 12-Dec 05:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device "Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
> 12-Dec 07:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device "Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
> 12-Dec 08:59 nfs-1-fd: nfs-1.2005-12-12_02.15.08 Fatal error: backup.c:498 Network send error to SD. ERR=Broken pipe
> 12-Dec 08:59 mw-mcs-dir: nfs-1.2005-12-12_02.15.08 Error: Bacula 1.38.2 (20Nov05): 12-Dec-2005 08:59:32
>
> At 08:59 I stopped bacula-dir and bacula-sd. The kernel log contains the
> same SCSI ABORT messages I posted before, starting at 02:54:
> Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message

If you are getting SCSI ABORT messages, then either there is a hardware
problem or the Bacula Device resource is not set up correctly for that drive.

Did you pass *all* the tests in the Tape Testing chapter?

>
> The last thing I can imagine is this: all tapes used in Drive-2 so far
> were previously used by Amanda. This is how I recycled them:
> mt -f /dev/nst1 rewind
> mt -f /dev/nst1 setdensity 0x89

I always find explicitly setting the density this way *very* prone to error.

> mt -f /dev/nst1 rewind
> mt -f /dev/nst1 weof
> mt -f /dev/nst1 weof
> write the Bacula label
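
Following the remark about explicit density settings, the same recycling
steps can be sketched without `setdensity`, leaving density selection to the
drive's default. This is an assumption-laden sketch, not a procedure taken
from the Bacula manual: the function name is hypothetical, and the two EOF
marks simply mirror the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prepare a tape previously written by another program
# (Amanda, in this thread) for Bacula WITHOUT forcing a density; the drive's
# default density is assumed to be correct. MT may be set to another
# command, e.g. for testing.
prepare_tape() {
  local device=$1 mt_cmd=${MT:-mt}
  # Rewind, write two EOF marks at BOT so the old data reads as empty,
  # then rewind again so the tape is at BOT for labeling.
  "$mt_cmd" -f "$device" rewind &&
    "$mt_cmd" -f "$device" weof &&
    "$mt_cmd" -f "$device" weof &&
    "$mt_cmd" -f "$device" rewind
}
```

The Bacula label would then be written as before, e.g. with the bconsole
`label` command.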
>
> Perhaps this is not the right way? I've attached our configuration and
> would be very thankful if someone could confirm that it's correct. It's
> the single-drive configuration pointing to Pool: DRIVE-2. When I use this
> configuration with Pool: DRIVE-1 (all tapes in this pool are brand-new),
> everything works fine.
>
> Volker
>
> PS: I'm running "mt -f /dev/nst1 erase" on MW-MCS-1-12 at the moment. If
>     this fails, I would say that drive two is faulty.
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log
> files for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

-- 
Best regards,

Kern

  (">
  /\
  V_V

