On Thursday 23 March 2006 10:06, Wolfgang Denk wrote:
> Hello,
>
> can somebody share some hint what the following error messages mean,
> what the actual problem is and how these problems can be avoided?
>
> ...
> 23-Mar 07:00 hydra3-sd: End of Volume "K-I-3" at 79:13683 on device "SLR100-2" (/dev/nst0). Write of 64512 bytes got -1.
> 23-Mar 07:00 hydra3-sd: Pollux-Other.2006-03-22_23.08.11 Error: Re-read of last block failed. Last block=230095 Current block=352546.
> 23-Mar 07:00 hydra3-sd: End of medium on Volume "K-I-3" Bytes=79,744,352,918 Blocks=1,236,119 at 23-Mar-2006 07:00.
> ...
> 23-Mar 09:56 hydra3-sd: End of Volume "K-I-4" at 57:775 on device "SLR100-2" (/dev/nst0). Write of 64512 bytes got -1.
> 23-Mar 09:57 hydra3-sd: Pollux-Other.2006-03-22_23.08.11 Error: Re-read of last block failed. Last block=606137 Current block=860778.
> 23-Mar 09:57 hydra3-sd: End of medium on Volume "K-I-4" Bytes=57,046,341,076 Blocks=884,275 at 23-Mar-2006 09:57.
> ...
>
> This is with Bacula 1.38.5 (18 January 2006) on Fedora Core 2 (DIR,
> FD) and 4 (SD) systems. At the moment of the problem two simultaneous
> backup jobs were writing to these tapes.
>
> I'm surprised about the big difference between "Last block" and
> "Current block". What does that mean? Is this backup still good, or
> must I fear to lose data when trying to restore from this archive?

I've taken a look at the code, and I am 99% convinced that this is a false 
alert (a Bacula bug).  What has happened is the following:

1. Your two jobs (say 1 and 2) are writing to the tape at the same time.  You 
are almost certainly not spooling (with spooling this would be much less 
likely to happen), which means the blocks for the two jobs are getting 
intermingled (inefficient, but no problem).

2. Job 1 writes a block and it succeeds, but totally fills the tape.

3. Job 2 attempts to write a block but it fails, so it writes a final EOF (or 
two depending on your setting), backspaces over the EOF mark(s), and then 
backspaces over the last valid block that was written (by job 1).

4. Job 2 then successfully re-reads this last block and proceeds to
check the block numbers -- i.e. it wants to know if the block just re-read is 
the last block it wrote.

5. Now here is where the false alarm occurs.  The code expects the block 
number to be that of the last block written by Job 2, but in fact the last 
block on the tape was written by Job 1, hence the block numbers do not agree 
and the SD incorrectly reports an error.
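
As a rough illustration of steps 1 to 5, here is a toy Python model.  It is 
not Bacula code: the Tape and Block classes, the round-robin write order, the 
tape-wide block numbering and the function names are all assumptions made up 
for this sketch.  It only shows why comparing the re-read block against the 
failing job's own last block number goes wrong when another job wrote the 
final block.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    number: int  # tape-wide block number (assumption of this toy model)
    job_id: int  # job that wrote the block


@dataclass
class Tape:
    capacity: int  # number of blocks that fit before end of medium
    blocks: list = field(default_factory=list)

    def write(self, job_id):
        """Append a block; return it, or None when the tape is full."""
        if len(self.blocks) >= self.capacity:
            return None
        block = Block(number=len(self.blocks) + 1, job_id=job_id)
        self.blocks.append(block)
        return block

    def reread_last(self):
        """Stand-in for backspacing and re-reading the last valid block."""
        return self.blocks[-1]


def run_until_full(tape, job_ids):
    """Interleave writes round-robin until one job hits end of medium.

    Returns (failing_job, last_written), where last_written maps each job
    to the number of the last block it wrote successfully.
    """
    last_written = {}
    while True:
        for job_id in job_ids:
            block = tape.write(job_id)
            if block is None:
                return job_id, last_written
            last_written[job_id] = block.number


def naive_check(tape, failing_job, last_written):
    """Step 5: compare the re-read block with the failing job's own last block."""
    return tape.reread_last().number == last_written.get(failing_job)


# Two interleaved jobs: job 1 fills the tape, job 2 hits end of medium.
tape = Tape(capacity=5)
failing_job, last_written = run_until_full(tape, [1, 2])
print(failing_job, naive_check(tape, failing_job, last_written))
```

In this model the check passes when a single job writes alone, and fails with 
two interleaved jobs even though every block on the tape is intact.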

Now I need to think about how to resolve this. One way is simply to remove the 
test for the block number, but the test is perfectly OK if only one job is 
writing, or if multiple jobs are writing and the last block was written by 
the job that hits the end of tape.  At a minimum, I've clarified the message 
and downgraded it from an ERROR to a WARNING (both already done).  Beyond 
that, I'm going to see if the job can check whether the last block was 
written by it or by another job and report accordingly.  I don't want 
to complicate the code too much though ...
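
One possible shape for such an owner-aware check, sketched in Python purely 
for illustration.  This is not the actual fix: the function name, its 
parameters and the three result strings are hypothetical, and it assumes the 
re-read block can be attributed to a job at all, which is exactly what still 
has to be investigated.

```python
def classify_reread(reread_number, reread_job, my_job, my_last_number):
    """Decide how to report the re-read of the last block at end of tape.

    reread_number / reread_job: number and writer of the block just re-read
    my_job / my_last_number:    the job that hit end of tape and the number
                                of the last block it wrote itself
    """
    if reread_job != my_job:
        # Another job wrote the last block, so the block numbers cannot be
        # expected to match; this is not evidence of a bad tape.
        return "warning"
    if reread_number == my_last_number:
        return "ok"
    # Same job, different block number: the original test is meaningful here.
    return "error"


print(classify_reread(5, 1, 2, 4))  # last block belongs to another job
print(classify_reread(4, 2, 2, 4))  # last block is this job's own last block
```

The point of the design is that the block-number test stays in force for the 
cases where it is valid (a single job, or the last block written by the job 
that hit the end of tape) and is only relaxed when another job wrote it.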

The output from bls, together with the corresponding Job report messages from 
when the job hit the end of the tape, will allow me to confirm this theory 
(I'm 99% sure).

-- 
Best regards,

Kern

  (">
  /\
  V_V

