Hello,

for some time I have been facing a situation where 2 specific backup jobs (out of a total of some 30+) fail every now and then with the error message:
    Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!

Re-running the jobs works fine. Backups are written to an LTO library with 2 tape drives. Here is an example log that shows what is happening:

1) I start two simultaneous backup jobs:

*run level=Incremental pool=INCR Hercules-Root
Run Backup job
JobName:  Hercules-Root
Level:    Incremental
Client:   hercules-fd
FileSet:  Hercules Root
Pool:     INCR (From User input)
Storage:  LTOLIB (From Job resource)
When:     2014-01-28 09:41:11
Priority: 10
OK to run? (yes/mod/no): yes
Job queued. JobId=71138

*run level=Incremental pool=INCR Pollux-Root
Run Backup job
JobName:  Pollux-Root
Level:    Incremental
Client:   pollux-fd
FileSet:  Pollux Root
Pool:     INCR (From User input)
Storage:  LTOLIB (From Job resource)
When:     2014-01-28 09:41:19
Priority: 10
OK to run? (yes/mod/no): yes
Job queued. JobId=71139

2) The DIR allocates each job to one of the two available tape drives. At this point, the following tapes are loaded in the tape drives:

drive 0: slot 26
drive 1: slot 32

Both tapes are in the correct pool (INCR) and have status "Append". JobId 71138 will use drive 0, JobId 71139 will use drive 1.

28-Jan 09:41 mneme-dir JobId 71138: Start Backup JobId 71138, Job=Hercules-Root.2014-01-28_09.41.13_25
28-Jan 09:41 mneme-dir JobId 71138: Using Device "LTO3-0" to write.
28-Jan 09:41 ltos-sd JobId 71138: 3307 Issuing autochanger "unload slot 32, drive 1" command.
28-Jan 09:41 mneme-dir JobId 71139: Start Backup JobId 71139, Job=Pollux-Root.2014-01-28_09.41.21_26
28-Jan 09:41 mneme-dir JobId 71139: Using Device "LTO3-1" to write.
28-Jan 09:43 ltos-sd JobId 71139: Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!
28-Jan 09:43 ltos-sd JobId 71139: Spooling data ...
28-Jan 09:44 ltos-sd JobId 71139: Elapsed time=00:01:23, Transfer rate=0 Bytes/second

3) JobId 71139 fails with the "NULL Volume name" error. Note that shortly before, the tape originally loaded in this drive had been unloaded (see the "unload slot 32, drive 1" message issued for JobId 71138 above).

28-Jan 09:44 pollux-fd JobId 71139: Error: bsock.c:429 Write error sending 8 bytes to Storage daemon:ltos.denx.de:9103: ERR=Connection reset by peer
28-Jan 09:44 pollux-fd JobId 71139: Fatal error: xattr.c:98 Network send error to SD. ERR=Connection reset by peer
28-Jan 09:44 mneme-dir JobId 71139: Error: Bacula mneme-dir 5.2.13 (19Jan13):
  Build OS:               x86_64-redhat-linux-gnu redhat
  JobId:                  71139
  Job:                    Pollux-Root.2014-01-28_09.41.21_26
  Backup Level:           Incremental, since=2014-01-28 00:06:14
  Client:                 "pollux-fd" 5.2.13 (19Jan13) x86_64-redhat-linux-gnu,redhat,Cat)
  FileSet:                "Pollux Root" 2005-12-22 11:06:26
  Pool:                   "INCR" (From User input)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "LTOLIB" (From Job resource)
  Scheduled time:         28-Jan-2014 09:41:19
  Start time:             28-Jan-2014 09:41:23
  End time:               28-Jan-2014 09:44:47
  Elapsed time:           3 mins 24 secs
  Priority:               10
  FD Files Written:       2
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):
  Volume Session Id:      275
  Volume Session Time:    1390085323
  Last Volume Bytes:      16,678,095,872 (16.67 GB)
  Non-fatal FD errors:    1
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***
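(Side note: when this happens, the drive/slot state can be cross-checked against what Bacula believes is loaded; the sg device path below is just a placeholder for my actual changer device:)

    mtx -f /dev/sg3 status            # what the library itself reports (placeholder device)

    # and from within bconsole:
    status slots storage=LTOLIB       # volumes per slot, as Bacula sees them
    status storage=LTOLIB             # which volume the SD has mounted in which drive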
4) The other (first-started) job now also unloads its tape, and then loads the one previously unloaded from the other drive:

28-Jan 09:44 ltos-sd JobId 71138: 3307 Issuing autochanger "unload slot 26, drive 0" command.
28-Jan 09:44 ltos-sd JobId 71138: 3304 Issuing autochanger "load slot 32, drive 0" command.
28-Jan 09:45 ltos-sd JobId 71138: 3305 Autochanger "load slot 32, drive 0", status is OK.
28-Jan 09:45 ltos-sd JobId 71138: Volume "INC007L3" previously written, moving to end of data.
28-Jan 09:47 ltos-sd JobId 71138: Ready to append to end of Volume "INC007L3" at file=6.
28-Jan 09:47 ltos-sd JobId 71138: Spooling data ...

To me it appears as if the DIR does not correctly take into account which tape is loaded where. It sees two tape drives, assigns one of them to the first starting job, but then decides not to use the currently mounted tape (which would be perfectly fine by all criteria like Pool, Status, Use Days etc., and which actually gets loaded into the other drive later to run more jobs), but instead the one currently loaded in the _other_ drive, and starts to unload it from there. Now the second job starts running, finds that someone has pulled the tape out from under it, and fails.

Does my interpretation make sense? Is this a common problem, or am I doing something wrong? The "interesting" thing is that it is always the same 2 jobs out of my list that are candidates for this error. And it does not always happen - maybe 2 times per week or so...

All this is with Bacula 5.2.13 on Fedora 20 systems.

All help / ideas welcome. Thanks in advance.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
The following statement is not true. The previous statement is true.
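P.S.: In case the configuration matters, the SD side is one Autochanger resource with two Device resources, roughly like the sketch below. Device paths and the changer script path are placeholders written from memory, not copied verbatim from my bacula-sd.conf:

    Autochanger {
      Name = "LTOLIB"
      Device = LTO3-0, LTO3-1
      Changer Device = /dev/sg3                  # placeholder path
      Changer Command = "/usr/libexec/bacula/mtx-changer %c %o %S %a %d"
    }

    Device {
      Name = LTO3-0
      Media Type = LTO-3
      Archive Device = /dev/nst0                 # placeholder path
      Autochanger = yes
      Drive Index = 0
      AutomaticMount = yes
      AlwaysOpen = yes
      RemovableMedia = yes
      RandomAccess = no
      # spooling directives omitted here
    }

    Device {
      Name = LTO3-1
      Media Type = LTO-3
      Archive Device = /dev/nst1                 # placeholder path
      Autochanger = yes
      Drive Index = 1
      AutomaticMount = yes
      AlwaysOpen = yes
      RemovableMedia = yes
      RandomAccess = no
      # spooling directives omitted here
    }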