Hello,

for some time I have been facing a situation where 2 specific backup jobs (out of a total of some 30+) fail every now and then with the error message:
    Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!

Re-running the jobs works fine. Backups are written to an LTO library with 2 tape drives. Here is an example log that shows what is happening:

1) I start two simultaneous backup jobs:

*run level=Incremental pool=INCR Hercules-Root
Run Backup job
JobName:  Hercules-Root
Level:    Incremental
Client:   hercules-fd
FileSet:  Hercules Root
Pool:     INCR (From User input)
Storage:  LTOLIB (From Job resource)
When:     2014-01-28 09:41:11
Priority: 10
OK to run? (yes/mod/no): yes
Job queued. JobId=71138

*run level=Incremental pool=INCR Pollux-Root
Run Backup job
JobName:  Pollux-Root
Level:    Incremental
Client:   pollux-fd
FileSet:  Pollux Root
Pool:     INCR (From User input)
Storage:  LTOLIB (From Job resource)
When:     2014-01-28 09:41:19
Priority: 10
OK to run? (yes/mod/no): yes
Job queued. JobId=71139

2) The DIR allocates each job to one of the two available tape drives. At this point, the following tapes are loaded in the tape drives:

drive 0: slot 26
drive 1: slot 32

Both tapes are in the correct pool (INCR) and have status "Append". JobId 71138 will use drive 0, JobId 71139 will use drive 1.

28-Jan 09:41 mneme-dir JobId 71138: Start Backup JobId 71138, Job=Hercules-Root.2014-01-28_09.41.13_25
28-Jan 09:41 mneme-dir JobId 71138: Using Device "LTO3-0" to write.
28-Jan 09:41 ltos-sd JobId 71138: 3307 Issuing autochanger "unload slot 32, drive 1" command.
28-Jan 09:41 mneme-dir JobId 71139: Start Backup JobId 71139, Job=Pollux-Root.2014-01-28_09.41.21_26
28-Jan 09:41 mneme-dir JobId 71139: Using Device "LTO3-1" to write.
28-Jan 09:43 ltos-sd JobId 71139: Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!
28-Jan 09:43 ltos-sd JobId 71139: Spooling data ...
28-Jan 09:44 ltos-sd JobId 71139: Elapsed time=00:01:23, Transfer rate=0 Bytes/second

3) JobId 71139 fails with the "NULL Volume name" error. Note that shortly before, the tape originally loaded in this drive had been unloaded (see the "unload slot 32, drive 1" message issued for JobId 71138 above).

28-Jan 09:44 pollux-fd JobId 71139: Error: bsock.c:429 Write error sending 8 bytes to Storage daemon:ltos.denx.de:9103: ERR=Connection reset by peer
28-Jan 09:44 pollux-fd JobId 71139: Fatal error: xattr.c:98 Network send error to SD. ERR=Connection reset by peer
28-Jan 09:44 mneme-dir JobId 71139: Error: Bacula mneme-dir 5.2.13 (19Jan13):
  Build OS:               x86_64-redhat-linux-gnu redhat
  JobId:                  71139
  Job:                    Pollux-Root.2014-01-28_09.41.21_26
  Backup Level:           Incremental, since=2014-01-28 00:06:14
  Client:                 "pollux-fd" 5.2.13 (19Jan13) x86_64-redhat-linux-gnu,redhat,Cat)
  FileSet:                "Pollux Root" 2005-12-22 11:06:26
  Pool:                   "INCR" (From User input)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "LTOLIB" (From Job resource)
  Scheduled time:         28-Jan-2014 09:41:19
  Start time:             28-Jan-2014 09:41:23
  End time:               28-Jan-2014 09:44:47
  Elapsed time:           3 mins 24 secs
  Priority:               10
  FD Files Written:       2
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):
  Volume Session Id:      275
  Volume Session Time:    1390085323
  Last Volume Bytes:      16,678,095,872 (16.67 GB)
  Non-fatal FD errors:    1
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***
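(Side note: when this happens, the drive/slot state can be cross-checked against what Bacula believes is loaded; the sg device path below is just a placeholder for my actual changer device:)

    mtx -f /dev/sg3 status            # what the library itself reports (placeholder device)

    # and from within bconsole:
    status slots storage=LTOLIB       # volumes per slot, as Bacula sees them
    status storage=LTOLIB             # which volume the SD has mounted in which drive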
4) The other (first-started) job now also unloads its tape, and then loads the one previously unloaded from the other drive:

28-Jan 09:44 ltos-sd JobId 71138: 3307 Issuing autochanger "unload slot 26, drive 0" command.
28-Jan 09:44 ltos-sd JobId 71138: 3304 Issuing autochanger "load slot 32, drive 0" command.
28-Jan 09:45 ltos-sd JobId 71138: 3305 Autochanger "load slot 32, drive 0", status is OK.
28-Jan 09:45 ltos-sd JobId 71138: Volume "INC007L3" previously written, moving to end of data.
28-Jan 09:47 ltos-sd JobId 71138: Ready to append to end of Volume "INC007L3" at file=6.
28-Jan 09:47 ltos-sd JobId 71138: Spooling data ...

To me it appears as if the DIR does not correctly take into account which tape is loaded where. It sees two tape drives, assigns one of them to the first starting job, but then decides not to use the currently mounted tape (which would be perfectly fine by all criteria like Pool, Status, Use Days etc., and which actually gets loaded into the other drive later to run more jobs), but instead the one currently loaded in the _other_ drive, and starts to unload it from there. Now the second job starts running, finds that someone has pulled the tape out from under it, and fails.

Does my interpretation make sense? Is this a common problem, or am I doing something wrong? The "interesting" thing is that it is always the same 2 jobs out of my list that are candidates for this error. And it does not always happen - maybe 2 times per week or so...

All this is with Bacula 5.2.13 on Fedora 20 systems.

All help / ideas welcome. Thanks in advance.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
The following statement is not true. The previous statement is true.
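P.S.: In case the configuration matters, the SD side is one Autochanger resource with two Device resources, roughly like the sketch below. Device paths and the changer script path are placeholders written from memory, not copied verbatim from my bacula-sd.conf:

    Autochanger {
      Name = "LTOLIB"
      Device = LTO3-0, LTO3-1
      Changer Device = /dev/sg3                  # placeholder path
      Changer Command = "/usr/libexec/bacula/mtx-changer %c %o %S %a %d"
    }

    Device {
      Name = LTO3-0
      Media Type = LTO-3
      Archive Device = /dev/nst0                 # placeholder path
      Autochanger = yes
      Drive Index = 0
      AutomaticMount = yes
      AlwaysOpen = yes
      RemovableMedia = yes
      RandomAccess = no
      # spooling directives omitted here
    }

    Device {
      Name = LTO3-1
      Media Type = LTO-3
      Archive Device = /dev/nst1                 # placeholder path
      Autochanger = yes
      Drive Index = 1
      AutomaticMount = yes
      AlwaysOpen = yes
      RemovableMedia = yes
      RandomAccess = no
      # spooling directives omitted here
    }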