Last night I had two long-running backup jobs (~1TB and ~1.2TB) fail with the same error:
Fatal error: Error writing data to spool file. ERR=Stale NFS file handle The problem is that the spool file is NOT on an NFS mounted disk! The spool file is on a GPFS partition. Spool files are limited to 200GB each, and there were two jobs running, so the 1TB partition was not full. It's difficult to deal with failures at the very end of jobs that take ~24hrs. Any suggestions? Here is an excerpt of the bacula log file for one of the jobs: ------------------- 31-Oct 17:43 sbia-infr-vbacula JobId 41350: Start Backup JobId 41350, Job=sbiaprj-1.2011-10-31_17.43.22_35 31-Oct 17:45 sbia-infr-vbacula JobId 41350: Volume "003209" previously written, moving to end of data. 31-Oct 17:47 sbia-infr-vbacula JobId 41350: Spooling data ... 31-Oct 20:56 sbia-infr-vbacula JobId 41350: User specified spool size reached. 31-Oct 20:56 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume. Despooling 214,748,386,134 bytes ... 31-Oct 21:46 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:49:21, Transfer rate = 72.52 M Bytes/second 31-Oct 21:46 sbia-infr-vbacula JobId 41350: Spooling data again ... 01-Nov 03:14 sbia-infr-vbacula JobId 41350: User specified spool size reached. 01-Nov 03:14 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume. Despooling 214,748,386,400 bytes ... 01-Nov 03:36 sbia-infr-vbacula JobId 41350: End of Volume "003209" at 135:32909 on device "ml6000-drv0" (/dev/tape0-ml6000). Write of 64512 bytes got -1. 01-Nov 03:40 sbia-infr-vbacula JobId 41350: New volume "004069" mounted on device "ml6000-drv0" (/dev/tape0-ml6000) at 01-Nov-2011 03:40. 01-Nov 04:08 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:49:52, Transfer rate = 71.77 M Bytes/second 01-Nov 04:08 sbia-infr-vbacula JobId 41350: Spooling data again ... 01-Nov 07:04 sbia-infr-vbacula JobId 41350: User specified spool size reached. 01-Nov 07:04 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume. Despooling 214,748,386,354 bytes ... 01-Nov 08:02 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:56:51, Transfer rate = 62.95 M Bytes/second 01-Nov 08:02 sbia-infr-vbacula JobId 41350: Spooling data again ... 01-Nov 12:01 sbia-infr-vbacula JobId 41350: User specified spool size reached. 01-Nov 12:01 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume. Despooling 214,748,383,710 bytes ... 01-Nov 12:55 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:54:15, Transfer rate = 65.97 M Bytes/second 01-Nov 12:55 sbia-infr-vbacula JobId 41350: Spooling data again ... 01-Nov 18:06 sbia-infr-vbacula JobId 41350: User specified spool size reached. 01-Nov 18:06 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume. Despooling 214,748,385,576 bytes ... 01-Nov 18:49 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:43:17, Transfer rate = 82.69 M Bytes/second 01-Nov 18:50 sbia-infr-vbacula JobId 41350: Spooling data again ... 01-Nov 21:33 sbia-infr-vbacula JobId 41350: Fatal error: Error writing data to spool file. ERR=Stale NFS file handle 01-Nov 21:33 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume. Despooling 95,632,760,393 bytes ... 01-Nov 21:34 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:00:01, Transfer rate = 95.63 G Bytes/second 01-Nov 21:34 sbia-infr-vbacula JobId 41350: Fatal error: Fatal despooling error.01-Nov 21:34 sbia-infr-vbacula JobId 41350: Job write elapsed time = 27:43:21, Transfer rate = 11.70 M Bytes/second 01-Nov 21:34 sbia-infr3 JobId 41350: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer 01-Nov 21:34 sbia-infr-vbacula JobId 41350: Error: Bacula sbia-infr-vbacula 5.0.2 (28Apr10): 01-Nov-2011 21:34:27 Build OS: x86_64-redhat-linux-gnu redhat JobId: 41350 Job: sbiaprj-1.2011-10-31_17.43.22_35 Backup Level: Full Client: "sbia-infr-vnfs1" 5.0.2 (28Apr10) x86_64-redhat-linux-gnu,redhat, FileSet: "sbiaprj-1" 2011-10-12 23:15:00 Pool: "Full" (From Job FullPool override) Catalog: "MyCatalog" (From Client resource) Storage: "ml6000" (From Job resource) Scheduled time: 31-Oct-2011 17:43:13 Start time: 31-Oct-2011 17:45:06 End time: 01-Nov-2011 21:34:27 Elapsed time: 1 day 3 hours 49 mins 21 secs Priority: 10 FD Files Written: 3,257,746 SD Files Written: 3,257,744 FD Bytes Written: 1,167,407,287,207 (1.167 TB) SD Bytes Written: 1,168,242,527,818 (1.168 TB) Rate: 11655.3 KB/s Software Compression: None VSS: no Encryption: no Accurate: no Volume name(s): 003209|003037|004069 Volume Session Id: 142 Volume Session Time: 1319573487 Last Volume Bytes: 755,423,843,328 (755.4 GB) Non-fatal FD errors: 0 SD Errors: 0 FD termination status: Error SD termination status: Error Termination: *** Backup Error *** -------------------------------------- Thanks, Mark ------------------------------------------------------------------------------ RSA® Conference 2012 Save $700 by Nov 18 Register now! http://p.sf.net/sfu/rsa-sfdev2dev1 _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users