I wrote: > That should fix the most common case of the multiple small jobs starting > spooling at the same time. I've applied it, and will check that this > does, in fact, happen, and report back.
It worked as expected. Multiple jobs spooling at the same time, and despooling sequentially, now have their tape usage properly reflected in the JOBMEDIA table. Here's an example: jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | endfile | startblock | endblock ------------+-------+---------+------------+-----------+-----------+---------+------------+---------- 184 | 87 | 3 | 1 | 743 | 58 | 58 | 0 | 192 185 | 88 | 3 | 1 | 103 | 58 | 58 | 193 | 843 186 | 89 | 3 | 1 | 282 | 58 | 58 | 844 | 6341 187 | 90 | 3 | 1 | 76 | 58 | 58 | 6342 | 6389 Kern Sibbald <[EMAIL PROTECTED]> writes: > Still I have some doubts about what gets put in the jobmedia for the start of > a second despooling call. That's the other failure scenario, and no, your fix doesn't address it. > The session labels don't contain any position dependent data, so that is not > important. Ah, great! I didn't read the code carefully; I assumed that since the function that creates the label copies the tape position into the dcr, it must also need it. > If you can convince/show me that something is going wrong with multiple > spoolings, then your suggestion becomes much more important. The sequence of events that causes that error is quite simple. It will "bite" when two jobs are spooling at the same time, and the one that despools first is big enough to need two rounds of spooling. I'll give a run-down of what happens (the example is from a run before your latest fix, but that fix cannot possibly have any effect, since the fix involves StartFile/StartBlock, not EndFile/EndBlock). Two jobs are shown. Job 1 is a large job from a fast host, job 2 is a somewhat smaller one from a slower host. Both start spooling at the same time, then job 1 hits its limit, and despools. By the time it is done despooling the first batch, job 2 is ready to despool. While job 2 despools, job 1 completes its second spooling run, and waits for access to the tape before despooling. The problem is in the call write_session_label() that job 1 performs at the end of its second spooling run, to generate the end label. This causes EndFile and EndBlock in job 1's dcr to be set to the current tape position (while job 2 is despooling). The reason this is a problem is that this overwrites the information about where job 1's first despooling actually finished, and this data is still needed: the last JOBMEDIA record for the first despooling run has not been written yet, but will be written when the second despooling run starts. First, the relevant rows from the FILEMEDIA table (the bogus row is the one that has jobmediaid 28): jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | endfile | startblock | endblock ------------+-------+---------+------------+-----------+-----------+---------+------------+--------- 1 | 1 | 1 | 1 | 5959 | 0 | 0 | 1 | 15499 2 | 1 | 1 | 5959 | 9885 | 1 | 1 | 0 | 15499 3 | 1 | 1 | 9885 | 15463 | 2 | 2 | 0 | 15499 4 | 1 | 1 | 15463 | 15673 | 3 | 3 | 0 | 15499 5 | 1 | 1 | 15673 | 15903 | 4 | 4 | 0 | 15499 6 | 1 | 1 | 15903 | 16161 | 5 | 5 | 0 | 15499 7 | 1 | 1 | 16161 | 16386 | 6 | 6 | 0 | 15499 8 | 1 | 1 | 16386 | 16642 | 7 | 7 | 0 | 15499 9 | 1 | 1 | 16642 | 16863 | 8 | 8 | 0 | 15499 10 | 1 | 1 | 16863 | 17048 | 9 | 9 | 0 | 15499 11 | 1 | 1 | 17048 | 17254 | 10 | 10 | 0 | 15499 12 | 1 | 1 | 17254 | 17472 | 11 | 11 | 0 | 15499 13 | 1 | 1 | 17472 | 17682 | 12 | 12 | 0 | 15499 14 | 1 | 1 | 17682 | 17904 | 13 | 13 | 0 | 15499 15 | 1 | 1 | 17904 | 21904 | 14 | 14 | 0 | 15499 16 | 1 | 1 | 21904 | 28550 | 15 | 15 | 0 | 15499 17 | 1 | 1 | 28550 | 28556 | 16 | 16 | 0 | 15499 18 | 2 | 1 | 1 | 4229 | 17 | 17 | 2756 | 15499 19 | 2 | 1 | 4229 | 4905 | 18 | 18 | 0 | 15499 20 | 2 | 1 | 4905 | 9033 | 19 | 19 | 0 | 15499 21 | 2 | 1 | 9033 | 11691 | 20 | 20 | 0 | 15499 22 | 2 | 1 | 11691 | 14411 | 21 | 21 | 0 | 15499 23 | 2 | 1 | 14411 | 15215 | 22 | 22 | 0 | 15499 24 | 2 | 1 | 15215 | 15942 | 23 | 23 | 0 | 15499 25 | 2 | 1 | 15942 | 16795 | 24 | 24 | 0 | 15499 26 | 2 | 1 | 16795 | 17586 | 25 | 25 | 0 | 15499 27 | 2 | 1 | 17586 | 17589 | 26 | 26 | 0 | 11094 28 | 1 | 1 | 28556 | 28558 | 17 | 20 | 0 | 3397 29 | 1 | 1 | 28558 | 28561 | 26 | 26 | 11095 | 15499 30 | 1 | 1 | 28561 | 28707 | 27 | 27 | 0 | 15499 31 | 1 | 1 | 28707 | 31670 | 28 | 28 | 0 | 15499 32 | 1 | 1 | 31670 | 31684 | 29 | 29 | 0 | 15499 33 | 1 | 1 | 31684 | 31812 | 30 | 30 | 0 | 15499 34 | 1 | 1 | 31812 | 32824 | 31 | 31 | 0 | 15499 35 | 1 | 1 | 32824 | 56664 | 32 | 32 | 0 | 15499 36 | 1 | 1 | 56664 | 56668 | 33 | 33 | 0 | 717 Here are some entries from the log file, with a couple of comments interspersed: 24-Mar 17:06 sirius-dir JobId 1: Start Backup JobId 1, Job=athene_User.2008-03-24_17.06.03 24-Mar 17:06 sirius-sd JobId 1: Spooling data ... 24-Mar 17:06 sirius-dir JobId 2: Start Backup JobId 2, Job=barsoom_User.2008-03-24_17.06.04 24-Mar 17:06 sirius-sd JobId 2: Spooling data ... 24-Mar 17:34 sirius-sd JobId 1: User specified spool size reached. 24-Mar 17:34 sirius-sd JobId 1: Writing spooled data to Volume. Despooling 17,179,901,776 bytes ... *** job 1 logs correct tape positions 24-Mar 17:56 sirius-sd JobId 2: Job write elapsed time = 00:50:10, Transfer rate = 3.166 M bytes/second 24-Mar 17:56 sirius-sd JobId 2: Committing spooled data to Volume "DLT0001". Despooling 9,539,116,533 bytes ... 24-Mar 18:36 sirius-sd JobId 1: Despooling elapsed time = 01:02:06, Transfer rate = 4.610 M bytes/second 24-Mar 18:36 sirius-sd JobId 1: Spooling data again ... 24-Mar 18:36 sirius-sd JobId 2: [starts physically despooling to tape] *** job 2 logs correct tape positions 24-Mar 18:46 sirius-sd JobId 1: Job write elapsed time = 01:40:20, Transfer rate = 3.901 M bytes/second *** job 1 erroneously sets end file/block in dcr to current tape position *** this is done by the write_session_label() call at stored/append.c:291 *** this will cause the old, non-flushed jobmedia record to get the wrong data when it gets flushed later 24-Mar 18:46 sirius-sd JobId 1: Committing spooled data to Volume "DLT0001". Despooling 6,331,234,514 bytes ... 24-Mar 19:08 sirius-sd JobId 2: Despooling elapsed time = 00:32:30, Transfer rate = 4.891 M bytes/second 24-Mar 19:08 sirius-sd JobId 1: [starts physically despooling to tape] *** job 1 flushes that old record, giving it the wrong end file/block from the dcr *** job 1 then logs correct tape positions 24-Mar 19:31 sirius-sd JobId 1: Despooling elapsed time = 00:22:48, Transfer rate = 4.628 M bytes/second > However, I have always been unhappy that capturing the Volume position was > done in the session label subroutine (it is sort of hidden there), so I would > be more inclined to move the end code out into a new subroutine, like I have > already done for the set_start_vol_position(), then remove it from the > session_label() subroutine, and call the two subroutines at the appropriate > times -- that makes what is actually going on much clearer. That would be the correct fix, of course. Put it on your to-do list! ;) Meanwhile, another quick fix (and a sufficient one) would be to add code to write that final JOBMEDIA row to the database at the end of the first despooling round. In the case where a large job spools in two or more rounds, this would lead to a slight inelegancy, in that there would be two rows together showing that the job used all of the tape file it was in at the intersection between spooling rounds. However, that data would still be correct, whereas the current code produces erroneous data. -tih -- Self documenting code isn't. User application constraints don't. --Ed Prochak ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace _______________________________________________ Bacula-devel mailing list Bacula-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-devel