On 6/24/22 08:08, Helpdesk - Net Products wrote:
**Problem Description**
Hello Nico,

First, congratulations on winning* the contest for longest email on the Bacula Community mailing list. :)

*All results are 100% unofficial, and no prizes will be delivered. :)

OK, there is a lot to parse here, so I will just pick a few things that jumped out at me to correct, plus some other hints to help you make this work successfully. These are in no particular order.

First, yes, you can do Copy/Migration jobs between SDs.

I noticed that your copy destination pools on Carl use the same 'LabelFormat' as the Alice pools that the jobs are copied from. I would give these copy destination pools on Carl a different LabelFormat for clarity, i.e. just by looking at a volume's name, you will know where it lives (on Bob or on Carl).

Next, and this is *VERY* important: You must have different 'MediaType' settings for the device(s) in bob-storage and carl-storage, and each must match the MediaType setting in the corresponding Storage{} resource (bob-storage and carl-storage) in the Director's config that points to it. Currently you have these all set to "MediaType = File", and this will not work.

I have no idea what this sentence means:
----8<----
Both storage demons are in a file `/etc/bacula/storagedefs/file.conf` so the director over at Bob can find them and orchestrate the file transfer.
----8<----

You have not included the 'runCopyJob.sh' script, but I understand from your description what it does. There are many ways to run copy jobs, and several ways to tell Bacula which jobs to copy. Since you are using the PoolUncopiedJobs option, I would not trigger these Copy jobs from this script, but rather just run them via a schedule with a different Priority than the normal backup jobs, so that the Copy jobs will be queued until the backup jobs complete.

You have not shown us any job logs that would tell us why Bacula is waiting to create a volume.
We only see this in the carl-sd status:
----8<----
Device File: "carl-storage" (/zfs1/external/bob/backups) is not open.
    Device is BLOCKED waiting to create a volume for:
       Pool:        alice-Inc-Pool-carl
       Media type:  File
Available Space=28.15 TB
----8<----

My guess is that 'MaximumVolumes' in your pools is set too low, and somewhere in the job logs for jobids 63554 and 63555 writing to 'carl-sd' there will be a message about the maximum volumes in the pool having been reached. A 'list pools' output would show us the number of volumes in each pool and the maximum volumes set. Additionally, this issue could also be related to, or caused by, the MediaType = File everywhere.

I do not see the need to set "MaximumConcurrentJobs = 63" in these Copy jobs. It seems pretty high, and pretty specific. Sure, you can set it, but using the PoolUncopiedJobs setting and kicking these copy jobs off daily (on a schedule, if you follow my recommendations), there will never be the 63 uncopied jobs that you calculated. :) Not only that, but with MaximumConcurrentJobs = 63 in the carl-sd and MaximumConcurrentJobs = 20 in the carl-storage device, you will never get past 20 concurrent jobs. :)

And, rather than using one device in your storages, I would configure Autochangers with a minimum of 10 devices in them (heck, do 20 or 30, they are free), and set MaximumConcurrentJobs = 1 on each of them. This way, each device can be reading/writing a job, and for the write jobs there will never be more than one job per volume. (See below about MaximumVolumeJobs in pools.)

Also, since you are using the "PoolUncopiedJobs" feature, I would add the "MaximumSpawnedJobs" setting to these Copy jobs and set it to '1' until you get everything ironed out, so that if there are a lot of uncopied jobs in these pools, you do not end up spawning many Copy control jobs and having to cancel them all while you are configuring and testing.
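Putting the MediaType and Autochanger suggestions together, here is a minimal sketch for the Carl side. Note that the names ("FileCarl", "carl-FileDev1", the Address, etc.) are my own illustrations, not taken from Nico's actual configs; the point is only that the MediaType must differ from Bob's and must match between the SD's Device and the Director's Storage resource:

```
# bacula-sd.conf on carl -- names are hypothetical
Autochanger {
  Name = carl-File-Autochanger
  Device = carl-FileDev1, carl-FileDev2, carl-FileDev3  # add 10 or more
  Changer Device = /dev/null
  Changer Command = ""
}

Device {
  Name = carl-FileDev1
  Device Type = File
  Media Type = FileCarl            # NOT plain "File"
  Archive Device = /zfs1/external/bob/backups
  Autochanger = yes
  MaximumConcurrentJobs = 1        # one job per device at a time
}
# (repeat the Device block for carl-FileDev2, carl-FileDev3, ...)

# bacula-dir.conf on bob -- the Storage resource pointing at carl-sd
Storage {
  Name = carl-storage
  Address = carl.example.com       # hypothetical address
  SD Port = 9103
  Password = "xxx"
  Device = carl-File-Autochanger
  Media Type = FileCarl            # must match the Device above
  Autochanger = yes
  MaximumConcurrentJobs = 20
}
```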
And I see you are setting "MaximumVolumeJobs" in your pools. Personally, if I ever set this, I only set it to 1 so that I never have more than one job on any given volume. This makes cleanup easier when things go wrong. :) While this setting may be useful on tapes (maybe?), I see no reason to allow a specific number of jobs on a volume unless it is '1'. This is just a personal preference, YMMV.
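As a concrete sketch, a copy-destination pool on the Bob Director might then look like this (the pool name is from this thread; the LabelFormat string and the MaximumVolumes value are illustrative, not Nico's real settings):

```
# bacula-dir.conf on bob -- values are illustrative
Pool {
  Name = alice-Inc-Pool-carl
  Pool Type = Backup
  Storage = carl-storage
  LabelFormat = "carl-inc-"     # distinct from the Bob pools' format,
                                # so the volume name shows where it lives
  MaximumVolumeJobs = 1         # at most one job per volume
  MaximumVolumes = 500          # make sure this is not set too low
}
```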
No, manually creating a volume (which is supposed to automatically create itself) does not make the problem go away.
We need evidence. :) Job logs, 'list pools', etc. (but ONLY after you reconfigure everything as described above, especially the MediaTypes)
It's possible to stop the director and run it in debug mode by redirecting its output. By executing `bacula-dir -d 201 -f > /var/log/bacula/run_debug_2.log 2>&1 &` it's possible to view some more information, including references to the C source.
Now we are getting into the weeds unnecessarily.

Also, you can enable debugging to a *.trace file in /opt/bacula/working (the default working directory) by just doing, in a bconsole session:

* setdebug level=xxx trace=1 options=tc director

(xxx = 100 is usually enough)

Then to disable debugging:

* setdebug director level=0 trace=0
Except... that results in hundreds of megabytes of logs, which won't fit on this page. I'd like to know more about what to look for.
Exactly. And consider that some of the decisions/work is being done on the SDs, so you would want to enable debugging on them the same way as above, substituting 'director' with 'storage=xxxx'
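For example, following the same pattern as the Director commands above, and substituting in the carl-storage name from this thread:

```
* setdebug level=100 trace=1 options=tc storage=carl-storage
* setdebug storage=carl-storage level=0 trace=0
```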
One thing I found earlier was that at some point the code fails due to a variable `rncj` not being high enough. I looked at the C file in question, and it contains the following snippet:
----8<----
bool inc_read_store(JCR *jcr)
{
   P(rstore_mutex);
----8<----

And now you have officially jumped into the deep end, which is not necessary, for sure. :)

I see you are running 9.6.7, which is quite old at this point. I would strongly urge you to upgrade to a current 11.x version. There have been a lot of feature enhancements and fixes along the way.

There is a lot to digest here, I know. Please take a careful look at the recommendations I made above and let us know if this helps. Remember, if you are still having issues, 'status director', 'status storage=xxxx', 'list pools', and 'll joblog jobid=xxxx', among other things, are very helpful for us to troubleshoot.

And finally, Bacula Systems has a perl script that will collect your Bacula configurations and a bunch of info from the Linux system it is running on, which is very helpful in debugging issues. You can download the script here:

https://www.baculasystems.com/ml/bsys_report/bsys_report.tar.gz

Then you can just attach the "bsys report" rather than pasting your configurations. The resulting file is just plain text, so you can manually edit it, or use sed in-place editing, to obfuscate things you'd rather keep private. ;)

Hope this helps!

Bill

--
Bill Arlofski
w...@protonmail.com
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users