OK, this is getting more and more peculiar the longer I study it. Adding the bacula-devel list.
To briefly recap the initial problem statement: after a number of successful connections, console->Director connection authentication begins failing repeatedly. Typically, after I manually start two or three jobs using BAT, I can no longer connect to the Director with either BAT or bconsole, but everything else continues to function and the scheduled jobs run normally. After the pending manually-scheduled jobs complete, I can connect again.

On the theory that network bandwidth may somehow be involved, I tried scheduling several jobs 15 minutes ahead of time, to see if I could get more jobs running if I scheduled them all before any started.

Starting at about 0915:

  Schedule job 1 for 0925. No problem.
  Schedule job 2 for 0925. No problem.
  Schedule job 3 for 0925. No problem.
  At about 0918, try to schedule job 4 for 0925. None of the new jobs has started yet. No go; neither BAT nor bconsole can connect.
This is what the trace logged as I tried to connect with bconsole:

babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.19.16_37
babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 <1723850907.1331129956@babylon4-dir> ssl=0
babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 <85736557.1331129966@bat> ssl=0
babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 25Q2B+IdJ/UKI/+p6++vkC
babylon4-dir: ua_dotcmds.c:164-0 Cmd: .api 1
babylon4-dir: ua_dotcmds.c:164-0 Cmd: .levels Backup
babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131

The console reported:

babylon4:root:/opt/bacula/etc:29 # bconsole
Connecting to Director babylon4:9101
Director authorization problem.
Most likely the passwords do not agree.
If you are using TLS, there may have been a certificate validation error during the TLS handshake.

After restarting the Director, I re-enabled the trace (setdebug director level=100 trace=1), then reconnected again with bconsole:

babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131
babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.32.59_04
babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 <1031666935.1331130779@babylon4-dir> ssl=0
babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 <41725829.1331130779@bconsole> ssl=0
babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 6Sgw8g8aLxgeAEx5CwsU1B

This looks no different to me than the failed connection attempt.

So I tried starting up bconsole from the Linux machine I'm running bat on. That worked fine, so I quit it and started another. I did this about five times.
Then I started six at once. No problem. It appears I can connect as many consoles as I want, up to the Director's configured concurrency limit, with no problem ... until I start scheduling jobs.

So, then I opened a bconsole and left it open, then scheduled two jobs from BAT successfully. Then I tried to schedule a third. No go. At this point, I tried to open an additional new bconsole. No go, and the trace *did not log anything* for the connection attempt. I could continue to schedule more manual jobs from the existing open bconsole, but could start no new consoles, and BAT became completely unresponsive.

It appears that once two or three jobs were scheduled, the Director *stopped listening* for new console connections, but continued to service existing open consoles.

All daemons are Bacula 5.2.5, all 64-bit builds. The Director and the disk-based SD are running on Solaris 10u9 amd64, built using Sun Studio 12.2. The tape SD is running on Gentoo Linux amd64, built using gcc-4.5.3. BAT runs on the Linux box, and I used bconsoles from both machines with no difference in behavior.

--
Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
It's not the years, it's the mileage.

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users