Arno Lehmann wrote:
> Hello,
>
> 05.07.2007 13:07, Alfredo Marchini wrote:
>
>> Hi,
>>
>> Arno Lehmann wrote:
>>
>>> Hi,
>>>
>>> 05.07.2007 12:07, Alfredo Marchini wrote:
>>>
>>>> Hi Arno,
>>>> I don't know if you know the bacula source code,
>>>>
>>> A bit, but I usually look for problems in the configuration as, in my
>>> experience, the source code is quite stable. Of course, there are
>>> bugs, but these should be reproducible in other installations, too.
>>> Unless I find a setup that looks unique to me, I'm assuming the source
>>> is ok and the problem lies in the configuration or general system.
>>>
>>>> so I'm posting some
>>>> parameters and information from my configuration that I think could cause
>>>> this problem, or that I think are not well configured because I don't fully
>>>> understand the manual:
>>>>
>>>> Arno Lehmann wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> 04.07.2007 17:40, Alfredo Marchini wrote:
>>>>>
>>>>>> Hi,
>>>>>> The system and db logs don't tell me anything about this problem; it's as if
>>>>>> all the director processes or threads are locked at the same time.
>>>>>> If I restart only bacula-dir, without restarting bacula-sd and the 16
>>>>>> bacula-fd, the system starts working fine again.
>>>>>>
>>>>> It might be possible that the DIR is busy working on the catalog (like
>>>>> pruning data) and just needs more time. You can check this using
>>>>> 'mysqladmin processlist', for example.
>>>>>
>>>> Ok, when it happens again I'll also run this test. But if bacula prunes jobs
>>>> and files only when all volumes are used and there are no more
>>>> appendable volumes, this shouldn't be my problem, because I've used only
>>>> 10 volumes of 50GB and have another 8 volumes available and not yet
>>>> created.
>>>>
>>> Are you saying that there are always volumes available and thus no
>>> pruning happens?
>>>
>> When the error occurred I had 10 volumes used and 8 volumes available,
>>
>
> Good.
>
>
>> but after 3 weeks all 18 volumes are used, and then bacula recycles
>> the oldest ones, which contain jobs older than 14 days
>> (max file, job and volume retention period).
>>
>
> Do you have reasons to assume that these retention periods are
> responsible for the problem you describe? You describe them again and
> again, but I still don't see where this might lead to the DIR being stuck.
>
>
No, I think everything is ok with the pool, storage, and retention periods.
>> Here's my sd config:
>>
>
> I think we agree that the most probable location of your problem is in
> the DIR.
>
>
>> Here's my dir config:
>>
> ...
>
>
>> Ah, ok, so if I have 16 concurrent jobs going to the sd, I can set Maximum
>> Concurrent Jobs for the sd to 16 (also on the director side). Is that correct?
>>
>
> Yes. For the DIR, you should set a higher limit, because console
> connections count as jobs. My advice is to actually limit the number
> of concurrent jobs in the storage resource of the DIR, and simply
> allow enough jobs in the SD config.
>
> ...
>
>>>> Another thing:
>>>> I've set the messages for all the fds to point to the director messages.
>>>>
> ...
>
>> No, messages are correctly sent to the director.
>>
>
> So I don't think this is relevant.
>
>
>>>> Last thing, and then I've got no more:
>>>>
>>>> If I go to the working directory of bacula-dir when it is not responding, I
>>>> find the mail files that should have been sent to the operators, 2-3 days
>>>> old, because bacula-dir is blocked and cannot send the e-mail
>>>> (when it is working fine, the mail is correctly sent to all the operators).
>>>>
>>> Obviously, when the DIR is blocked, it will not finish jobs and thus
>>> not send mail.
>>>
>>> Does your above statement imply that your DIR is stuck for some days,
>>> when it happens? That would probably rule out catalog performance
>>> issues as even an underpowered database server should finish the
>>> queries after a few days...
>>>
>> Yes, I found the dir locked yesterday, but the last log, mail, and backup
>> are from the night of 30-06-2007.
>>
>
> But there were jobs running and completed after Jun 30, but before Jul
> 5, but they didn't send mail?
>
>
Yesterday I found 2 mail files that were not sent. The files are dated 30-06, and after the DIR restart (now) the mails have still not been sent. In the logs, the last date is 30-06.
So the DIR was blocked for 4 days, including 30-06 (yesterday I restarted the blocked DIR).
> Also, what did your status monitors (of which you have at least 16...)
> report?
>
Nothing about the block. The monitors are configured, but I don't have a monitor connected all the time. Sometimes users connect to the DIR and use the monitor to watch the status of the fd, dir and sd.
>
>> After that the director is locked, and I have no more info about it.
>>
>
> Well, it seems as though you do have lots of information, but didn't
> actually collect the relevant things to understand what's happening...
>
>
I can't find anything that could cause this problem.
>> For the db I use MySQL 5.0, standard rpm installation, and the log
>> shows no problems.
>> The biggest table is File, with 832432 records; FileName has 424158, Path
>> has 31414, and Log has 38362.
>>
>> If you think I need to enlarge the MySQL resources, that is not a problem.
>>
>
> You'd get error messages (at least in the log files) if the tables
> were full or the disk space exhausted.
>
>
And I've got no error messages.
>> Can the problem be caused by this setting in my messages resource?
>>
>
>
>
>> catalog = all, !skipped, !saved, !terminate
>>
>> This writes a lot of data to the db; if the problem is the db, I can remove
>> this rule.
>>
>
> But you told us there were no database issues... did you actually
> check (with mysqladmin processlist) what the database is doing?
>
Yes... right now the database is doing nothing. The only entry is the "show processlist" itself.
> I really think you should stop looking in all directions at once. Try
> to narrow down the possible reasons for your problem, instead of
> looking at everything at once.
>
> If you can reproduce the error, create a debug log file (trace) and
> observe the catalog - that's my advice, at least.
>
> Alternatively, when the DIR hangs again, use strace and / or gdb to
> see what's happening.
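A minimal sketch of how that snapshot could be collected automatically, assuming a Linux host, root privileges, and gdb, strace and mysqladmin installed. It saves the MySQL process list (is the catalog busy?), a backtrace of every bacula-dir thread (where are they waiting?), and 30 seconds of strace output with `-f -tt -T`. The script name, the `/tmp/bacula-hang` output directory, the 6-hour threshold and the `*.mail` pattern for the unsent mail files left in the working directory are assumptions, not Bacula conventions.

```python
"""Hang snapshot for bacula-dir: a hedged sketch, not an official Bacula tool.

Assumptions (adjust to your system): Linux, run as root so gdb/strace may
attach, mysqladmin can log in without extra options (e.g. via ~/.my.cnf),
and unsent mail spool files in the DIR working directory match *.mail.
"""
import glob
import os
import signal
import subprocess
import sys
import time


def processlist_cmd():
    # -v (--verbose) shows full query text, like SHOW FULL PROCESSLIST
    return ["mysqladmin", "-v", "processlist"]


def gdb_cmd(pid):
    # Attach in batch mode, print a backtrace of every thread, then detach
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


def strace_cmd(pid, outfile):
    # -f: all threads, -tt: microsecond timestamps, -T: time spent in each call
    return ["strace", "-f", "-tt", "-T", "-o", outfile, "-p", str(pid)]


def stale_mail_files(workdir, max_age_hours, now=None):
    # Hang indicator: mail files older than max_age_hours that were never sent
    # (the *.mail pattern is an assumption about Bacula's spool file names)
    now = time.time() if now is None else now
    limit = max_age_hours * 3600
    paths = glob.glob(os.path.join(workdir, "*.mail"))
    return sorted(p for p in paths if now - os.path.getmtime(p) > limit)


def run_to_file(cmd, path):
    # Run a command to completion, keeping stdout and stderr in one file
    with open(path, "w") as out:
        try:
            subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False)
        except FileNotFoundError:
            out.write(f"{cmd[0]} not found\n")


def strace_for(pid, outfile, seconds):
    # strace -p runs until interrupted; SIGINT makes it detach cleanly
    proc = subprocess.Popen(strace_cmd(pid, outfile))
    time.sleep(seconds)
    proc.send_signal(signal.SIGINT)
    proc.wait(timeout=30)


def snapshot(pid, outdir, strace_seconds=30):
    # gdb and strace both use ptrace, so they must run one after the other
    os.makedirs(outdir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    run_to_file(processlist_cmd(), os.path.join(outdir, f"processlist-{stamp}.txt"))
    run_to_file(gdb_cmd(pid), os.path.join(outdir, f"gdb-bt-{stamp}.txt"))
    strace_for(pid, os.path.join(outdir, f"strace-{stamp}.txt"), strace_seconds)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python3 dir_hang_snapshot.py <bacula-dir PID> <working dir>
    dir_pid, working_dir = int(sys.argv[1]), sys.argv[2]
    if stale_mail_files(working_dir, max_age_hours=6):
        snapshot(dir_pid, "/tmp/bacula-hang")
```

Run from cron every hour or so, it would record the DIR's state within hours of a hang rather than days later. For a deadlock, the gdb backtrace is likely the most useful part: strace on a completely blocked process may show little more than threads waiting in futex calls.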
>
> You gave good reasons to believe that your hardware, OS and database
> are working correctly, so stop looking there.
>
> Arno
>
>
In about 2 weeks the DIR will block again; I'll wait for it. strace has many options: which ones do I need to specify to see all the information useful for finding the problem?
The real problem is that I don't know what could cause the lock. I'm desperate because I don't have any idea, and all the configuration files seem to be correctly configured, or at least not configured in a way that would cause this problem.

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users