Hello,

05.07.2007 13:07, Alfredo Marchini wrote:
> Hi,
>
> Arno Lehmann wrote:
>> Hi,
>>
>> 05.07.2007 12:07, Alfredo Marchini wrote:
>>
>>> Hi Arno,
>>> I don't know if you know the bacula source code,
>>>
>> A bit, but I usually look for problems in the configuration as, in my
>> experience, the source code is quite stable. Of course, there are
>> bugs, but these should be reproducible in other installations, too.
>> Unless I find a setup that looks unique to me, I'm assuming the source
>> is ok and the problem lies in the configuration or general system.
>>
>>> so I'm posting some
>>> parameters and information from my configuration that I think can cause
>>> this problem, or that I think are not well configured because I don't
>>> understand the manual well:
>>>
>>> Arno Lehmann wrote:
>>>
>>>> Hi,
>>>>
>>>> 04.07.2007 17:40, Alfredo Marchini wrote:
>>>>
>>>>> Hi,
>>>>> The system and db logs don't tell me anything about this problem; it
>>>>> is as if all the director processes or threads are locked concurrently.
>>>>> If I restart only bacula-dir, without restarting bacula-sd and the 16
>>>>> bacula-fd, the system starts working fine again.
>>>>>
>>>> It might be possible that the DIR is busy working on the catalog (like
>>>> pruning data) and just needs more time. You can check this using
>>>> 'mysqladmin processlist', for example.
>>>>
>>> ok, when it happens again I'll also run this test, but if bacula prunes
>>> jobs and files when the volumes are all used and there are no more
>>> appendable volumes, I don't have this problem, because I've used only
>>> 10 volumes of 50 GB and have another 8 volumes available and not yet
>>> created.
>>>
>> Are you saying that there are always volumes available and thus no
>> pruning happens?
>>
> When the error occurred I had 10 volumes used and 8 volumes available,

Good.

> but after 3 weeks all 18 volumes are used, and then bacula recycles
> the oldest ones, which contain jobs older than 14 days
> (max file, job and volume retention period).

Do you have reasons to assume that these retention periods are
responsible for the problem you describe? You describe them again and
again, but I still don't see how this might lead to the DIR being stuck.

> Here's my sd config:

I think we agree that the most probable location of your problem is in
the DIR.

> Here's my dir config:
...
> Ah, ok, so if I have 16 concurrent jobs to the sd I can set maximum
> concurrent jobs
> to sd to 16 (also on director side). Is that correct?

Yes. For the DIR, you should set a higher limit, because console
connections count as jobs. My advice is to actually limit the number of
concurrent jobs in the storage resource of the DIR, and simply allow
enough jobs in the SD config.

...
>>> Another thing:
>>> For all fd I've set the messages to point to the director messages.
...
> No, messages are correctly sent to director.

So I don't think this is relevant.

>>> Last thing and I've got no more:
>>>
>>> If I go to the working directory of bacula-dir when it is not
>>> responding, I find the mail files that have to be sent via e-mail to
>>> the operators, 2-3 days old, as the bacula-dir is blocked and cannot
>>> send the e-mail (when it is working fine the mails are correctly sent
>>> to all the operators).
>>>
>> Obviously, when the DIR is blocked, it will not finish jobs and thus
>> not send mail.
>>
>> Does your above statement imply that your DIR is stuck for some days,
>> when it happens? That would probably rule out catalog performance
>> issues as even an underpowered database server should finish the
>> queries after a few days...
>>
> Yes, I found the dir locked yesterday, but the last log, mail, and backup
> is from 30-06-2007 in the night.

But there were jobs running and completed after Jun 30, but before Jul
5, but they didn't send mail?
Also, what did your status monitors (of which you have at least 16...)
report?

> After that the director is locked, and I have no more info about it.

Well, it seems as though you do have lots of information, but didn't
actually collect the relevant things to understand what's happening...

> For the db I use mysql 5.0, standard rpm installation, and in the log I
> have no info about problems.
> The biggest table is File, with 832432 records; FileName has 424158,
> Path 31414 and Log 38362.
>
> If you think I need to enlarge mysql resources, that is not a problem.

You'd get error messages (at least in the log files) if the tables were
full or the disk space exhausted.

> Can the problem be caused because I specified this in messages?
>
> catalog = all, !skipped, !saved, !terminate
>
> This writes a lot of data to the db; if the problem is the db, I can
> remove this rule.

But you told us there were no database issues... did you actually check
(with mysqladmin processlist) what the database is doing?

I really think you should stop looking in all directions at once. Try to
narrow down the possible reasons for your problem, instead of looking at
everything at once.

If you can reproduce the error, create a debug log file (trace) and
observe the catalog - that's my advice, at least. Alternatively, when
the DIR hangs again, use strace and / or gdb to see what's happening.

You gave good reasons to believe that your hardware, OS and database are
working correctly, so stop looking there.

Arno

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
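
The thread recommends checking what the catalog is doing with 'mysqladmin processlist' while the DIR appears stuck. As a rough illustration only, the sketch below parses that command's table output and picks out threads that have been busy for a while. The helper names parse_processlist and long_running and the 60-second threshold are assumptions made for this example, not part of Bacula or MySQL, and the parser assumes the default bordered table layout and an Info column that contains no '|' character.

```python
def parse_processlist(text):
    """Turn the ASCII table printed by `mysqladmin processlist` into dicts.

    Assumes the default bordered layout: +---+ border lines, a header row,
    then one '| ... |' row per server thread.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the +----+ border lines.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # first table row holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def long_running(rows, min_seconds=60):
    """Return threads that are not idle and have run for min_seconds or more."""
    busy = []
    for row in rows:
        if row.get("Command") == "Sleep":
            continue  # idle connection, not doing catalog work
        try:
            seconds = int(row.get("Time", ""))
        except ValueError:
            continue  # Time column missing or not numeric
        if seconds >= min_seconds:
            busy.append(row)
    return busy
```

Feeding it the captured output of 'mysqladmin processlist' during a hang would show whether a catalog query (pruning, for example) is still running. An empty result would point away from the database and toward looking at the bacula-dir process itself with strace or gdb, as suggested above.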