Hi,

Arno Lehmann wrote:
> Hi,
>
> 05.07.2007 12:07, Alfredo Marchini wrote:
>
>> Hi Arno,
>> I don't know if you know the Bacula source code,
>
> A bit, but I usually look for problems in the configuration as, in my
> experience, the source code is quite stable. Of course, there are
> bugs, but these should be reproducible in other installations, too.
> Unless I find a setup that looks unique to me, I'm assuming the source
> is ok and the problem lies in the configuration or general system.
>
>> so I'm posting some
>> parameters and information from my configuration that I think could
>> cause this problem, or that I think are not well configured because I
>> don't fully understand the manual:
>>
>> Arno Lehmann wrote:
>>
>>> Hi,
>>>
>>> 04.07.2007 17:40, Alfredo Marchini wrote:
>>>
>>>> Hi,
>>>> The system and db logs don't tell me anything about this problem; it
>>>> looks like all the director processes or threads are locked at once.
>>>> If I restart only bacula-dir, without restarting bacula-sd and the 16
>>>> bacula-fd, the system starts working fine again.
>>>
>>> It might be possible that the DIR is busy working on the catalog (like
>>> pruning data) and just needs more time. You can check this using
>>> 'mysqladmin processlist', for example.
>>
>> OK, when it happens again I'll also run this test. But if Bacula prunes
>> jobs and files only when all volumes are used and there are no more
>> appendable volumes, then that is not my problem, because I had used only
>> 10 volumes of 50 GB and had another 8 volumes available and not yet
>> created.
>
> Are you saying that there are always volumes available and thus no
> pruning happens?

When the error occurred I had 10 volumes used and 8 volumes available,
but after 3 weeks all 18 volumes are used, and then Bacula recycles the
oldest ones, those containing jobs older than 14 days (the maximum file,
job and volume retention period).
Here's my sd config:

Storage {
  Name = mystorage
  SDPort = 9103
  SDAddress = binding ip
  WorkingDirectory = "/var/bacula/storage-wk"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 60
  Heartbeat Interval = 10
  Client Connect Wait = 60
}

Here's my sd device config:

Device {
  Name = mydevice
  Media Type = File
  Archive Device = "/mnt/storage/volumes"
  LabelMedia = yes
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = yes
}

Here's my dir config:

Director {
  Name = mydirector
  DIRAddress = binding ip
  DIRport = 9101
  QueryFile = "/etc/bacula/query.sql"
  WorkingDirectory = "/var/bacula/director-wk"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 30
  Password = "password"
  Messages = "mymessages-daemon"
}

Here's my dir sd config:

Storage {
  Name = mystorage
  Address = ip
  SDPort = 9103
  Password = "password"
  Device = mydevice
  Media Type = File
  Maximum Concurrent Jobs = 60
}

Here's my dir pool config:

Pool {
  Name = mypool
  Pool Type = Backup
  Storage = mystorage
  Recycle = yes
  AutoPrune = yes
  Maximum Volumes = 18
  Maximum Volume Bytes = 50000000000
  Volume Retention = 14 days
  Label Format = "Volume-"
}

> That would indeed rule out the catalog as a bottleneck.
>
>>>> Now I have already restarted bacula-dir and everything works fine (I
>>>> back up 16 servers; I cannot take it offline or someone will kill me
>>>> this evening), so I won't be able to reproduce the error for about
>>>> 10-15 days.
>>>> Last time I had this problem I used top and didn't find
>>>> anything strange.
>>>
>>> Ok, so let's assume the hardware, OS and relevant applications are
>>> running ok.
>>
>> Yes, I think that is the right way.
>>
>>>> But the test with the time command will be the first thing I do when
>>>> it happens again.
>>>> I don't think that the problem is with the database: when I connect to
>>>> the bacula db with the mysql command line it works fine and quickly.
>>>
>>> Bacula uses its own, internal locking, so you won't necessarily notice
>>> anything from outside of Bacula.
>>
>> Ok
>
> ...
>
>>>> Another thing is the maximum concurrent jobs:
>>>> On director = 30
>>>> On storage side director configuration file = 60
>>>> On storage = 60
>>>
>>> Quite a lot, I think. Running up to 30 jobs in parallel might load
>>> your backup server beyond its reasonable working maximum, but that
>>> depends on your hardware, software, and requirements.
>>
>> I've set these values because:
>> director = 30 because I have 16 fd that can connect concurrently (in
>> reality they don't) plus
>> one job per fd to ask for the status (16x2 = 32, rounded to 30).
>
> I don't understand why you reserve job slots for the FDs... the FDs
> don't connect to the DIR to ask for a status as far as I know. Or do
> you refer to some sort of tray monitor?

Sorry, yes, I have also configured one monitor for all fd.

>> storage = 60 because when 16 fd connect concurrently to the storage
>> I also have 16 connections from the director to the storage (when jobs
>> start).
>
> The limit for the SD refers to running jobs, not to connections as far
> as I know.
>
> For example, I run four jobs concurrently, and even if these jobs are
> all running, the DIR can connect for status display and the monitoring
> application can ask for the SD status, too.

Ah, ok, so if I have 16 concurrent jobs to the sd I can set Maximum
Concurrent Jobs on the sd to 16 (also on the director side). Is that
correct?

>> I thought that the not-responding problem was caused by these
>> parameters, so I set high values because I don't know how (at the
>> development level) Bacula works with TCP connections (I thought the
>> problem was caused by a lack of sufficient concurrent threads).
>
> I don't think so... the limits you set do not control how many threads
> can be created, or how many network connections can exist
> simultaneously.
> At least my impression is different.
>
>> Another thing:
>> For all fd I've set the messages to point to the director messages.
>> Example:
>> on director named = bacula-dir I've created messages named =
>> bacula-dir-messages
>> on all fd I've set messages named = bacula-dir-message that points to
>> director bacula-dir
>
> I don't think this is relevant here, unless you have reason to believe
> that messages are not sent to the DIR.

No, messages are correctly sent to the director.

>> Last thing, and then I've got no more:
>>
>> If I go to the working directory of bacula-dir when it is not
>> responding, I find mail files 2-3 days old that should have been sent
>> by e-mail to the operators, as if bacula-dir is blocked and cannot send
>> the e-mail (when it is working fine the mail is correctly sent to all
>> the operators).
>
> Obviously, when the DIR is blocked, it will not finish jobs and thus
> not send mail.
>
> Does your above statement imply that your DIR is stuck for some days,
> when it happens? That would probably rule out catalog performance
> issues as even an underpowered database server should finish the
> queries after a few days...

Yes, I found the dir locked yesterday, but the last log, mail, and
backup are from the night of 30-06-2007. After that the director is
locked, and I have no more info about it.
For the db I use MySQL 5.0, standard rpm installation, and there is no
info about problems in the log.
The biggest table is File, with 832432 records; FileName has 424158,
Path 31414 and Log 38362.
If you think I need to enlarge the MySQL resources, that is not a problem.
Could the problem be caused by what I specified in messages?

  catalog = all, !skipped, !saved, !terminate

This writes a lot of data to the db; if the problem is the db, I can
remove this rule.

>> I use a postfix smtp server configured for local delivery, and bsmtp
>> to send email to an smtp server
>> installed in my LAN on another linux server.
>
> That's not important here, either.
>
> Arno

Alfredo
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
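
For reference, here is a minimal sketch of the change asked about in the thread: lowering Maximum Concurrent Jobs from 60 to 16 in both Storage resources. The value 16 comes from the question itself (one running job per file daemon) and is an assumption, not a setting confirmed anywhere in the thread. All other directives, including the placeholder values "binding ip", "ip" and "password", are copied unchanged from the posted configuration.

```
# bacula-sd.conf: Storage resource of the storage daemon
Storage {
  Name = mystorage
  SDPort = 9103
  # placeholder, as posted
  SDAddress = binding ip
  WorkingDirectory = "/var/bacula/storage-wk"
  PidDirectory = "/var/run"
  # was 60; assumed value, one slot per running FD job
  Maximum Concurrent Jobs = 16
  Heartbeat Interval = 10
  Client Connect Wait = 60
}

# bacula-dir.conf: Storage resource that points the director at the SD
Storage {
  Name = mystorage
  # placeholder, as posted
  Address = ip
  SDPort = 9103
  Password = "password"
  Device = mydevice
  Media Type = File
  # was 60; kept equal to the SD-side limit
  Maximum Concurrent Jobs = 16
}
```

The Director resource's own limit of 30 is left as posted. Job and Client resources also accept a Maximum Concurrent Jobs directive, so the number of jobs that actually run in parallel may end up lower than any of these values. The daemons have to re-read their configuration before a change like this takes effect.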