I have some news about my problem.
I now understand that the disappearing jobs were actually canceled jobs, and that the cancellations were caused by preceding jobs running too slowly.
So I investigated the reason for these slow jobs, and merged the problem with another one I still have on Sun280R machines, which are usually much slower than the Sun v20 machines.
The actual reason may be linked to how the two kinds of machine are used: the Sun280R normally holds corporate data, while the v20 machines are usually our platform for intranet servers (IMAP mail, webapp, Postgres, Bacula, etc.).
While testing one 280R I discovered that:
- Bacula has a low transfer rate when there are a lot of small files to back up
- after I placed a big 250 MB file in the backup list, the Bacula rate increased to 10000 KB/s while working on this file, then went down again to 900 KB/s when working on small files.
I also saw that Bacula behaved differently in the two cases:
- while working on small files, the tape was inactive for long periods, while a status query on Bacula showed it was working on many files (probably buffering?); then the activity light came on for a few seconds, and then it paused again.
- while working on the big file, the tape was active for 10-20 seconds, and the file was backed up within this short time; then the activity light went off, and Bacula showed it was working on small files.
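The two rates above are consistent with a fixed cost paid once per file. The following is only a rough sketch of that reading, not a description of Bacula's internals: it assumes each file costs a constant overhead plus its transfer time at the streaming rate. The average small-file size (AVG_FILE_KB) is a made-up value, not a figure from my measurements.

```python
# Rough model: every file pays a fixed overhead (for example opening the
# file and recording its attributes) plus transfer time at the streaming rate.

def effective_rate_kb_s(file_size_kb, stream_rate_kb_s, per_file_overhead_s):
    """Average rate when every file pays a fixed per-file overhead."""
    transfer_s = file_size_kb / stream_rate_kb_s
    return file_size_kb / (transfer_s + per_file_overhead_s)


def implied_overhead_s(file_size_kb, stream_rate_kb_s, observed_rate_kb_s):
    """Invert the model: the overhead that would explain an observed rate."""
    return file_size_kb / observed_rate_kb_s - file_size_kb / stream_rate_kb_s


STREAM_RATE = 10000.0   # KB/s, rate seen while backing up the 250 MB file
OBSERVED_RATE = 900.0   # KB/s, rate seen while backing up small files
AVG_FILE_KB = 20.0      # ASSUMPTION: average small-file size, hypothetical

overhead = implied_overhead_s(AVG_FILE_KB, STREAM_RATE, OBSERVED_RATE)
print(f"implied per-file overhead: {overhead * 1000:.1f} ms")
```

Under these assumptions the per-file overhead comes out at roughly 20 ms. If the real average file size is different, the figure scales with it, so this only suggests where to look (per-file work such as catalog inserts or disk seeks) rather than proving anything.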
Is there some parameter I can tune to improve Bacula's performance when working on small files?
What can affect Bacula's performance when no process is using CPU apart from bacula-dir, bacula-fd, bacula-sd and Postgres?
Is system memory an issue? I usually have at least 1 GB of RAM on these machines, and I always see free RAM available during Bacula backups (looking with 'top').
Thanx,
Gabriele.
Gabriele Bulfon - Sonicle S.r.l.
Tel +39 028246016 Int. 30 - Fax +39 028243880
Via Felice Cavallotti 16 - 20089, Rozzano - Milano - ITALY
http://www.sonicle.com
----------------------------------------------------------------------------------
From: Arno Lehmann <[EMAIL PROTECTED]>
To: bacula-users <bacula-users@lists.sourceforge.net>
Date: 29 August 2006 21:02:55 CEST
Subject: Re: [Bacula-users] Help with jobs disappearing!
Hello,
On 8/29/2006 7:20 PM, Kern Sibbald wrote:
> On Tuesday 29 August 2006 19:01, Martin Simmons wrote:
>
>>>>>>>On Tue, 29 Aug 2006 17:24:02 +0200 (CEST), Gabriele Bulfon said:
>>>
>>>Yes, I use MaxStartDelay to automatically cancel jobs that are no more in
>>>time, so that I'm sure that Bacula is ready for the new job of the next
>>>night.
>>>Probably you're right about it: MaxStartDelay is causing cancel of jobs.
>>>The reason is that now I discovered that jobs are VERY slow during the night
>>>(1000Kb/s), compared to how they run if I schedule them during the day
>>>(8000Kb/s). This is causing a too long backup and causing the cancel of jobs.
>>>What is strange is: why is not Bacula sending me reports about this
>>>canceled jobs?
>>>When, for example, I put a wrong tape, I get a lot of "Intervention
>>>needed", followed by a final list of "Canceled" reports, when the
>>>MaxStartDelay has been reached.
>>>Why not in this case?
>>
>>Yes, I would say that is a bug if no notification is generated (as long as
>>the Messages resource allows it).
I haven't investigated this, but I *think* Gabriele and Martin might be
correct in assuming a bug. Reason?
I've seen the same, but this will need more investigation, I think.
Should be easily reproduced.
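For anyone trying to reproduce this, the cancellation mechanism described in the quoted text can be sketched as a small simulation. This is only an illustration under stated assumptions, not Bacula's scheduler: jobs run one at a time, and a job is canceled if it cannot start within the maximum start delay after its scheduled time. The job names and start times come from the thread; the durations and the 13-hour delay (inferred from "canceled around 12:00 next morning") are hypothetical.

```python
from datetime import datetime, timedelta


def simulate_queue(jobs, max_start_delay):
    """Run jobs one at a time in schedule order.

    jobs: list of (name, scheduled_time, duration) tuples.
    Returns a list of (name, status, start_time_or_None).
    """
    free_at = None  # time at which the previous job finishes
    results = []
    for name, scheduled, duration in jobs:
        start = scheduled if free_at is None else max(scheduled, free_at)
        if start - scheduled > max_start_delay:
            # The job waited too long; a canceled job uses no run time.
            results.append((name, "canceled", None))
            continue
        results.append((name, "ok", start))
        free_at = start + duration
    return results


night = datetime(2006, 8, 24, 23, 0)
# Durations are HYPOTHETICAL, chosen to mimic slow night-time jobs.
jobs = [
    ("solaris10", night, timedelta(hours=6)),
    ("wserver", night + timedelta(minutes=5), timedelta(hours=5)),
    ("iserver", night + timedelta(minutes=10), timedelta(hours=3)),
    ("adhoc", night + timedelta(minutes=15), timedelta(hours=1)),
    ("catalog", night + timedelta(minutes=20), timedelta(minutes=30)),
]
results = simulate_queue(jobs, max_start_delay=timedelta(hours=13))
for name, status, start in results:
    print(name, status, start)
```

With these made-up durations the last jobs in the queue are the ones that get canceled, which matches the order in which Gabriele's jobs went missing as the earlier jobs slowed down.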
>
> I suspect that the messages are generated at the daemon level as there is
> probably no job associated with them. If the daemon messages are not
> properly configured as was the case in some of the older default releases,
> they will simply be bit bucketed.
>
> Basically to ensure that you "see" daemon messages, you need to ensure that
> each daemon (particularly the Director) has a "Messages = xxx" in the daemon
> resource (Director { } for the director, ...).
Hmm, unless I'm really wrong I have these resources everywhere...
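A quick way to double-check this is to scan the configuration for the directive Kern mentions. The sketch below is a rough check, not a real Bacula configuration parser: it only handles resources without nested braces, and it compares names literally. The sample configuration text, including the resource name and mail address, is hypothetical.

```python
import re

# HYPOTHETICAL sample; replace with the text of the real bacula-dir.conf.
SAMPLE_DIRECTOR_CONF = """
Director {
  Name = backup-dir          # hypothetical name
  Messages = Daemon
}

Messages {
  Name = Daemon
  mail = root@localhost = all, !skipped
  append = "/var/bacula/working/log" = all, !skipped
}
"""


def resources(conf_text, resource_type):
    """Return the bodies of `Type { ... }` blocks that have no nested braces."""
    pattern = re.compile(
        r"^\s*%s\s*\{([^{}]*)\}" % re.escape(resource_type),
        re.MULTILINE | re.IGNORECASE,
    )
    return [m.group(1) for m in pattern.finditer(conf_text)]


def directive(body, name):
    """Return the value of `name = value` in a resource body, or None."""
    m = re.search(
        r"^\s*%s\s*=\s*(.+?)\s*(?:#.*)?$" % re.escape(name),
        body,
        re.MULTILINE | re.IGNORECASE,
    )
    return m.group(1).strip() if m else None


def daemon_messages_ok(conf_text, daemon_type="Director"):
    """True if every daemon resource names a Messages resource that exists."""
    defined = {directive(b, "Name") for b in resources(conf_text, "Messages")}
    bodies = resources(conf_text, daemon_type)
    for body in bodies:
        target = directive(body, "Messages")
        if target is None or target not in defined:
            return False
    return bool(bodies)


print(daemon_messages_ok(SAMPLE_DIRECTOR_CONF))
```

The same function could be pointed at the Storage and FileDaemon resources by changing daemon_type, assuming those files follow the same layout.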
Arno
> Regards,
>
> Kern
>
>
>>__Martin
>>
>>
>>
> ----------------------------------------------------------------------------------
>
>>>From: Martin Simmons <[EMAIL PROTECTED]>
>>>To: [EMAIL PROTECTED]
>>>Cc: bacula-users@lists.sourceforge.net
>>>Date: 25 August 2006 22:15:13 CEST
>>>Subject: Re: [Bacula-users] Help with jobs disappearing!
>>>
>>>>>>>On Fri, 25 Aug 2006 10:32:37 +0200, Gabriele Bulfon said:
>>>>
>>>>Hello,
>>>>I am sending again this S.O.S. because I had no response about it.
>>>>
>>>>I have a Bacula setup to run 5 jobs each night.
>>>>They're scheduled at 23:00, 23:05, 23:10, 23:15, 23:20, so that bacula
>>>>will queue each job to run after the previous one is done.
>>>>The full queue is normally finished around 6:00am.
>>>>I also have setup a maximum delay, so that jobs are canceled around
>>>>12:00 next morning, if for some reasons they're waiting (missing tape or
>>>>other problems), and everything can restart normally next night.
>>>>I started this one year ago, with a clean postgres db, and new labeled
>>>>tapes into a library.
>>>>I had many months of correct backups in the following job list:
>>>>"solaris10" - "wserver" - "iserver" - "adhoc" - "catalog"
>>>>
>>>>During July 2006 I noticed that the "catalog" job was missing from my
>>>>daily report.
>>>>I looked into the bacula log file, and I had absolutely no trace about
>>>>the catalog job.
>>>>Inside the log, I could see only the "OK" sequence of "solaris10" -
>>>>"wserver" - "iserver" - "adhoc".
>>>>
>>>>At the end of July I noticed that also the "adhoc" job was missing....
>>>>Inside the log, I could see only the "OK" sequence of "solaris10" -
>>>>"wserver" - "iserver".
>>>>
>>>>In the middle of August I noticed that even the "iserver" job was
>>>>missing!.....
>>>>Inside the log, I could see only the "OK" sequence of "solaris10" -
>>>>"wserver".
>>>>
>>>>...jobs are disappearing one by one....with no trace.
>>>>I noticed that inside the "bacula/working" folder, I had the "bsr" files
>>>>with dates corresponding to the last time they were executed:
>>>>Jul 16 BackupCatalog.bsr (1st disappeared job)
>>>>Jul 27 adhoc.bsr (2nd disappeared job)
>>>>Aug 18 iserver.bsr (3rd disappeared job)
>>>>Aug 25 wserver.bsr (this is still running)
>>>>Aug 25 solaris10.bsr (this is still running)
>>>>
>>>>Using bconsole and "show jobs", I can see all the jobs correctly setup.
>>>>If I log into the system at midnight, and use bconsole "cancel" to see
>>>>the running/waiting jobs (obviously aborting the cancel operation), I
>>>>can see all the jobs are there waiting to be processed.
>>>>When I log into the system next morning, I can see no trace of the
>>>>missing jobs....
>>>>
>>>>I experienced this problem one year ago on the same setup, so I
>>>>scratched the postgres DB, restarted from a clean db relabeling each
>>>>tape, and everything worked fine again until July 2006...
>>>>
>>>>How can I investigate this problem?! Please HELP!!!!
>>>
>>>I would start by checking the units of your MaxStartDelay value, just in
>>>case it is causing cancellation.
>>>Did the time taken by the wserver job change much on the day the iserver
>>>was first missing (presumably Aug 19) compared to Aug 18?
>>>__Martin
>>
--
IT-Service Lehmann [EMAIL PROTECTED]
Arno Lehmann http://www.its-lehmann.de
-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users