Great idea to head back to what was working!

I'd recommend making only one config change at a time and testing each
change before moving on. (I like doing it all at once and then sifting
through the debris, but that's just me; I like a challenge.) Don't
forget to issue a reload from the console, or stop and restart the
daemons, after each change.

Consider scheduling as a way to 'load balance' the jobs as well.  
Example:

        clients 1-5  > job starts at 11:05

        clients 6-11 > job starts at 11:55
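
As a rough sketch of how that staggering could look in bacula-dir.conf
(the schedule and job names here are made up for illustration, and I'm
assuming the times above are evening times, hence 23:05 and 23:55):

        # Hypothetical schedule for clients 1-5
        Schedule {
          Name = "NightlyGroupA"
          Run = Incremental daily at 23:05
        }

        # Hypothetical schedule for clients 6-11
        Schedule {
          Name = "NightlyGroupB"
          Run = Incremental daily at 23:55
        }

        # Fragment only: each client's existing Job resource would
        # point at one of the two schedules above.
        Job {
          Name = "client1-backup"       # made-up job name
          Schedule = "NightlyGroupA"
          # ... rest of the Job definition unchanged
        }

Each client's Job references one of the two schedules, so the second
group starts only after the first has had a head start.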

I think I've seen others on the list handle large numbers of FDs
this way. Quite a few people here manage large numbers of client
machines and can provide a wealth of info on the subject. I'd suggest
starting a new thread with the heading:

        Backup Strategy for Several Client Machines

and see if that helps get more traction for your challenge ahead.

Let us know how it goes for you.

Erich


On Aug 14, 2006, at 10:08 AM, Victoria wrote:

> Thank You for Your answer!
>
> But I don't think the problem is with hardware, because 95% of the
> computers won't back up with the concurrent job and spooling settings,
> and they are Sun and IBM machines with good cables connected. As I said
> before, when my configuration didn't have these settings all was fine.
> But now I have more computers and I need such settings, because the
> night is not long enough for the backup process. I thought concurrent
> jobs would help, but I got this problem.
> The Director and Storage daemons are on different machines, but in the
> same network subnet. All computers are on the internal network. Tonight
> I will try to back up without these settings, as I did before, and
> maybe I will get a new idea why it is happening.
>
> Victoria
>
> On Mon, 2006-08-14 at 08:54 -0500, Erich Prinz wrote:
>> Hope a few others chime in here besides me.
>>
>> Reset by peer indicates network issues. These issues can sometimes be
>> attributed to faulty hardware (NIC, Patch cable, router/switch.) The
>> heartbeat gives the interaction a greater tolerance for delays due to
>> slow links but won't correct underlying hardware problems.
>>
>> Are the Director and Storage daemon on the same machine?
>>
>> Is the client on your LAN or remote?
>>
>> Erich
>>
>> On Aug 10, 2006, at 3:01 AM, Victoria wrote:
>>
>>> Thank You for Your suggestions, but I still have the same problems.
>>> And heartbeat doesn't seem to help prevent this situation.
>>>
>>> What else should I check?
>>> I just want to mention that these problems started when I added
>>> spooling and concurrent jobs.
>>>
>>> Thanks.
>>>
>>> Victoria
>>>
>>> On Tue, 2006-08-08 at 13:29 -0500, Erich Prinz wrote:
>>>> For a start, add the 'heartbeat' directive to keep the connection
>>>> alive on the FD. This is done in the FD .conf file.
>>>>
>>>> From the Bacula manual:
>>>> Heartbeat Interval = <time-interval>
>>>> This record defines an interval of time. For each heartbeat that the
>>>> File daemon receives from the Storage daemon, it will forward it to
>>>> the Director. In addition, if no heartbeat has been received from the
>>>> Storage daemon and thus forwarded the File daemon will send a
>>>> heartbeat signal to the Director and to the Storage daemon to keep the
>>>> channels active. The default interval is zero which disables the
>>>> heartbeat. This feature is particularly useful if you have a router
>>>> such as 3Com that does not follow Internet standards and times out a
>>>> valid connection after a short duration despite the fact that
>>>> keepalive is set.
>>>>
>>>> Erich
>>>>
>>>>
>>>> On Aug 8, 2006, at 3:48 AM, victoria wrote:
>>>>
>>>>> Hello to everyone!
>>>>>
>>>>> I have the following situation on my backup servers. First, my backup
>>>>> structure is like this:
>>>>> Backup server and Storage server with MySQL database.
>>>>> The Director configuration on the backup server looks like this:
>>>>> Director {
>>>>>            Name = backup-dir
>>>>>            DIRport = 9101
>>>>>            QueryFile = "/opt/bacula/scripts/query.sql"
>>>>>            WorkingDirectory = "/var/bacula/working"
>>>>>            PidDirectory = "/var/run"
>>>>>            Password = "password"
>>>>>            Maximum Concurrent Jobs = 10
>>>>>            Messages = Standard
>>>>>            FD Connect Timeout = 1
>>>>>         }
>>>>> As you can see, I am using concurrent jobs.
>>>>> The SD daemon runs on the storage server.
>>>>> When I try to back up something, I get a file on the storage server:
>>>>> [EMAIL PROTECTED] du -h /baculabk/robber0001
>>>>> 2.0G    /baculabk/robber0001
>>>>> But the backup message is:
>>>>> 07-Aug 21:15 backup-dir: No prior Full backup Job record found.
>>>>> 07-Aug 21:15 backup-dir: No prior or suitable Full backup found. Doing FULL backup.
>>>>> 07-Aug 21:15 backup-dir: Start Backup JobId 18, Job=robber.2006-08-07_21.05.16
>>>>> 07-Aug 21:15 backup-dir: Created new Volume "robber0001" in catalog.
>>>>> 07-Aug 17:53 backup-sd: Labeled new Volume "robber0001" on device "robber" (/baculabk).
>>>>> 07-Aug 17:53 backup-sd: Wrote label to prelabeled Volume "robber0001" on device "robber" (/baculabk)
>>>>> 07-Aug 17:53 backup-sd: Spooling data ...
>>>>> 07-Aug 20:31 backup-sd: Committing spooled data to Volume "robber0001". Despooling 2,104,893,735 bytes ...
>>>>> 07-Aug 20:33 backup-sd: Sending spooled attrs to the Director. Despooling 108,212,728 bytes ...
>>>>> 08-Aug 01:15 backup-dir: robber.2006-08-07_21.05.16 Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
>>>>> 08-Aug 01:15 backup-dir: robber.2006-08-07_21.05.16 Fatal error: No Job status returned from FD.
>>>>> 08-Aug 01:15 backup-dir: robber.2006-08-07_21.05.16 Error: Bacula 1.38.5 (18Jan06): 08-Aug-2006 01:15:58
>>>>>   JobId:                  18
>>>>>   Job:                    robber.2006-08-07_21.05.16
>>>>>   Backup Level:           Full (upgraded from Incremental)
>>>>>   Client:                 "robber-fd" sparc-sun-solaris2.8,solaris,5.8
>>>>>   FileSet:                "robber" 2006-08-07 21:15:40
>>>>>   Pool:                   "robber"
>>>>>   Storage:                "robber"
>>>>>   Scheduled time:         07-Aug-2006 21:05:15
>>>>>   Start time:             07-Aug-2006 21:15:40
>>>>>   End time:               08-Aug-2006 01:15:58
>>>>>   Priority:               10
>>>>>   FD Files Written:       0
>>>>>   SD Files Written:       345,527
>>>>>   FD Bytes Written:       0
>>>>>   SD Bytes Written:       2,095,004,109
>>>>>   Rate:                   0.0 KB/s
>>>>>   Software Compression:   None
>>>>>   Volume name(s):         robber0001
>>>>>   Volume Session Id:      16
>>>>>   Volume Session Time:    1154961001
>>>>>   Last Volume Bytes:      2,104,502,272
>>>>>   Non-fatal FD errors:    0
>>>>>   SD Errors:              0
>>>>>   FD termination status:  Error
>>>>>   SD termination status:  OK
>>>>>   Termination:            *** Backup Error ***
>>>>>
>>>>> I can't restore anything from this file.
>>>>> Why does it send reset by peer? Please point out what is wrong.
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Victoria
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> --
>>>>> --
>>>>> ---
>>>>> Using Tomcat but need to do more? Need to support web services,
>>>>> security?
>>>>> Get stuff done quickly with pre-integrated technology to make your
>>>>> job easier
>>>>> Download IBM WebSphere Application Server v.1.0.1 based on Apache
>>>>> Geronimo
>>>>> http://sel.as-us.falkag.net/sel?
>>>>> cmd=lnk&kid=120709&bid=263057&dat=121642
>>>>> _______________________________________________
>>>>> Bacula-users mailing list
>>>>> Bacula-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>

