Hi Ana -

Thanks for your reply.

What we discovered was that the rp_filter (reverse path filtering) setting in Ubuntu 12.04 was preventing packets from flowing as we expected. While this behavior is good in general, it blocked traffic on the path we wanted it to take. Once we disabled it, things started working.

More info if anyone is interested: https://wiki.ubuntu.com/ImprovedNetworking/KernelSecuritySettings
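
A minimal Python sketch for checking this on a similar server: it reports the effective rp_filter mode for each interface. It assumes the standard Linux sysctl layout under /proc/sys/net/ipv4/conf and the kernel's documented rule that the larger of the "all" and per-interface values applies. The function names are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Meaning of each rp_filter value, per the kernel's ip-sysctl documentation.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def read_rp_filter(conf_dir="/proc/sys/net/ipv4/conf"):
    """Return {interface: raw rp_filter value} for every entry under conf_dir."""
    values = {}
    for entry in sorted(Path(conf_dir).iterdir()):
        setting = entry / "rp_filter"
        if setting.is_file():
            values[entry.name] = int(setting.read_text().strip())
    return values


def effective_modes(values):
    """Apply max(conf/all, conf/<interface>) to get each interface's mode."""
    global_value = values.get("all", 0)
    return {
        name: RP_FILTER_MODES.get(max(global_value, value), "unknown")
        for name, value in values.items()
        if name not in ("all", "default")  # pseudo-entries, not real interfaces
    }
```

Calling effective_modes(read_rp_filter()) on the backup server lists each interface as off, strict, or loose. A multi-homed host with an address on every VLAN is the typical case where strict mode drops packets that arrive on a different interface from the one the kernel would use to reply.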

Next step is to review our Bacula and network setups to see if that's the way we want to continue to go.

Thanks again for the suggestions.

Kevin B. Zimmerman

On 09/22/2013 02:56 PM, Ana Emília M. Arruda wrote:
Hi Kevin,

Yes, that's the way Bacula works. It "ties" volumes to devices. But you're seeing this behaviour because of your configuration. Maybe you should take a look at vchanger. There is a very good discussion about vchanger and the Bacula disk-changer: http://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg55896.html

Best regards,
Ana


On Sat, Sep 21, 2013 at 7:13 PM, Kevin B. Zimmerman <kevin.zimmer...@kitware.com> wrote:

    Hi Ana -

    Thanks for responding.

    I currently have no issues communicating between the public-fd
    client and the storage daemon at 192.168.120.35.  I'm able to
    connect between both addresses on ports 9102 and 9103.
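
A port check like the one described can be sketched in Python as follows. This is only an illustration, not the test that was actually run: can_connect and its parameters are hypothetical names, and the optional source_ip assumes you want to force the connection out of one specific local address on a multi-homed host.

```python
import socket


def can_connect(host, port, source_ip=None, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    # Binding to (source_ip, 0) pins the local address and lets the OS pick the port.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

Run from the client, can_connect("192.168.120.35", 9103) would exercise the storage daemon port. Repeating the check from the server with source_ip set to each of its addresses shows whether every path works, not just the default one.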

    As far as the JobID, I pass the specific job to the console during
    the restore.  What I'm confused about is why a disk volume ties
    itself to one specific storage resource when there are numerous
    clients from different VLANs.  Ultimately, I don't really care
    where it looks to get its data from, but it was something that
    didn't make sense to me.

    Did I answer your questions?  Any other thoughts?

    Thanks,

    Kevin B. Zimmerman

    On 09/20/2013 01:31 PM, Ana Emília M. Arruda wrote:
    Hi Kevin,

    Just trying to understand... The error below:

    Restore-public-Disk.2013-09-20_12.43.48_32 is waiting for Client to connect to Storage daemon
    ...
    20-Sep 12:19 public-fd JobId 19084: Fatal error: Authorization key rejected by Storage daemon.
    Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors for help.
    20-Sep 12:19 public-fd JobId 19084: Fatal error: Failed to authenticate Storage daemon.

    It means that your client "public-fd" has problems connecting to
    the storage daemon. In this case, for the restore job, Bacula is
    asking for:

    Storage {
     Name = File_20
     Address = 192.168.120.35
    ...
    }

    So, probably, public-fd has network problems in communicating
    with 192.168.120.35...

    Are you using the correct JobId for the restore? When you say
    that it should ask for File_5 storage, maybe you are
    misunderstanding the JobId information for the restore you want.

    Best regards,
    Ana





    On Fri, Sep 20, 2013 at 2:09 PM, Kevin B. Zimmerman
    <kevin.zimmer...@kitware.com> wrote:

        Greetings -

        I'm running Bacula 5.2.5 on an Ubuntu 12.04.3 server, with about 61TB of
        disk storage and an attached Dell PVTL2000 tape library.  Backups are
        working great, but restores are presenting a problem.

        Here's my dilemma:

        When attempting to restore a file, I get this status for about 10 minutes:

          JobId  Level  Name                                        Status
        ======================================================================
          19085         Restore-public-Disk.2013-09-20_12.43.48_32  is waiting for Client to connect to Storage daemon

        and then this error:

        20-Sep 12:09 khq-backups2-dir JobId 19084: Start Restore Job Restore-public-Disk.2013-09-20_12.09.20_02
        20-Sep 12:09 khq-backups2-dir JobId 19084: Using Device "DiskStorage"
        20-Sep 12:19 public-fd JobId 19084: Fatal error: Authorization key rejected by Storage daemon.
        Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors for help.
        20-Sep 12:19 public-fd JobId 19084: Fatal error: Failed to authenticate Storage daemon.
        20-Sep 12:19 khq-backups2-dir JobId 19084: Fatal error: Socket error on Storage command: ERR=No data available
        20-Sep 12:19 khq-backups2-dir JobId 19084: Error: Bacula khq-backups2-dir 5.2.5 (26Jan12):
           Build OS:               x86_64-pc-linux-gnu ubuntu 12.04
           JobId:                  19084
           Job:  Restore-public-Disk.2013-09-20_12.09.20_02
           Restore Client:         public-fd
           Start time:             20-Sep-2013 12:09:22
           End time:               20-Sep-2013 12:19:33
           Files Expected:         1
           Files Restored:         0
           Bytes Restored:         0
           Rate:                   0.0 KB/s
           FD Errors:              1
           FD termination status:
           SD termination status:  Waiting on FD
           Termination:            *** Restore Error ***

        20-Sep 12:19 khq-backups2-dir JobId 19084: Error: Bacula khq-backups2-dir 5.2.5 (26Jan12):
           Build OS:               x86_64-pc-linux-gnu ubuntu 12.04
           JobId:                  19084
           Job:  Restore-public-Disk.2013-09-20_12.09.20_02
           Restore Client:         public-fd
           Start time:             20-Sep-2013 12:09:22
           End time:               20-Sep-2013 12:19:33
           Files Expected:         1
           Files Restored:         0
           Bytes Restored:         0
           Rate:                   0.0 KB/s
           FD Errors:              2
           FD termination status:
           SD termination status:  Waiting on FD
           Termination:            *** Restore Error ***

        So I went to check out
        http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors as
        suggested by the error, but it doesn't exist.  I looked at
        http://www.bacula.org/5.2.x-manuals/en/problems/problems/Bacula_Frequently_Asked_Que.html#SECTION00260000000000000000
        presuming it to be the same, but it didn't help.

        I'm not sure why it fails on auth.  It uses the same password to run
        backups, and those work fine.
        I tested connectivity between the two systems, and both are able to talk
        to each other (port 9102 on client, ports 9102 & 9103 on each of the
        server's many IPs).
        Max Concurrent Jobs are set to 7 in all configs, and no other jobs are
        running.

        One thing I noticed was that after selecting the file(s) to restore, I
        get this:

        The job will require the following
            Volume(s)                 Storage(s)          SD Device(s)
        ===========================================================================

             DiskPool0197              File_20           DiskStorage

        This particular client uses File_5, which may be part of the problem,
        though I don't understand why.  That volume (DiskPool0197) has backups
        from many different clients, from many different VLANs.

        Some relevant info:

        In trying to reduce network overhead, the backup server has been added
        to each of the VLANs where the Bacula clients reside, resulting in
        numerous file and tape storage definitions.  Here's a subset of the file
        definitions:

        Storage {
           Name = File_5
           Address = 192.168.115.35
           SDPort = 9103
           Password = "<redacted>"
           Device = DiskStorage
           Media Type = File
           Maximum Concurrent Jobs = 7
        }

        Storage {
           Name = File_20
           Address = 192.168.120.35      # N.B. Use a fully qualified name here
           SDPort = 9103
           Password = "<redacted>"
           Device = DiskStorage
           Media Type = File
           Maximum Concurrent Jobs = 7
        }
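
To make the relationship between these definitions easier to see, the following Python sketch parses flat Storage resources like the two above and groups them by the device they point at. It is an illustration only: parse_storage_resources and group_by_device are hypothetical helpers, not part of Bacula, and the parser assumes simple "key = value" lines with no nested braces and no "#" inside quoted values.

```python
import re


def parse_storage_resources(text):
    """Parse flat Storage { key = value } blocks into a list of dicts."""
    resources = []
    for block in re.findall(r"Storage\s*\{(.*?)\}", text, flags=re.DOTALL):
        resource = {}
        for line in block.splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if "=" in line:
                key, value = line.split("=", 1)
                resource[key.strip().lower()] = value.strip().strip('"')
        resources.append(resource)
    return resources


def group_by_device(resources):
    """Map (device, media type) to the Storage names that refer to it."""
    groups = {}
    for res in resources:
        key = (res.get("device"), res.get("media type"))
        groups.setdefault(key, []).append(res.get("name"))
    return groups
```

For the subset shown here, group_by_device would return a single group, ("DiskStorage", "File"), containing both File_5 and File_20. In other words, the two definitions differ only in the Address the client is told to contact.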

        The problem has not been limited to a specific VLAN.
        I've tried changing all of the Storage definitions to use the same IP,
        wondering if Bacula was confused about which IP to send traffic out, to
        no avail.
        I've tried turning on debugging, but got no output.
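
One way to see which local address the server would use to reach a given client is to ask the kernel's routing table directly. The Python sketch below does that with a connected UDP socket. It shows what the operating system would choose, not what Bacula itself does, and local_source_ip is a hypothetical helper name.

```python
import socket


def local_source_ip(destination, port=9102):
    """Return the local IP the kernel's routing table picks to reach destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only resolves the route.
        sock.connect((destination, port))
        return sock.getsockname()[0]
```

If the address returned for a client is not the one named in the Storage resource that client is told to contact, replies leave through a different interface than the requests arrived on. That kind of asymmetric path is what strict reverse path filtering is designed to drop.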

        No idea what I'm missing here. What other info did I fail to provide?
        Any thoughts on what's going on?

        Thanks,

        --
        Kevin B. Zimmerman



        
        _______________________________________________
        Bacula-users mailing list
        Bacula-users@lists.sourceforge.net
        https://lists.sourceforge.net/lists/listinfo/bacula-users





