Robert,

Thanks for the extensive details.  You back up 5 nodes with more data than
we do across 90 nodes.  So my question is: what kind of connection do you
have to your NAS/storage device to process that much data in such a short
period of time?

I am not sure what benefit a proxy node would give us, other than managing
multiple nodes from one connection/GUI - or am I totally off base on this?
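
(From skimming the BA client manual, proxy setup looks like it is just a
GRANT PROXYNODE on the server plus -asnodename on the client - something
like the following, where SOM-BACKUP-VM1 is a made-up agent name standing
in for one of our VMs:

  grant proxynode target=ISILON-SOM-SOMADFS2 agent=SOM-BACKUP-VM1
  dsmc incremental \\rams\som\TSM\FC\* -asnodename=ISILON-SOM-SOMADFS2

If that is all it buys us, I don't see where any scan speedup would come
from - corrections welcome.)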

Our current configuration is as follows:

7 Windows 2016 VMs (adding more to spread out the load)
Each of these 7 VMs handles the backups for 5-30 nodes.  Each node is a
mountpoint for a user/department ISILON DFS mount -
e.g. \\rams\som\TSM\FC\*, \\rams\som\TSM\UR\*, etc.  FWIW, the reason we
are using VMs is that the connection is actually faster than when we were
using physical servers, since those only had gigabit NICs.
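
(For reference, the opt file for each node on these VMs boils down to
something like this - the node name is real, the server address is a
placeholder:

  NODENAME          ISILON-SOM-SOMADFS2
  TCPSERVERADDRESS  <our TSM server>
  DOMAIN            \\rams\som\TSM\FC
  PASSWORDACCESS    GENERATE

so each scheduler service just walks its own DFS mountpoint.)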

Even when we moved the biggest ISILON node (20,000,000+ files) to a new VM
with only 4 other nodes, it still took 4+ days to scan and back up 102 GB
out of 32 TB.  Below are the most recent end-of-session statistics (the
current backup started Friday and is still running):

07/09/2018 02:00:06 ANE4952I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects inspected:   20,276,912  (SESSION: 21423)
07/09/2018 02:00:06 ANE4954I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects backed up:       26,787  (SESSION: 21423)
07/09/2018 02:00:06 ANE4958I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects updated:             31  (SESSION: 21423)
07/09/2018 02:00:06 ANE4960I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects rebound:              0  (SESSION: 21423)
07/09/2018 02:00:06 ANE4957I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects deleted:              0  (SESSION: 21423)
07/09/2018 02:00:06 ANE4970I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects expired:         20,630  (SESSION: 21423)
07/09/2018 02:00:06 ANE4959I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects failed:              36  (SESSION: 21423)
07/09/2018 02:00:06 ANE4197I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects encrypted:            0  (SESSION: 21423)
07/09/2018 02:00:06 ANE4965I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of subfile objects:              0  (SESSION: 21423)
07/09/2018 02:00:06 ANE4914I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of objects grew:                 0  (SESSION: 21423)
07/09/2018 02:00:06 ANE4916I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of retries:                    124  (SESSION: 21423)
07/09/2018 02:00:06 ANE4977I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of bytes inspected:          31.75 TB  (SESSION: 21423)
07/09/2018 02:00:06 ANE4961I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total number of bytes transferred:       101.90 GB  (SESSION: 21423)
07/09/2018 02:00:06 ANE4963I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Data transfer time:                      115.78 sec  (SESSION: 21423)
07/09/2018 02:00:06 ANE4966I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Network data transfer rate:          922,800.00 KB/sec  (SESSION: 21423)
07/09/2018 02:00:06 ANE4967I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Aggregate data transfer rate:            271.46 KB/sec  (SESSION: 21423)
07/09/2018 02:00:06 ANE4968I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Objects compressed by:                       30%   (SESSION: 21423)
07/09/2018 02:00:06 ANE4976I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Total data reduction ratio:               99.69%   (SESSION: 21423)
07/09/2018 02:00:06 ANE4969I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Subfile objects reduced by:                   0%   (SESSION: 21423)
07/09/2018 02:00:06 ANE4964I (Session: 21423, Node: ISILON-SOM-SOMADFS2)
Elapsed processing time:              109:19:48  (SESSION: 21423)
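
If you do the math on those numbers, the transfer itself is trivial:
101.90 GB moved in 115.78 seconds of actual data transfer (~900 MB/sec,
matching the network rate above), against 109+ hours elapsed.  That is
20,276,912 objects / 393,588 seconds = roughly 52 objects inspected per
second, so essentially all of the wall-clock time is the incremental scan
walking file metadata over SMB, not moving data.

Has anyone had luck tuning the scan side for nodes like this?  The options
I keep circling back to in the manual (untested here, so treat as a guess)
are along the lines of:

  RESOURCEUTILIZATION      10
  MEMORYEFFICIENTBACKUP    DISKCACHEMETHOD

or running "dsmc incremental" with -incrbydate between full incrementals -
though as I understand it, -incrbydate won't expire files deleted from the
filesystem, so a regular incremental is still needed periodically.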



On Sun, Jul 15, 2018 at 7:30 PM Robert Talda <r...@cornell.edu> wrote:

> Zoltan:
>  Finally get a chance to answer you.  I :think: I understand what you are
> getting at…
>
>  First, some numbers - recalling that each of these nodes is one storage
> device:
> Node1: 358,000,000+ files totaling 430 TB of primary occupied space
> Node2: 302,000,000+ files totaling 82 TB of primary occupied space
> Node3: 79,000,000+ files totaling 75 TB of primary occupied space
> Node4: 1,000,000+ files totaling 75 TB of primary occupied space
> Node5: 17,000,000+ files totaling 42 TB of primary occupied space
>   There are more, but I think this answers your initial question.
>
>  Restore requests are handled by the local system admin or, for lack of a
> better description, data admin.  (Basically, the research area has a person
> dedicated to all the various data issues related to research grants, from
> including proper verbiage in grant requests to making sure the necessary
> protections are in place).
>
>   We try to make it as simple as we can, and because we concentrate all
> the data in one node per storage device (usually a NAS), restores are
> usually done directly from the node - while all backups are done through
> proxies.  Generally, restores are done without permissions so that the
> appropriate permissions can be applied to the restored data.  (Oftentimes,
> the data is restored so that a different user or set of users can work
> with it, so the original permissions aren’t useful.)
>
>   There are some exceptions - of course, as we work at universities, there
> are always exceptions - and these we handle as best we can by providing
> proxy nodes with restricted privileges.
>
>   Let me know if I can provide more,
> Bob
>
>
> Robert Talda
> EZ-Backup Systems Engineer
> Cornell University
> +1 607-255-8280
> r...@cornell.edu
>
>
> > On Jul 11, 2018, at 3:59 PM, Zoltan Forray <zfor...@vcu.edu> wrote:
> >
> > Robert,
> >
> > Thanks for the insight/suggestions.  Your scenario is similar to ours,
> > but on a larger scale when it comes to the amount of data/files to
> > process - hence our issue (assuming so, since you didn't list numbers).
> > Currently we have 91 ISILON nodes totaling 140M objects and 230TB of
> > data.  The largest (our troublemaker) has over 21M objects and 26TB of
> > data (this is the one that takes 4-5 days).  dsminstr.log from a
> > recently finished run shows it only backed up 15K objects.
> >
> > We agree that this and other similarly large nodes need to be broken up
> > so there are fewer objects to back up per node.  But the owner of this
> > large one is balking, since previously it was backed up via a solitary
> > Windows server using journaling, so everything finished in a day.
> >
> > We have never dealt with proxy nodes but might need to head in that
> > direction, since our current method of allowing users to perform their
> > own restores relies on the now-deprecated Web Client.  Our current setup
> > is numerous Windows VM servers with 20-30 nodes defined to each.
> >
> > How do you handle restore requests?
> >
> > On Wed, Jul 11, 2018 at 2:56 PM Robert Talda <r...@cornell.edu> wrote:
> >
>
>

-- 
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zfor...@vcu.edu - 804-828-4807
