ISP does not always parallelize well on its own, and it needs a helping hand:
- Put the DB on the best flash you can afford. It has increasingly become our 
bottleneck, particularly when backing up LOTS of small files.
- Break up the backup jobs into LOTS of sessions by using proxy nodes, multiple 
dsmc clients, and a higher resource utilisation setting.
- If you are writing the backups to a file storage pool on the Isilon, use 
multiple NFS mounts spread across multiple Isilon nodes.
- Increase the maximum number of mount points for ISP and the storage pool so 
that it is larger than the total number of sessions, i.e. #proxy_nodes * 
#dsmc_processes * resource_utilisation.
- Decrease the size of the file pool volumes so that there can be at LEAST as 
many volumes as there are mount points.
- Check your client networking options and kernel settings.
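
To make the sizing arithmetic in the list above concrete, here is a rough 
sketch in Python. The input numbers (4 proxy nodes, 5 dsmc processes, 
resource utilisation of 10, a 500 TB pool) and the function names are made up 
for illustration; plug in your own values.

```python
def total_sessions(proxy_nodes: int, dsmc_processes: int, resource_utilisation: int) -> int:
    """Total concurrent sessions: #proxy_nodes * #dsmc_processes * resource_utilisation."""
    return proxy_nodes * dsmc_processes * resource_utilisation


def max_volume_size_gb(pool_capacity_gb: float, mount_points: int) -> float:
    """Largest volume size that still allows at least `mount_points` volumes in the pool."""
    return pool_capacity_gb / mount_points


# Hypothetical example values, not taken from any real configuration.
sessions = total_sessions(proxy_nodes=4, dsmc_processes=5, resource_utilisation=10)

# The mount point limit has to be larger than the session count.
mount_limit = sessions + 1

# Assumed pool size: 500 TB, expressed here as 500,000 GB.
volume_cap_gb = max_volume_size_gb(500_000, mount_limit)

print(f"sessions={sessions} mount_limit>={mount_limit} volume_size<={volume_cap_gb:.0f} GB")
```

In practice you would round the mount limit up with some headroom and the 
volume size down to a convenient figure.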

That's all I can think of at the moment

HTH

Grant
________________________________________
From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> on behalf of Zoltan Forray 
<zfor...@vcu.edu>
Sent: Friday, 7 September 2018 4:22 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] nfstimeout on server ISILON storage

>>> Are the timeouts repeatable enough that you can get a packet capture
in there before and while they're happening?

They happen often, sometimes constantly, whenever there is any kind of
storage pool activity.  Looking through /var/log/messages, it happened
almost every 5 minutes, starting before 8pm yesterday, and
stopped around 3am.  Looking through the ISP server logs, I see reclaims
ending around the time the messages stopped.  Before that there were
Identify/dedupe processes and a DB backup (upstream to one of the ISP servers
at my physical location; the Earth server is offsite and used solely for DB
backups and as a replication target).
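
For anyone wanting to line the timeouts up against server activity, a rough 
sketch of tallying the kernel messages per minute is below. The regular 
expression and function name are illustrative assumptions; the sample lines 
follow the format of the /var/log/messages excerpt quoted further down.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Sep  4 13:21:49 earth kernel: nfs: server <name> not responding, still trying
#   Sep  4 13:22:16 earth kernel: nfs: server <name> OK
LINE_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"nfs: server\s+(?P<server>\S+)\s+(?P<state>not responding, still trying|OK)\s*$"
)


def timeouts_per_minute(lines):
    """Count 'not responding' messages, keyed by the timestamp truncated to the minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("state") != "OK":
            counts[match.group("minute")] += 1
    return counts


sample = [
    "Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying",
    "Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying",
    "Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying",
    "Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying",
    "Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK",
]

counts = timeouts_per_minute(sample)
for minute, n in sorted(counts.items()):
    print(minute, n)
```

To run it against the real log, pass an open file object for 
/var/log/messages instead of the sample list.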

As my SAN person said, maybe we are expecting too much from the ISILON/NFS.
Unfortunately, it was/is the cheapest solution since I need the 500TB
(almost always at 90% used even with dedup).

We have been working with networking since we are also addressing the issue
of seeing lots of completely unrelated TCP traffic/broadcasts on the same
VLAN as the NFS storage.  However, a few days ago they moved it to a new
VLAN and the extraneous "noise" has stopped.

On Wed, Sep 5, 2018 at 7:29 PM Skylar Thompson <skyl...@uw.edu> wrote:

> Yep, you're right, I misread that (shouldn't send email pre-coffee).
>
> Are the timeouts repeatable enough that you can get a packet capture in
> there before and while they're happening?
>
> On Wed, Sep 05, 2018 at 07:09:09PM -0400, Zoltan Forray wrote:
> > Skylar,
> >
> > I sent your comment about UDP vs TCP to my OS tech (beyond my ken) - got
> > this feedback:
> >
> > I assume what they are talking about is this:
> >
> > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,*mountproto=udp*,local_lock=none,addr=192.168.19.12)
> >
> > Looks like this is the default setting (also on all the other servers to
> > initiate a conversation with the NFS server). However, if you read the
> > documentation on this option it goes into detail about how this option
> > differs from proto (which is also defined):
> >
> > https://clicktime.symantec.com/a/1/_WRrK8Ud1QlbS4lMAmGly9__1m2hrzx-E5Do8uVTOJQ=?d=fdefGtcvWswFArtTNHn1OQ8hDy5bnpvV0uiN5e7uU9pHs5i0CVtTvVmDZXauou9rZM8HXg5NINdRQaubM-rXROr9zA8l23Hrm_tP3i7TRxca_NRoOWuC6vZpa0bV9kTSQ-961vT_pNz2In1a-CNUiP0YGaB3S1M0IQA2uIEaa2r92USf7VUnEt7mY-AH6BPp_AYHOx27RpQQwAlK_e-c_7MOVBJebYcTzeD3N0yF-fipCNsDyaUnuLRpI9NuRBcSvujU15Fjd8D2ePhNscjlIgk0yN5QkKUfrC8TJa2hKerFmvID4hYaIsSaRvL12s4muLnHrW8DaqUSdMyLaER66NRx_Whe5h160936eBuUi3MdTBbbR1uAthfTvdFu4HeJDEsjMPrwoYq1XjKV3KSwru1HnJFu_ZxaN3V9LdwqRBfxag%3D%3D&u=https%3A%2F%2Faccess.redhat.com%2Fsolutions%2F183583
> >
> > "mountproto differs from proto as it defines what protocol (TCP or UDP) the
> > client will use to initiate the connection and conduct the mount and
> > umount operations.
> > This differs from the proto option which sets the protocol that the initial
> > connection *and* the actual transportation will use."
> >
> > The proto option (set to TCP in the mount) appears to be determining how
> > the actual connection and transport of data is conducted.
> >
> > When running a tcpdump on Earth I see NFS TCP traffic running over the 23
> > VLAN (and the 22 VLAN on other TSM servers) and no UDP packets to speak of.
> >
> > On Wed, Sep 5, 2018 at 10:25 AM Skylar Thompson <skyl...@uw.edu> wrote:
> >
> > > It looks like you're using UDP as a transport - have you tried switching to
> > > TCP? Especially with large NFS payload sizes, you're going to get lots of
> > > fragmentation with UDP's 512-byte packet limit.
> > >
> > > On Wed, Sep 05, 2018 at 09:03:25AM -0400, Zoltan Forray wrote:
> > > > A pair of 10G links bonded - CISCO switches.
> > > >
> > > > On Tue, Sep 4, 2018 at 7:54 PM Skylar Thompson <skyl...@uw.edu> wrote:
> > > >
> > > > > Quick question - what's the data link protocol (Ethernet, IB, etc.) and
> > > > > link rate that you're using?
> > > > >
> > > > > On Tue, Sep 04, 2018 at 02:05:33PM -0400, Zoltan Forray wrote:
> > > > > > We are still fighting issues with ISILON storage. Our current issue is with
> > > > > > NFS timeouts for the storage a server is using.  We see messages like these
> > > > > > in the server /var/log
> > > > > >
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > > > Sep  4 13:21:49 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > > > Sep  4 13:22:14 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > > > Sep  4 13:22:15 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu not responding, still trying
> > > > > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> > > > > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> > > > > > Sep  4 13:22:16 earth kernel: nfs: server hhisilonnfs23.rams.adp.vcu.edu OK
> > > > > >
> > > > > > OS folks say the NFS mount is set up as IBM recommends in various documents.
> > > > > > So they asked us to implement the nfstimeout option from this document (
> https://clicktime.symantec.com/a/1/EJM9TVZ57AJnIm5O2wkTNp2hGQs5vrMNbZ0rWwqgoFw=?d=fdefGtcvWswFArtTNHn1OQ8hDy5bnpvV0uiN5e7uU9pHs5i0CVtTvVmDZXauou9rZM8HXg5NINdRQaubM-rXROr9zA8l23Hrm_tP3i7TRxca_NRoOWuC6vZpa0bV9kTSQ-961vT_pNz2In1a-CNUiP0YGaB3S1M0IQA2uIEaa2r92USf7VUnEt7mY-AH6BPp_AYHOx27RpQQwAlK_e-c_7MOVBJebYcTzeD3N0yF-fipCNsDyaUnuLRpI9NuRBcSvujU15Fjd8D2ePhNscjlIgk0yN5QkKUfrC8TJa2hKerFmvID4hYaIsSaRvL12s4muLnHrW8DaqUSdMyLaER66NRx_Whe5h160936eBuUi3MdTBbbR1uAthfTvdFu4HeJDEsjMPrwoYq1XjKV3KSwru1HnJFu_ZxaN3V9LdwqRBfxag%3D%3D&u=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSSGSG7_7.1.0%2Fcom.ibm.itsm.client.doc%2Fr_opt_nfstimeout.html
> > > > > ).
> > > > > > Yes, I realize it is primarily for a client backup of an NFS mount, but the
> > > > > > statement:
> > > > > >
> > > > > > Supported Clients This option is for all UNIX and Linux clients. *The
> > > > > > server can also define this option*.
> > > > > >
> > > > > > throws us - kind-of implying I can use this from the server perspective?
> > > > > > But I can't find any documentation to support using it from the server.
> > > > > >
> > > > > > For you Linux gurus - this is what the mount says:
> > > > > >
> > > > > > hhisilonnfs23.rams.adp.vcu.edu:/ifs/NFS/TSM on /tsmnfs type nfs
> > > > > > (rw,relatime,sync,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.19.12,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.19.12)
> > > > > >
> > > > > > Any thoughts?  Suggestions?  Are we simply expecting too much from NFS?
> > > > > >
> > > > > > My OS person also asks why ISP is so slow to write to NFS?  When they did a
> > > > > > test copy of a large file to the NFS mount, they were getting upwards of 8G/s
> > > > > > vs 1.5-3G/s when TSM/ISP writes to it (via EMC monitoring tools).
> > > > > >
> > > > > > --
> > > > > > *Zoltan Forray*
> > > > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > > > Xymon Monitor Administrator
> > > > > > VMware Administrator
> > > > > > Virginia Commonwealth University
> > > > > > UCC/Office of Technology Services
> > > > > > www.ucc.vcu.edu
> > > > > > zfor...@vcu.edu - 804-828-4807
> > > > > > Don't be a phishing victim - VCU and other reputable
> organizations
> > > will
> > > > > > never use email to request that you reply with your password,
> social
> > > > > > security number or confidential personal information. For more
> > > details
> > > > > > visit 
> > > > > > https://clicktime.symantec.com/a/1/D4Vc0iL0Ihz01IxaPMD4FQKsz4HFdO34N56Mk9lThTY=?d=fdefGtcvWswFArtTNHn1OQ8hDy5bnpvV0uiN5e7uU9pHs5i0CVtTvVmDZXauou9rZM8HXg5NINdRQaubM-rXROr9zA8l23Hrm_tP3i7TRxca_NRoOWuC6vZpa0bV9kTSQ-961vT_pNz2In1a-CNUiP0YGaB3S1M0IQA2uIEaa2r92USf7VUnEt7mY-AH6BPp_AYHOx27RpQQwAlK_e-c_7MOVBJebYcTzeD3N0yF-fipCNsDyaUnuLRpI9NuRBcSvujU15Fjd8D2ePhNscjlIgk0yN5QkKUfrC8TJa2hKerFmvID4hYaIsSaRvL12s4muLnHrW8DaqUSdMyLaER66NRx_Whe5h160936eBuUi3MdTBbbR1uAthfTvdFu4HeJDEsjMPrwoYq1XjKV3KSwru1HnJFu_ZxaN3V9LdwqRBfxag%3D%3D&u=http%3A%2F%2Fphishing.vcu.edu%2F
> > > > >
> > > > > --
> > > > > -- Skylar Thompson (skyl...@u.washington.edu)
> > > > > -- Genome Sciences Department, System Administrator
> > > > > -- Foege Building S046, (206)-685-7354
> > > > > -- University of Washington School of Medicine
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>


--
Grant Street
Senior Systems Engineer

T: +61 2 9383 4800 (main)
D: +61 2 8310 3582 (direct)
E: grant.str...@al.com.au

Building 54 / FSA #19, Fox Studios Australia, 38 Driver Avenue
Moore Park, NSW 2021
AUSTRALIA
