On Wed, Nov 20, 2019 at 6:35 PM Roy Smith <r...@panix.com> wrote:
>
> The last couple of days, I've been having problems with interactive ssh into
> login.tools.wmflabs.org.  Every so often (multiple times an hour, at least), 
> my connection will hang for a few seconds.  Sometimes more like 10-15 
> seconds.  I connect from my home MacOS box on broadband using:

This sort of behavior is often caused by some activity saturating the
allowed bandwidth between the bastion server and the NFS server that
provides $HOME directories for both users and tool accounts. Even
something trivial-seeming like `cd $HOME; ls` calls out to the NFS
server for the `ls` data, and that request can end up queuing for
space in the connection to that server.
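
One rough way to check whether a hang lines up with slow NFS access is
to time a few directory listings of $HOME from the bastion. The sketch
below is only an illustration: the function name `time_listing` and
the repeat count are my own choices, and client-side caching can hide
latency, so fast results do not rule out NFS congestion.

```python
import os
import time


def time_listing(path, repeats=5):
    """Return the wall-clock seconds taken by each of `repeats` listings of `path`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # On an NFS-backed directory this may need a round trip to the server.
        os.listdir(path)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    home = os.path.expanduser("~")
    samples = time_listing(home)
    print("slowest listing: %.3f s" % max(samples))
```

If the slowest listing takes seconds rather than milliseconds at the
same moment the terminal stalls, that is at least consistent with the
NFS explanation.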

NFS overload can happen for many reasons, but is more likely when one
or more people are running large scp/sftp downloads from the bastion
to their local computer or running bots or other programs which
generate a lot of disk activity from the bastion directly rather than
launching the process on the job grid or Kubernetes cluster.

At the moment I am writing this, `pstree -clapu` on
login.tools.wmflabs.org shows me:

* tools.lziad running a nodejs process with many active threads
* tools.exambot running an irc bot (sopel)
* mzmcbride running a script named touch.py
* tools.editgroups running a script named lag_watcher.sh
* bugreporter running a GNU Screen session with multiple python2 processes open
* tools.rebot running a pywikibot script
* jarbot-ii running an sftp server
* iluvatar running an sftp server
* tools.wikiportretdev running an sftp server
* tools.largedatasetbot running an sftp server
* jjmc89 running an sftp server
* magnus running an sftp server
* tools.mbrt1 running an unrealircd server (?!)

The sftp servers are expected. We currently do not have any other
means for people to upload/download files to and from Toolforge. The
other processes all appear at least on the surface to be things that
would be better suited to running on either the job grid [0] or the
Kubernetes cluster [1].
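
A per-user summary like the one above can also be approximated
without reading the full `pstree` output. This is a minimal sketch,
not the method used to produce the list: it assumes a POSIX `ps`, and
the helper names `group_by_user` and `current_processes` are
hypothetical.

```python
import subprocess
from collections import defaultdict


def group_by_user(ps_output):
    """Map each user name to the sorted list of distinct commands it runs."""
    by_user = defaultdict(set)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, command = parts
        by_user[user].add(command.strip())
    return {user: sorted(commands) for user, commands in by_user.items()}


def current_processes():
    """Run ps for all processes and group the command names by user."""
    # Separate -o options are used because text after "=" is taken as the
    # column header; an empty header suppresses the header line.
    result = subprocess.run(
        ["ps", "-e", "-o", "user=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return group_by_user(result.stdout)


if __name__ == "__main__":
    for user, commands in sorted(current_processes().items()):
        print(user, ", ".join(commands))
```

Some `ps` implementations shorten long user names in this column, so
tool account names may not always appear in full.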

[0]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid
[1]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes

Bryan
-- 
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
