We use a simple script to find clients that are hitting the OSTs. I borrowed
it from somewhere a decade ago and still use it. Run it on your OSS to see
all the clients hitting the various OSTs:
#!/bin/bash
# Clear the per-client export stats on every OST, wait a few seconds,
# then print the clients (NIDs) that generated I/O during that window.
set -e

SLEEP=10

stats_clear()
{
    # Writing "clear" to the 'clear' file resets all export stats under $1.
    cd "$1"
    echo clear >clear
}

stats_print()
{
    cd "$1"
    echo "===================== $1 ============================"
    for i in *; do
        [ -d "$i" ] || continue
        # Drop the snapshot timestamp and ping-only counters; grep exits
        # non-zero when nothing is left, so '|| true' keeps set -e happy.
        out=`cat "${i}/stats" | grep -v "snapshot_time" | grep -v "ping" || true`
        [ -n "$out" ] || continue
        # $out is left unquoted on purpose: it flattens each client's
        # remaining counters onto a single line.
        echo "$i" $out
    done
    echo "============================================================================================="
    echo
}

# Pass 1: reset the export stats on every OST.
for i in /proc/fs/lustre/obdfilter/*OST*; do
    dir="${i}/exports"
    [ -d "$dir" ] || continue
    stats_clear "$dir"
done

echo "Waiting ${SLEEP}s after clearing stats"
sleep $SLEEP

# Pass 2: print whatever accumulated during the sample window.
for i in /proc/fs/lustre/obdfilter/*OST*; do
    dir="${i}/exports"
    [ -d "$dir" ] || continue
    stats_print "$dir"
done
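To sample every OSS at once, you could push the script out with pdsh (the
host list and script path here are illustrative, not from the thread):

# run the sampler on all four OSS nodes in parallel
pdsh -w oss[001-004] /root/ost_client_stats.sh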
________________________________
From: Moreno Diego (ID SIS) <[email protected]>
Sent: Tuesday, October 29, 2019 10:08 AM
To: Louis Allen <[email protected]>; Oral, H. <[email protected]>; Carlson,
Timothy S <[email protected]>; [email protected]
Subject: Re: [lustre-discuss] [EXTERNAL] Re: Lustre Timeouts/Filesystem Hanging
Hi Louis,
If you don't have any particular monitoring on the servers (Prometheus,
Ganglia, etc.), you could also use sar (from sysstat) or a similar tool to
confirm that the CPU is waiting on I/O, and to check device saturation with
sar or iostat. For instance:
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.19   0.00     6.09     0.10    0.06  93.55

Device:  rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00    1.20   0.20    0.60   0.00   0.01     20.00      0.00   0.75     1.00     0.67   0.75   0.06
sdb        0.00  136.80   2.80   96.60   0.81   9.21    206.42      0.19   1.91    26.29     1.20   0.55   5.46
sdc        0.00  144.20  58.80  128.00   2.34  16.82    210.08      0.24   1.31     2.68     0.68   0.66  12.40
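(For reference, output like the above comes from the sysstat tools, e.g.:)

# extended per-device statistics in MB/s, refreshed every 5 seconds
iostat -xm 5
# CPU utilization breakdown, including %iowait
sar -u 5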
Then, if you enable Lustre job stats, you can check which job is doing the
most I/O on that specific device. Last but not least, you could also parse
which specific NID is doing the intensive I/O on that OST
(/proc/fs/lustre/obdfilter/<fs>-OST0007/exports/*/stats).
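(A minimal sketch of the commands involved, with "lustre" standing in for
your filesystem name; see the Lustre manual for the jobid_var options:)

# tag RPCs with process name + UID so job_stats has something to group by
lctl conf_param lustre.sys.jobid_var=procname_uid
# on the OSS: per-job I/O counters for the busy OST
lctl get_param obdfilter.lustre-OST0007.job_stats
# per-client (NID) counters for the same OST
lctl get_param obdfilter.lustre-OST0007.exports.*.stats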
Regards,
Diego
From: lustre-discuss <[email protected]> on behalf of Louis Allen <[email protected]>
Date: Tuesday, 29 October 2019 at 17:43
To: "Oral, H." <[email protected]>, "Carlson, Timothy S" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [lustre-discuss] [EXTERNAL] Re: Lustre Timeouts/Filesystem Hanging
Thanks, will take a look.
Any other areas I should be looking at? Should I be applying any Lustre tuning?
Thanks
________________________________
From: Oral, H. <[email protected]>
Sent: Monday, October 28, 2019 7:06:41 PM
To: Louis Allen <[email protected]>; Carlson, Timothy S <[email protected]>; [email protected] <[email protected]>
Subject: Re: [EXTERNAL] Re: [lustre-discuss] Lustre Timeouts/Filesystem Hanging
For inspecting client-side I/O, you can use Darshan.
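(A minimal sketch of a Darshan run; the library path, application name, and
log location below are illustrative and depend on your site install:)

# instrument an MPI application without recompiling
LD_PRELOAD=/usr/lib64/libdarshan.so mpirun -np 64 ./my_app
# summarize the resulting log file
darshan-parser --perf /path/to/my_app.darshan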
Thanks,
Sarp
--
Sarp Oral, PhD
National Center for Computational Sciences
Oak Ridge National Laboratory
[email protected]<mailto:[email protected]>
865-574-2173
On 10/28/19, 1:58 PM, "lustre-discuss on behalf of Louis Allen"
<[email protected] on behalf of [email protected]> wrote:
Thanks for the reply, Tim.
Are there any tools I can use to see if that is the cause?
Could any tuning possibly help the situation?
Thanks
________________________________________
From: Carlson, Timothy S <[email protected]>
Sent: Monday, 28 October 2019, 17:24
To: Louis Allen; [email protected]
Subject: RE: Lustre Timeouts/Filesystem Hanging
In my experience, this is almost always related to some code doing really
bad I/O. Let's say you have a 1000-rank MPI code doing open/read-4k/close on
a few specific files on that OST. That will make for a bad day.
The other place you can see this (and this isn't your case) is when ZFS
refuses to give up on a disk that is failing, and your overall I/O suffers
from ZFS continuing to try to read from a disk that it should just kick out.
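(If you did want to rule that out on a ZFS-backed OSS, a quick check:)

# show only pools that are degraded or reporting errors; silence is good
zpool status -x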
Tim
From: lustre-discuss <[email protected]> On Behalf Of Louis Allen
Sent: Monday, October 28, 2019 10:16 AM
To: [email protected]
Subject: [lustre-discuss] Lustre Timeouts/Filesystem Hanging
Hello,
Lustre (2.12) seems to be hanging quite frequently (5+ times a day) for us,
and one of the OSS servers (out of 4) is reporting an extremely high load
average (150+) while the CPU usage of that server is actually very low - so
it must be related to something else, possibly CPU I/O wait.
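(High load with low CPU usage usually means threads parked in
uninterruptible I/O sleep, which still counts toward the load average; a
quick way to check:)

# list threads in D (uninterruptible) state
ps -eo state,pid,comm | awk '$1 == "D"'
# 'b' column = blocked tasks, 'wa' = %iowait
vmstat 5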
On the OSS server where we are seeing the high load averages, we can also
see multiple LustreError messages in /var/log/messages:
Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Service thread pid 2403 was inactive for 200.08s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Skipped 4 previous similar messages
Oct 28 11:22:23 pazlustreoss001 kernel: Pid: 2403, comm: ll_ost00_068 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Sun May 26 21:48:35 UTC 2019
Oct 28 11:22:23 pazlustreoss001 kernel: Call Trace:
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc03747c5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0375e52>] jbd2_complete_transaction+0x52/0xa0 [jbd2]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0732da2>] ldiskfs_sync_file+0x2e2/0x320 [ldiskfs]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa52760b0>] vfs_fsync_range+0x20/0x30
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0c8b651>] osd_object_sync+0xb1/0x160 [osd_ldiskfs]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0ab48a7>] tgt_sync+0xb7/0x270 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0dc3731>] ofd_sync_hdl+0x111/0x530 [ofd]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0aba1da>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a5f80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a6313c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa50c1c71>] kthread+0xd1/0xe0
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa5775c37>] ret_from_fork_nospec_end+0x0/0x39
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Oct 28 11:22:23 pazlustreoss001 kernel: LustreError: dumping log to /tmp/lustre-log.1572261743.2403
Oct 28 11:22:23 pazlustreoss001 kernel: Pid: 2292, comm: ll_ost03_043 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Sun May 26 21:48:35 UTC 2019
Oct 28 11:22:23 pazlustreoss001 kernel: Call Trace:
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc03747c5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0375e52>] jbd2_complete_transaction+0x52/0xa0 [jbd2]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0732da2>] ldiskfs_sync_file+0x2e2/0x320 [ldiskfs]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa52760b0>] vfs_fsync_range+0x20/0x30
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0c8b651>] osd_object_sync+0xb1/0x160 [osd_ldiskfs]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0ab48a7>] tgt_sync+0xb7/0x270 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0dc3731>] ofd_sync_hdl+0x111/0x530 [ofd]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0aba1da>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a5f80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Service thread pid 2403 completed after 200.29s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Skipped 48 previous similar messages
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a6313c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa50c1c71>] kthread+0xd1/0xe0
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa5775c37>] ret_from_fork_nospec_end+0x0/0x39
Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
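(The /tmp/lustre-log.* files named in the LustreError lines are binary
Lustre debug dumps; lctl can convert them to text for inspection:)

# decode the binary debug dump referenced in the log above
lctl debug_file /tmp/lustre-log.1572261743.2403 /tmp/lustre-log.1572261743.2403.txt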
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org