Thanks for the info. I've downloaded the patch, but am I supposed to apply it to the source code and recompile, or something else? I don't suppose you could give me a quick rundown on how to apply the patch correctly?

Thanks,
Kevin

________________________________
From: Spitz, Cory James <[email protected]>
Sent: Thursday, February 27, 2020 4:58 PM
To: Konzem, Kevin P <[email protected]>; Nathan Dauchy - NOAA Affiliate <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: [lustre-discuss] Re: DF bug with lustre 2.12.4
Hello, Kevin.

I see from LU-13285 that Nathan D. pointed you at LU-13296. I left a comment in the ticket as well. I think you can try the patch from LU-13296 with your reproducer.

-Cory

On 2/21/20, 10:08 AM, "lustre-discuss on behalf of Konzem, Kevin P" <[email protected] on behalf of [email protected]> wrote:

Nathan,

I've created a Jira issue for this, LU-13285 <https://jira.whamcloud.com/browse/LU-13285>. In it I attached the output of an strace where I was able to capture a series of both successful and failed df runs.

________________________________
From: Nathan Dauchy - NOAA Affiliate <[email protected]>
Sent: Thursday, February 20, 2020 2:35 PM
To: Konzem, Kevin P <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: [lustre-discuss] DF bug with lustre 2.12.4

On Thu, Feb 20, 2020 at 11:47 AM Konzem, Kevin P <[email protected]> wrote:

test this by running 'while [ true ];do /bin/df -TP /performance;done' in two sessions on the same client. As soon as I start the second while loop, the output goes from:

    Filesystem Type 1024-blocks Used Available Capacity Mounted on
    192.168.0.181@tcp:/perform lustre 71467728 100416 67664944 1% /performance

to:

    Filesystem Type 1024-blocks Used Available Capacity Mounted on
    192.168.0.181@tcp:/perform lustre 0 -0 -0 50% /performance

Kevin,

I can confirm seeing this issue intermittently as well, and usually re-running df gives reasonable results again. It looks like you have a more reliable reproducer, though, which is good!

A support ticket was opened with our vendor, and they said that an strace of a bad run might be helpful, but I haven't caught it in the act yet. With your reproducer, can you capture that and open a Jira ticket to track the problem?

As a workaround, try "lfs df" instead; it may take a different code path that avoids the bug.

-Nathan
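For anyone re-running Kevin's reproducer (for example, before and after trying the LU-13296 patch), a small bash sketch along these lines could count bad df results automatically instead of watching the output by eye. The function names classify_df_line and run_reproducer, the default of 1000 iterations, and the "bad" rule (a zero 1024-blocks total or a negative Used/Available value, matching the symptom shown in the df output above) are assumptions for illustration; none of them come from the thread or from Lustre itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Lustre: classify one data line of `df -TP` output.
# Fields: $1 Filesystem, $2 Type, $3 1024-blocks, $4 Used, $5 Available, $6 Capacity, $7 Mounted on.
# Prints "bad" if the block total is 0 or Used/Available start with "-" (the symptom
# reported in this thread, e.g. "0 -0 -0 50%"), otherwise prints "ok".
classify_df_line() {
  awk '{
    if ($3 == 0 || $4 ~ /^-/ || $5 ~ /^-/) print "bad"; else print "ok"
  }' <<< "$1"
}

# Hypothetical driver: run two concurrent df loops on the same client, as in the
# original reproducer, and report how many results each loop saw as bad.
# Arguments (assumed defaults): mount point, iterations per loop.
run_reproducer() {
  local mnt="${1:-/performance}" iterations="${2:-1000}"
  local worker
  for worker in 1 2; do
    (
      bad=0
      for ((i = 0; i < iterations; i++)); do
        # With -P, df prints a header plus exactly one line per filesystem.
        line=$(/bin/df -TP "$mnt" | tail -n 1)
        if [ "$(classify_df_line "$line")" = "bad" ]; then
          bad=$((bad + 1))
        fi
      done
      echo "worker $worker: $bad bad of $iterations"
    ) &
  done
  # Wait for both background loops to finish before returning.
  wait
}
```

Usage, assuming the functions are saved in a file named df_repro.sh (a hypothetical name): `source df_repro.sh; run_reproducer /performance 1000`. Nothing runs when the file is sourced, so the classifier can be checked on the sample lines from this thread before pointing the driver at a real mount.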
_______________________________________________ lustre-discuss mailing list [email protected] http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
