In rare scenarios, `smartctl` takes up to 60 seconds to time out while
waiting for SCSI commands to complete, as reported in our user forum [0]
and Bugzilla [1]. USB drives handled by the USB Attached SCSI (UAS)
kernel module seem more likely to be affected [2], but this varies from
case to case.

Therefore, set a more reasonable timeout of 10 seconds, so that callers
(e.g. the Node Disks view in the WebGUI) neither wait too long nor
appear unresponsive.

[0] https://forum.proxmox.com/threads/164799/
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=6224
[2] https://www.smartmontools.org/wiki/SAT-with-UAS-Linux

Signed-off-by: Daniel Kral <d.k...@proxmox.com>
---
As mentioned in the Bugzilla entry and indicated above, I haven't found
a clear indicator for when this happens, other than that most affected
devices seem to be USB devices that use the mentioned UAS kernel
module.

I'm fine with lowering the timeout further, but 10 seconds seemed
reasonable for now if only one disk is affected: loading then takes a
while, but not seemingly forever.

I also considered caching which disks have shown this behavior and
skipping the command for them, but I thought this would add more
complexity than needed here.

 src/PVE/Diskmanage.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/PVE/Diskmanage.pm b/src/PVE/Diskmanage.pm
index 059d645..6aa1338 100644
--- a/src/PVE/Diskmanage.pm
+++ b/src/PVE/Diskmanage.pm
@@ -98,7 +98,7 @@ sub get_smart_data {
     push @$cmd, $disk;
 
     my $returncode = eval {
-       run_command($cmd, noerr => 1, outfunc => sub {
+       run_command($cmd, noerr => 1, timeout => 10, outfunc => sub {
            my ($line) = @_;
 
 # ATA SMART attributes, e.g.:
-- 
2.39.5


