On Fri Apr 11, 2025 at 5:08 PM CEST, Daniel Kral wrote: > In rare scenarios, `smartctl` takes up to 60 seconds to timeout for SCSI > commands to be completed, as reported in our user forum [0] and bugzilla > [1]. It seems that USB drives handled by the USB Attached SCSI (UAS) > kernel module are more likely to be affected by this [2], but is more of > a case-by-case situation. > > Therefore, set a more reasonable timeout of 10 seconds, so that callers > don't have to wait too long or seem unresponsive (e.g. Node Disks view > in the WebGUI). > > [0] https://forum.proxmox.com/threads/164799/ > [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6224 > [2] https://www.smartmontools.org/wiki/SAT-with-UAS-Linux > > Signed-off-by: Daniel Kral <d.k...@proxmox.com> > --- > As mentioned in the Bugzilla and indicated above, I haven't found any > clear indicator for this happening besides that the most affected > devices seem to be USB devices, which use the mentioned UAS kernel > module.
Have you perhaps found any way to test this? I could then try to replicate this behaviour. Otherwise no hard feelings; I think setting a shorter timeout for (usually) smaller commands is something we should do in general. (That being said, looking through the code of PVE::Tools::run_command--- I'm surprised we don't set a default timeout there at all. I think introducing one there could perhaps break something unexpected, though, so I'd rather not touch it.) > > I'm fine lowering the timeout further, but 10 seconds seemed reasonable > if only one disk is affected for now, so that loading takes some time > and not seemingly forever. Given that I've never had a single device take longer than a split second, I think this is quite reasonable too. > > I was also thinking about just caching which disks have had that > behavior and just not running the command for them, but I thought this > would add more complexity than needed here. I agree that this would be a little too much; you'd also have to invalidate cache entries after a certain time / a certain condition etc. You'd also have to handle the case where the disk starts to magically respond to `smartctl` again. Better to just keep the timeout here as-is. Either way, nice work! For both patches, consider: Reviewed-by: Max Carrara <m.carr...@proxmox.com> (Though, I'd still like to test this somehow, if you found a way to do so) > > src/PVE/Diskmanage.pm | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/PVE/Diskmanage.pm b/src/PVE/Diskmanage.pm > index 059d645..6aa1338 100644 > --- a/src/PVE/Diskmanage.pm > +++ b/src/PVE/Diskmanage.pm > @@ -98,7 +98,7 @@ sub get_smart_data { > push @$cmd, $disk; > > my $returncode = eval { > - run_command($cmd, noerr => 1, outfunc => sub { > + run_command($cmd, noerr => 1, timeout => 10, outfunc => sub { > my ($line) = @_; > > # ATA SMART attributes, e.g.: _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel