On Fri Apr 11, 2025 at 5:08 PM CEST, Daniel Kral wrote:
> In rare scenarios, `smartctl` takes up to 60 seconds to time out while
> waiting for SCSI commands to complete, as reported in our user forum [0]
> and Bugzilla [1]. It seems that USB drives handled by the USB Attached
> SCSI (UAS) kernel module are more likely to be affected by this [2], but
> it is more of a case-by-case situation.
>
> Therefore, set a more reasonable timeout of 10 seconds, so that callers
> don't have to wait too long or seem unresponsive (e.g. Node Disks view
> in the WebGUI).
>
> [0] https://forum.proxmox.com/threads/164799/
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6224
> [2] https://www.smartmontools.org/wiki/SAT-with-UAS-Linux
>
> Signed-off-by: Daniel Kral <d.k...@proxmox.com>
> ---
> As mentioned in the Bugzilla entry and indicated above, I haven't found
> any clear indicator of when this happens, other than that the most
> affected devices seem to be USB devices, which use the aforementioned UAS
> kernel module.

Have you perhaps found any way to test this? I could then try to
replicate this behaviour. Otherwise, no hard feelings; I think setting a
shorter timeout for (usually) short-running commands is something we
should do in general.

(That being said, looking through the code of PVE::Tools::run_command,
I'm surprised we don't set a default timeout there at all. I think
introducing one there could break something in unexpected ways, though,
so I'd rather not touch it.)

>
> I'm fine with lowering the timeout further, but 10 seconds seemed
> reasonable as long as only one disk is affected, so that loading takes
> a while rather than seemingly forever.

Given that I've never had a single device take longer than a split
second, I think this is quite reasonable too.

>
> I also considered caching which disks have shown this behavior and
> simply skipping the command for them, but I thought this would add more
> complexity than needed here.

I agree that this would be a little too much; you'd also have to
invalidate cache entries after a certain time or under certain
conditions, etc. You'd also have to handle the case where the disk
magically starts responding to `smartctl` again. Better to just keep the
timeout here as-is.


Either way, nice work! For both patches, consider:

Reviewed-by: Max Carrara <m.carr...@proxmox.com>

(Though, I'd still like to test this somehow, if you found a way to do so)

>
>  src/PVE/Diskmanage.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/PVE/Diskmanage.pm b/src/PVE/Diskmanage.pm
> index 059d645..6aa1338 100644
> --- a/src/PVE/Diskmanage.pm
> +++ b/src/PVE/Diskmanage.pm
> @@ -98,7 +98,7 @@ sub get_smart_data {
>      push @$cmd, $disk;
>  
>      my $returncode = eval {
> -     run_command($cmd, noerr => 1, outfunc => sub {
> +     run_command($cmd, noerr => 1, timeout => 10, outfunc => sub {
>           my ($line) = @_;
>  
>  # ATA SMART attributes, e.g.:



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Reply via email to