Re: [pve-devel] [PATCH storage 2/2] fix #6224: disks: get: set timeout for retrieval of SMART stat data

Daniel Kral Mon, 14 Apr 2025 23:42:38 -0700

Thanks for the review, Max! :)

On 4/11/25 18:04, Max Carrara wrote:

On Fri Apr 11, 2025 at 5:08 PM CEST, Daniel Kral wrote:

As mentioned in the Bugzilla and indicated above, I haven't found any
clear indicator for this happening besides that the most affected
devices seem to be USB devices, which use the mentioned UAS kernel
module.


Have you perhaps found any way to test this? I could then try to
replicate this behaviour. Otherwise no hard feelings; I think setting a
shorter timeout for (usually) smaller commands is something we should do
in general.

Unfortunately not, I've tried all the (4) USB devices I had on me, but
sadly none of them had those quirks ;). I tested only that the error
path works correctly with simply substituting the smartctl command with
`sleep 11` and `sh -c 'exit 3'` for the timeout + non-zero return.

It'd sure be great if someone with an affected disk could test this
directly, I'll forward it to the Bugzilla entry and forum post so it
might get more coverage.
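
For readers who want to try the same substitution trick outside of PVE, here is a minimal Python sketch. It is not the patch's Perl code and does not use PVE::Tools::run_command; the helper name `run_with_timeout` and its return values are made up for illustration. It only shows the two error paths mentioned above: a command that outlives the timeout and a command that exits non-zero.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and classify the outcome (hypothetical helper, not PVE code).

    Returns ("ok", 0), ("failed", <exit code>) or ("timeout", None).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed once this many seconds pass
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    if proc.returncode != 0:
        return ("failed", proc.returncode)
    return ("ok", 0)


# Stand-ins for a hanging and a failing smartctl call, as in the mail:
#   run_with_timeout(["sleep", "11"], timeout=10)
#   run_with_timeout(["sh", "-c", "exit 3"], timeout=10)
```

With a 10 second timeout, the `sleep 11` stand-in should end up in the timeout branch and the `sh -c 'exit 3'` stand-in in the non-zero exit branch.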

(That being said, looking through the code of PVE::Tools::run_command---
I'm surprised we don't set a default timeout there at all. I think
introducing one there could perhaps break something unexpected, though,
so I'd rather not touch it.)

Yes, I'd guess that there would be some places where the $noerr is set,
but $timeout will error anyway now AFAICS as here, so there'd be quite a
few places which do not have error handlers set up. I hope that smartctl
is more of an odd case here as the timeout is quite high because of
reasons.

I'm fine lowering the timeout further, but 10 seconds seemed reasonable
if only one disk is affected for now, so that loading takes some time
and not seemingly forever.


Given that I've never had a single device take longer than a split
second, I think this is quite reasonable too.


I was also thinking about just caching which disks have had that
behavior and just not running the command for them, but I thought this
would add more complexity than needed here.


I agree that this would be a little too much; you'd also have to
invalidate cache entries after a certain time / a certain condition etc.
You'd also have to handle the case where the disk starts to magically
respond to `smartctl` again. Better to just keep the timeout here as-is.

Agreed, that would be way too much for this. And as it seems from the
forum, it was probably a faulty controller / firmware (?) anyway [0].


[0] https://forum.proxmox.com/threads/164799/#post-763192

Either way, nice work! For both patches, consider:

Reviewed-by: Max Carrara <m.carr...@proxmox.com>

(Though, I'd still like to test this somehow, if you found a way to do so)



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
