Re: nvme timeout issues with hardware and bhyve vm's

2023-12-08 Thread Pete Wright
On Thu, Dec 07, 2023 at 04:19:12PM -0800, Chuck Tuffli wrote: > On Thu, Dec 7, 2023 at 2:39 PM Pete Wright wrote: > ... > > Hi Warner, just resurfacing this thread because I've had a few lockups > > on my workstation running 14.0-STABLE. I was able to capture a photo of > > the hang and this seem

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Bakul Shah
Thanks. It may be worth checking the temp periodically and warning the user in case it is too high (70°C+ or something). Even for devices that allow internal throttling, a user might wish to know whether the device needs a (better) heatsink. > On Dec 7, 2023, at 5:02 PM, Maxim Sobolev wrote:
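A minimal sketch of the periodic check suggested above, assuming the drive temperature has already been read from its SMART/Health log page, where NVMe reports it in kelvins. The function names, the 70 °C default, and the sample readings are illustrative assumptions, not part of any FreeBSD tool.

```python
from typing import Optional

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvins


def kelvin_to_celsius(kelvin: int) -> float:
    """Convert a whole-kelvin reading to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


def temperature_warning(device: str, kelvin: int,
                        threshold_c: float = 70.0) -> Optional[str]:
    """Return a warning string if the reading is at or above the threshold.

    The 70 C default mirrors the rough figure floated in the thread; it is
    an assumption, not a vendor limit. Returns None when no warning is due.
    """
    celsius = kelvin_to_celsius(kelvin)
    if celsius >= threshold_c:
        return (f"{device}: {celsius:.1f} C is at or above "
                f"{threshold_c:.1f} C; check cooling or heatsink")
    return None


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    for dev, reading in (("nvme0", 310), ("nvme0", 350)):
        message = temperature_warning(dev, reading)
        print(message if message else f"{dev}: temperature OK")
```

A cron job or daemon could call `temperature_warning` with a fresh reading every few minutes and log or mail any non-empty result.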

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Maxim Sobolev
How quickly it heats up depends on lots of factors. Usually those devices burn some 3-7 watts per stick at 100% load, so maybe this would give you some idea. At least some of them support several toggleable performance modes, which use throttling internally to limit power consumption to a certain l

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Chuck Tuffli
On Thu, Dec 7, 2023 at 2:39 PM Pete Wright wrote: ... > Hi Warner, just resurfacing this thread because I've had a few lockups > on my workstation running 14.0-STABLE. I was able to capture a photo of > the hang and this seems to be the most important line: > > nvme0: Resetting controller due to

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Bakul Shah
On Dec 7, 2023, at 3:59 PM, Warner Losh wrote: > > > *Overheating caused hang of NVMe controller or PCI bridge on SSD, or > > Yes. Most drives' firmware resets when it overheats. There might be something > that the pci code can do when this happens to retrain the link, reprogram the > config r

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Warner Losh
On Thu, Dec 7, 2023 at 4:09 PM Tomoaki AOKI wrote: > On Thu, 7 Dec 2023 14:38:37 -0800 > Pete Wright wrote: > > > > > > > On 10/13/23 7:34 PM, Warner Losh wrote: > > > > > > > > > > > the messages i posted in the start of the thread are from the VM > itself > > > (13.2-RELEASE). The zpo

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Pete Wright
On 12/7/23 3:16 PM, Craig Leres wrote: On 12/7/23 15:09, Tomoaki AOKI wrote: If I myself encounter this kind of problem ON BARE METAL HARDWARE, I would usually suspect   *Overheating caused hang of NVMe controller or PCI bridge on SSD, or This would also be my first guess. Five years ago

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Pete Wright
On 12/7/23 2:49 PM, Warner Losh wrote: On Thu, Dec 7, 2023 at 3:38 PM Pete Wright wrote: On 10/13/23 7:34 PM, Warner Losh wrote: > > > the messages i posted in the start of the thread are from the VM itself > (13.2-RELEAS

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Craig Leres
On 12/7/23 15:09, Tomoaki AOKI wrote: If I myself encounter this kind of problem ON BARE METAL HARDWARE, I would usually suspect *Overheating caused hang of NVMe controller or PCI bridge on SSD, or This would also be my first guess. Five years ago I had an NVMe in an Intel NUC that would so

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Tomoaki AOKI
On Thu, 7 Dec 2023 14:38:37 -0800 Pete Wright wrote: > > > On 10/13/23 7:34 PM, Warner Losh wrote: > > > > > > > the messages i posted in the start of the thread are from the VM itself > > (13.2-RELEASE).  The zpool on the hypervisor (13.2-RELEASE) showed no > > such issues. > >

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Warner Losh
On Thu, Dec 7, 2023 at 3:38 PM Pete Wright wrote: > > > On 10/13/23 7:34 PM, Warner Losh wrote: > > > > > > > the messages i posted in the start of the thread are from the VM > itself > > (13.2-RELEASE). The zpool on the hypervisor (13.2-RELEASE) showed no > > such issues. > > > >

Re: nvme timeout issues with hardware and bhyve vm's

2023-12-07 Thread Pete Wright
On 10/13/23 7:34 PM, Warner Losh wrote: the messages i posted in the start of the thread are from the VM itself (13.2-RELEASE).  The zpool on the hypervisor (13.2-RELEASE) showed no such issues. Based on your comment about the improvements in 14 I'll focus my efforts

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-16 Thread Chuck Tuffli
On Fri, Oct 13, 2023 at 7:34 PM Warner Losh wrote: ... > Let me know if you see similar messages in stable/14. I think I've fixed all > the > issues with timeouts, though you shouldn't ever see them in a vm setup > unless something else weird is going on. I'd be interested in a repro case too as

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread void
On Sun, 15 Oct 2023, at 15:53, Warner Losh wrote: > The one with the uboot traceback? I can't help you there. The report is > confusing. I don't know the error / problem being reported to even know > what to look at. Or is it a different thing? I'm so confused at this > point. I also think we

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread Warner Losh
On Sun, Oct 15, 2023, 9:47 AM void wrote: > On Sun, 15 Oct 2023, at 15:35, Warner Losh wrote: > > > I've fixed all known nvme issues in current that aren't caused by other > > parts of the system. If it isn't a very recent 15 or 14, then there > > are known issues and you'll need to try those fi

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread void
On Sun, 15 Oct 2023, at 15:35, Warner Losh wrote: > I've fixed all known nvme issues in current that aren't caused by other > parts of the system. If it isn't a very recent 15 or 14, then there > are known issues and you'll need to try those first. The problem manifested with a source upgrade

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread Warner Losh
On Sun, Oct 15, 2023, 9:28 AM void wrote: > Hi, > > On Fri, 13 Oct 2023, at 03:40, Pete Wright wrote: > > I had similar issues on my workstation as well. Scrubbing the NVMe > > device on my real-hardware workstation hasn't turned up any issues, but > > the system has locked up a handful of times

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread void
Hi, On Fri, 13 Oct 2023, at 03:40, Pete Wright wrote: > I had similar issues on my workstation as well. Scrubbing the NVMe > device on my real-hardware workstation hasn't turned up any issues, but > the system has locked up a handful of times. > > Just curious if others have seen the same, or i

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-13 Thread Warner Losh
On Fri, Oct 13, 2023 at 11:47 AM Pete Wright wrote: > > > On 10/13/23 6:24 AM, Warner Losh wrote: > > > > > > On Thu, Oct 12, 2023, 10:53 PM Pete Wright > > wrote: > > > > > > > > On 10/12/23 8:45 PM, Warner Losh wrote: > > > What version is that kernel? > >

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-13 Thread Pete Wright
On 10/13/23 6:24 AM, Warner Losh wrote: On Thu, Oct 12, 2023, 10:53 PM Pete Wright wrote: On 10/12/23 8:45 PM, Warner Losh wrote: > What version is that kernel? oh dang i sent this to the wrong list, i'm not running current. the hypervisor

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-13 Thread Warner Losh
On Thu, Oct 12, 2023, 10:53 PM Pete Wright wrote: > > > On 10/12/23 8:45 PM, Warner Losh wrote: > > What version is that kernel? > > oh dang i sent this to the wrong list, i'm not running current. the > hypervisor and vm are both 13.2 and my workstation is a recent 14.0 > pre-release build. i'l

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-12 Thread Pete Wright
On 10/12/23 8:45 PM, Warner Losh wrote: What version is that kernel? oh dang i sent this to the wrong list, i'm not running current. the hypervisor and vm are both 13.2 and my workstation is a recent 14.0 pre-release build. i'll do more homework tomorrow and post to questions or a more

Re: nvme timeout issues with hardware and bhyve vm's

2023-10-12 Thread Warner Losh
What version is that kernel? Warner On Thu, Oct 12, 2023, 9:41 PM Pete Wright wrote: > hey there - i was curious if anyone has had issues with nvme devices > recently. i'm chasing down similar issues on my workstation which has a > physical NVMe zroot, and on a bhyve VM which has a large pool

nvme timeout issues with hardware and bhyve vm's

2023-10-12 Thread Pete Wright
hey there - i was curious if anyone has had issues with nvme devices recently. i'm chasing down similar issues on my workstation which has a physical NVMe zroot, and on a bhyve VM which has a large pool exposed as a NVMe device (and is backed by a zvol). on the most recent bhyve issue the VM