сб, 9 окт. 2021 г. в 14:59, Warner Losh <i...@bsdimp.com>:

>
>
> On Sat, Oct 9, 2021, 8:44 AM Pavel Timofeev <tim...@gmail.com> wrote:
>
>>
>>
>> пт, 8 окт. 2021 г. в 14:49, Warner Losh <i...@bsdimp.com>:
>>
>>>
>>>
>>> On Fri, Oct 8, 2021 at 2:42 PM Pavel Timofeev <tim...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> сб, 21 авг. 2021 г. в 15:22, Warner Losh <i...@bsdimp.com>:
>>>>
>>>>>
>>>>>
>>>>> On Sat, Aug 21, 2021 at 3:06 PM Pavel Timofeev <tim...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>  Warner Losh <i...@bsdimp.com>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 20, 2021 at 10:42 PM Pavel Timofeev <tim...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  Pavel Timofeev <tim...@gmail.com>:
>>>>>>>>
>>>>>>>> >
>>>>>>>> > Chuck Tuffli <ctuf...@gmail.com>:
>>>>>>>> >
>>>>>>>> >> On Mon, Aug 16, 2021 at 7:43 PM Pavel Timofeev <tim...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >> >
>>>>>>>> >> > Hello
>>>>>>>> >> > I've got a Dell Latitude 7400 and tried installing the latest
>>>>>>>> >> 14.0-CURRENT
>>>>>>>> >> > (main-n248636-d20e9e02db3) on it.
>>>>>>>> >> > Despite other things the weird one which concerns me is
>>>>>>>> >> >   nvme0: Missing interrupt
>>>>>>>> >> > message I get sometimes on the console.
>>>>>>>> >> > It seems like I get it only after the reboot of the laptop, i.
>>>>>>>> e. not
>>>>>>>> >> > getting that message if I power cycle the laptop, at least I
>>>>>>>> haven't
>>>>>>>> >> seen
>>>>>>>> >> > them for now in such cases.
>>>>>>>> >> > So when the laptop is rebooted I can't even take advantage of
>>>>>>>> >> > nvmecontrol(8) quickly.
>>>>>>>> >> > Well, it still works, but it takes tens of seconds to return
>>>>>>>> the output.
>>>>>>>> >> ...
>>>>>>>> >> > dmesg when power cycled -
>>>>>>>> >> >
>>>>>>>> https://drive.google.com/file/d/1dB27oB1O2CcnZy6DvOOhmFO8SN8V8SwJ
>>>>>>>> >> > dmesg when rebooted -
>>>>>>>> >> >
>>>>>>>> https://drive.google.com/file/d/1DsKTMkihp_OmUcirByLaVO4o2mU38Bxh
>>>>>>>> >>
>>>>>>>> >> I'm sort of curious about the time stamps for the log messages
>>>>>>>> in the
>>>>>>>> >> failing case. Something like:
>>>>>>>> >>
>>>>>>>> >> $ grep "nv\(me\|d\)" /var/log/messages
>>>>>>>> >>
>>>>>>>> >> --chuck
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > Well, I can't see timestamps in the verbose boot log. Am I
>>>>>>>> missing some
>>>>>>>> > configuration for that?
>>>>>>>> >
>>>>>>>> > $ grep "nv\(me\|d\)" /var/log/messages
>>>>>>>> > nvme0: <Generic NVMe Device> mem
>>>>>>>> > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff
>>>>>>>> at device
>>>>>>>> > 0.0 on pci6
>>>>>>>> > nvme0: attempting to allocate 5 MSI-X vectors (17 supported)
>>>>>>>> > nvme0: using IRQs 133-137 for MSI-X
>>>>>>>> > nvme0: CapLo: 0x140103ff: MQES 1023, CQR, TO 20
>>>>>>>> > nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, MPSMIN 0, MPSMAX
>>>>>>>> 0
>>>>>>>> > nvme0: Version: 0x00010300: 1.3
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvme0: Missing interrupt
>>>>>>>> > nvd0: <PC611 NVMe SK hynix 512GB> NVMe namespace
>>>>>>>> > GEOM: new disk nvd0
>>>>>>>> > nvd0: 488386MB (1000215216 512 byte sectors)
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>> Ah, sorry, provided wrong output.
>>>>>>>> Here is what you requested:
>>>>>>>> $ grep "nv\(me\|d\)" /var/log/messages
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: <Generic NVMe Device> mem
>>>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff
>>>>>>>> at device
>>>>>>>> 0.0 on pci6
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: attempting to allocate 5
>>>>>>>> MSI-X
>>>>>>>> vectors (17 supported)
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: using IRQs 133-137 for MSI-X
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapLo: 0x140103ff: MQES
>>>>>>>> 1023, CQR,
>>>>>>>> TO 20
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapHi: 0x00000030: DSTRD 0,
>>>>>>>> NSSRS,
>>>>>>>> CSS 1, MPSMIN 0, MPSMAX 0
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Version: 0x00010300: 1.3
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: <PC611 NVMe SK hynix 512GB>
>>>>>>>> NVMe
>>>>>>>> namespace
>>>>>>>> Aug 21 04:34:36 nostromo kernel: GEOM: new disk nvd0
>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: 488386MB (1000215216 512 byte
>>>>>>>> sectors)
>>>>>>>> Aug 21 04:34:42 nostromo kernel: nvme0: Missing interrupt
>>>>>>>> Aug 21 04:35:36 nostromo kernel: nvme0: Missing interrupt
>>>>>>>> Aug 21 04:35:50 nostromo kernel: nvme0: Missing interrupt
>>>>>>>>
>>>>>>>
>>>>>>> What happens if you set hw.nvme.use_nvd=0 and hw.cam.nda.nvd_compat=1
>>>>>>> in the boot loader and reboot? Same thing except nda where nvd was?
>>>>>>> Or does
>>>>>>> it work?
>>>>>>>
>>>>>>> Something weird is going on in the interrupt assignment, I think,
>>>>>>> but I
>>>>>>> wanted to get any nvd vs nda issues out of the way first.
>>>>>>>
>>>>>>> Warner
>>>>>>>
>>>>>>
>>>>>> Do you mean kern.cam.nda.nvd_compat instead of hw.cam.nda.nvd_compat?
>>>>>> kern.cam.nda.nvd_compat is 1 by default now.
>>>>>>
>>>>>> So I tried to set  hw.nvme.use_nvd to 1 as suggested, but I still see
>>>>>>   nvme0: Missing interrupt
>>>>>> and now also
>>>>>>   Root mount waiting for: CAM
>>>>>> messages besides those
>>>>>>
>>>>>
>>>>> OK. That all makes sense. I'd forgotten that nvd_compat=1 by default
>>>>> these
>>>>> days.
>>>>>
>>>>> I'll take a look on monday starting at the differences in interrupt
>>>>> assignment that
>>>>> are apparent when you cold boot vs reboot.
>>>>>
>>>>> Thanks for checking... I'd hoped this was a cheap fix, but also didn't
>>>>> really
>>>>> expect it to be.
>>>>>
>>>>> Warner
>>>>>
>>>>>
>>>> I've recently upgraded to main-n249974-17f790f49f5 and it got even
>>>> worse now.
>>>> So clean poweron works as before.
>>>> But if rebooted nvme drive refuses to work, while before the code
>>>> upgrade it was just complaining about missing interrupts.
>>>>
>>>> currently dmesg show this:
>>>> nvme0: <Generic NVMe Device> mem
>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at device
>>>> 0.0 on pci6
>>>> nvd0: <PC611 NVMe SK hynix 512GB> NVMe namespace
>>>> nvd0: 488386MB (1000215216 512 byte sectors)
>>>> nvme0: <Generic NVMe Device> mem
>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at device
>>>> 0.0 on pci6
>>>>
>>>
>>> Why is this showing up twice? Or is everything above this line left over
>>> from the first, working boot?
>>>
>>>
>>>> nvme0: RECOVERY_START 9585870784 vs 9367036288
>>>> nvme0: timeout with nothing complete, resetting
>>>> nvme0: Resetting controller due to a timeout.
>>>> nvme0: RECOVERY_WAITING
>>>> nvme0: resetting controller
>>>> nvme0: aborting outstanding admin command
>>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000
>>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0
>>>> nvme0: nvme_identify_controller failed!
>>>> nvme0: waiting
>>>>
>>>
>>> Clearly something bad is going on with the drive here... We looked into
>>> the completion queues when we didn't get an interrupt and there was nothing
>>> complete there....
>>>
>>> The only thing I can think of is that this means there's a phase error
>>> between the drive and the system. I recently removed a second reset and
>>> made it an option NVME_2X_RESET. Can you see if adding
>>> 'options NVME_2X_RESET' to your kernel config fixes this?
>>>
>>> Warner
>>>
>>>
>>>> nvme0: <Generic NVMe Device> mem
>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at device
>>>> 0.0 on pci6
>>>> nvme0: RECOVERY_START 9362778467 vs 9361830561
>>>> nvme0: timeout with nothing complete, resetting
>>>> nvme0: Resetting controller due to a timeout.
>>>> nvme0: RECOVERY_WAITING
>>>> nvme0: resetting controller
>>>> nvme0: aborting outstanding admin command
>>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000
>>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0
>>>> nvme0: nvme_identify_controller failed!
>>>> nvme0: waiting
>>>>
>>>>
>>
>> Sorry, it's showing twice due to multiple reboots. For one boot it's like:
>> nvme0: <Generic NVMe Device> mem
>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at device
>> 0.0 on pci6
>> nvme0: RECOVERY_START 9633303481 vs 9365971423
>> nvme0: timeout with nothing complete, resetting
>> nvme0: Resetting controller due to a timeout.
>> nvme0: RECOVERY_WAITING
>> nvme0: resetting controller
>> nvme0: aborting outstanding admin command
>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000
>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0
>> nvme0: nvme_identify_controller failed!
>> nvme0: waiting
>>
>> Well, neither Windows not Linux have any problems with the device. I
>> understand they may be hiding it or workaround somehow.
>>
>
> Yea, I'm trying to figure out why your machine is different than mine, and
> what Windows or Linux do that is different. It may be dodgy hardware, but
> others have no trouble...
>
> I'll try setting NVME_2X_RESET in the kernel config and report back in a
>> while.
>>
>
> Thanks. If it helps, that tells me something. If it doesn't, that tells me
> something else.
>
> I suspect that it is somewhere else in the system, tbh, but I need to find
> it systematically.
>
> Warner
>

Surprisingly, setting NVME_2X_RESET in the kernel config hasn't changed
anything. I. e it didn't help.

Reply via email to