> On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <z...@freebsd.org> wrote:
> 
> 
> 
>> On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <z...@freebsd.org 
>> <mailto:z...@freebsd.org>> wrote:
>> 
>> 
>> 
>>> On Dec 28, 2024, at 6:03 AM, Warner Losh <i...@bsdimp.com 
>>> <mailto:i...@bsdimp.com>> wrote:
>>> 
>>> 
>>> 
>>> On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang <z...@freebsd.org 
>>> <mailto:z...@freebsd.org>> wrote:
>>> Hi Warner,
>>> 
>>> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird 
>>> report for sas speed.
>>> The boot console has the following,
>>> 
>>> ```
>>> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0
>>> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
>>> da0: 4294967.295MB/s transfers
>>> ```
>>> But camcontrol report the correct value,
>>> ```
>>> # camcontrol inquiry da0 -R
>>> pass1: 750.000MB/s transfers, Command Queueing Enabled
>>> ```
>>> 
>>> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any 
>>> logic set those values.
>>> 
>>> Finally I managed to get the stack trace,
>>> ```
>>> _scsi_announce_periph
>>> scsi_announce_periph_sbuf
>>> xpt_announce_periph_sbuf
>>> dadone_proberc
>>> xpt_done_process
>>> xpt_done_td
>>> fork_exit
>>> fork_trampoline
>>> ```
>>> 
>>> and noticed that the last param `cts` of `_scsi_announce_periph(struct 
>>> cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings 
>>> *cts)`
>>> is from kernel stack and is not properly initialized, latter I found some 
>>> commits related to this,
>>> 
>>> 076686fe0703 cam: make sure to clear CCBs allocated on the stack
>>> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack
>>> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
>>> 616a676a0535 cam: clear stack-allocated CCB in the target layer
>>> 
>>> I applied them to stable/13, rebuild and reboot, now the speed of da0 is 
>>> reported correctly. I also tried to patch the pvscsi driver with few lines 
>>> and
>>> it also works as intended.
>>> 
>>> ```
>>> --- a/sys/dev/vmware/pvscsi/pvscsi.c
>>> +++ b/sys/dev/vmware/pvscsi/pvscsi.c
>>> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb)
>>>                 cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB;
>>>                 cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ;
>>> 
>>> +               /* Prefer connection speed over sas port speed */
>>> +               /* cts->xport_specific.sas.bitrate = 0; */
>>> +               cts->xport_specific.sas.valid = 0;
>>> +
>>>                 ccb_h->status = CAM_REQ_CMP;
>>>                 xpt_done(ccb);
>>> ```
>>> 
>>> Things come clear and I know why this weird speed happens, now it is time 
>>> to decide how to fix it.
>>> 
>>> Fixing the consumer of cam, aka pvscsi driver, is quite simple and 
>>> promising. I did a quick search it appears other consumers set 
>>> `cts->xport_specific.sas.valid` correctly. It does not convince me as I'm 
>>> quite new to cam subsystem.
>>> 
>>> Yes. sas.valid is set when the sas.bitrate and other data has been set 
>>> correctly. Setting it to 0 like this ensures it's ignored.  So if you know 
>>> the speed, set sas.bitrate to that speed and sas.valid to 1.
>> 
>> That is clear.
>> 
>>> 
>>> I'm not sure I answered the question right, but if not, please clarify or 
>>> point out what I missed and I'll try again.
>>>  
>>> Which one do you prefer, MFC commits to stable/13, or do direct fix for 
>>> pvscsi driver to stable/13 ?
>>> 
>>> [[ Sorry for the delay, I missed this all month ]]
>>> 
>>> I generally prefer a MFC, unless the code is no longer in -current. 
>> 
>> The code live in -current and all supported stable branches.
>> 
>>> Even if there's two different fixes for this logical bug, fixing it in 
>>> current, then MFCing that (with the current hash) is fine, even if the 
>>> stable/13 changes are completely different. 
>> 
>> The bug does not exist in current and stable/14 but only in stable/13, due 
>> to Edward's work ( commits those zero stack-allocated CCBs ) were not MFCed 
>> into stable/13 branch.
>> 
>>> For stable/13 I guess it matters a bit less than stable/14 since I'll be 
>>> merging to it less, but if it's a commit from -current that doesn't need to 
>>> be made to -stable because of the new commit on stable, I tend to include 
>>> the MFC hash text.
>> 
>> Do you mean the `cherry picked from commit` commit message ?
>> 
>>> 
>>> Warner
>> 
>> 
>> I'm preparing and testing the MFCing. Bless me to not make things messed up 
>> :)
> 
> And the individual fix for pvscsi is posted to 
> https://reviews.freebsd.org/D48438 <https://reviews.freebsd.org/D48438> .

Landed in -current. Will be MFCed to stable/13 after a few days.

> 
>> 
>> Best regards,
>> Zhenlei



Reply via email to