Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver

2025-01-13 Thread Zhenlei Huang


> On Dec 28, 2024, at 6:03 AM, Warner Losh  wrote:
> 
> 
> 
> On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang  > wrote:
> Hi Warner,
> 
> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird report 
> for sas speed.
> The boot console has the following,
> 
> ```
> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0
> da0:  Fixed Direct Access SPC-4 SCSI device
> da0: 4294967.295MB/s transfers
> ```
> But camcontrol report the correct value,
> ```
> # camcontrol inquiry da0 -R
> pass1: 750.000MB/s transfers, Command Queueing Enabled
> ```
> 
> The `4294967.295MB` is actually 0x_ or -1 but I do not see any logic 
> set those values.
> 
> Finally I managed to get the stack trace,
> ```
> _scsi_announce_periph
> scsi_announce_periph_sbuf
> xpt_announce_periph_sbuf
> dadone_proberc
> xpt_done_process
> xpt_done_td
> fork_exit
> fork_trampoline
> ```
> 
> and noticed that the last param `cts` of `_scsi_announce_periph(struct 
> cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings 
> *cts)`
> is from kernel stack and is not properly initialized, latter I found some 
> commits related to this,
> 
> 076686fe0703 cam: make sure to clear CCBs allocated on the stack
> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack
> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
> 616a676a0535 cam: clear stack-allocated CCB in the target layer
> 
> I applied them to stable/13, rebuild and reboot, now the speed of da0 is 
> reported correctly. I also tried to patch the pvscsi driver with few lines and
> it also works as intended.
> 
> ```
> --- a/sys/dev/vmware/pvscsi/pvscsi.c
> +++ b/sys/dev/vmware/pvscsi/pvscsi.c
> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb)
> cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB;
> cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ;
> 
> +   /* Prefer connection speed over sas port speed */
> +   /* cts->xport_specific.sas.bitrate = 0; */
> +   cts->xport_specific.sas.valid = 0;
> +
> ccb_h->status = CAM_REQ_CMP;
> xpt_done(ccb);
> ```
> 
> Things come clear and I know why this weird speed happens, now it is time to 
> decide how to fix it.
> 
> Fixing the consumer of cam, aka pvscsi driver, is quite simple and promising. 
> I did a quick search it appears other consumers set 
> `cts->xport_specific.sas.valid` correctly. It does not convince me as I'm 
> quite new to cam subsystem.
> 
> Yes. sas.valid is set when the sas.bitrate and other data has been set 
> correctly. Setting it to 0 like this ensures it's ignored.  So if you know 
> the speed, set sas.bitrate to that speed and sas.valid to 1.

That is clear.

> 
> I'm not sure I answered the question right, but if not, please clarify or 
> point out what I missed and I'll try again.
>  
> Which one do you prefer, MFC commits to stable/13, or do direct fix for 
> pvscsi driver to stable/13 ?
> 
> [[ Sorry for the delay, I missed this all month ]]
> 
> I generally prefer a MFC, unless the code is no longer in -current.

The code live in -current and all supported stable branches.

> Even if there's two different fixes for this logical bug, fixing it in 
> current, then MFCing that (with the current hash) is fine, even if the 
> stable/13 changes are completely different.

The bug does not exist in current and stable/14 but only in stable/13, due to 
Edward's work ( commits those zero stack-allocated CCBs ) were not MFCed into 
stable/13 branch.

> For stable/13 I guess it matters a bit less than stable/14 since I'll be 
> merging to it less, but if it's a commit from -current that doesn't need to 
> be made to -stable because of the new commit on stable, I tend to include the 
> MFC hash text.

Do you mean the `cherry picked from commit` commit message ?

> 
> Warner


I'm preparing and testing the MFCing. Bless me to not make things messed up :)

Best regards,
Zhenlei



Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver

2025-01-13 Thread Zhenlei Huang


> On Jan 13, 2025, at 4:22 PM, Zhenlei Huang  wrote:
> 
> 
> 
>> On Dec 28, 2024, at 6:03 AM, Warner Losh > > wrote:
>> 
>> 
>> 
>> On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang > > wrote:
>> Hi Warner,
>> 
>> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird report 
>> for sas speed.
>> The boot console has the following,
>> 
>> ```
>> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0
>> da0:  Fixed Direct Access SPC-4 SCSI device
>> da0: 4294967.295MB/s transfers
>> ```
>> But camcontrol report the correct value,
>> ```
>> # camcontrol inquiry da0 -R
>> pass1: 750.000MB/s transfers, Command Queueing Enabled
>> ```
>> 
>> The `4294967.295MB` is actually 0x_ or -1 but I do not see any logic 
>> set those values.
>> 
>> Finally I managed to get the stack trace,
>> ```
>> _scsi_announce_periph
>> scsi_announce_periph_sbuf
>> xpt_announce_periph_sbuf
>> dadone_proberc
>> xpt_done_process
>> xpt_done_td
>> fork_exit
>> fork_trampoline
>> ```
>> 
>> and noticed that the last param `cts` of `_scsi_announce_periph(struct 
>> cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings 
>> *cts)`
>> is from kernel stack and is not properly initialized, latter I found some 
>> commits related to this,
>> 
>> 076686fe0703 cam: make sure to clear CCBs allocated on the stack
>> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack
>> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
>> 616a676a0535 cam: clear stack-allocated CCB in the target layer
>> 
>> I applied them to stable/13, rebuild and reboot, now the speed of da0 is 
>> reported correctly. I also tried to patch the pvscsi driver with few lines 
>> and
>> it also works as intended.
>> 
>> ```
>> --- a/sys/dev/vmware/pvscsi/pvscsi.c
>> +++ b/sys/dev/vmware/pvscsi/pvscsi.c
>> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb)
>> cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB;
>> cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ;
>> 
>> +   /* Prefer connection speed over sas port speed */
>> +   /* cts->xport_specific.sas.bitrate = 0; */
>> +   cts->xport_specific.sas.valid = 0;
>> +
>> ccb_h->status = CAM_REQ_CMP;
>> xpt_done(ccb);
>> ```
>> 
>> Things come clear and I know why this weird speed happens, now it is time to 
>> decide how to fix it.
>> 
>> Fixing the consumer of cam, aka pvscsi driver, is quite simple and 
>> promising. I did a quick search it appears other consumers set 
>> `cts->xport_specific.sas.valid` correctly. It does not convince me as I'm 
>> quite new to cam subsystem.
>> 
>> Yes. sas.valid is set when the sas.bitrate and other data has been set 
>> correctly. Setting it to 0 like this ensures it's ignored.  So if you know 
>> the speed, set sas.bitrate to that speed and sas.valid to 1.
> 
> That is clear.
> 
>> 
>> I'm not sure I answered the question right, but if not, please clarify or 
>> point out what I missed and I'll try again.
>>  
>> Which one do you prefer, MFC commits to stable/13, or do direct fix for 
>> pvscsi driver to stable/13 ?
>> 
>> [[ Sorry for the delay, I missed this all month ]]
>> 
>> I generally prefer a MFC, unless the code is no longer in -current. 
> 
> The code live in -current and all supported stable branches.
> 
>> Even if there's two different fixes for this logical bug, fixing it in 
>> current, then MFCing that (with the current hash) is fine, even if the 
>> stable/13 changes are completely different. 
> 
> The bug does not exist in current and stable/14 but only in stable/13, due to 
> Edward's work ( commits those zero stack-allocated CCBs ) were not MFCed into 
> stable/13 branch.
> 
>> For stable/13 I guess it matters a bit less than stable/14 since I'll be 
>> merging to it less, but if it's a commit from -current that doesn't need to 
>> be made to -stable because of the new commit on stable, I tend to include 
>> the MFC hash text.
> 
> Do you mean the `cherry picked from commit` commit message ?
> 
>> 
>> Warner
> 
> 
> I'm preparing and testing the MFCing. Bless me to not make things messed up :)

And the individual fix for pvscsi is posted to 
https://reviews.freebsd.org/D48438  .

> 
> Best regards,
> Zhenlei