Hello,

I've done some investigation into the password issue and I think I have the 
proposition on how to solve it. The issue arises due to the driver checking the 
PxIS.TFES(task file error status) and if TFES is set it performs recovery 
steps(this is correct according to AHCI spec 6.2.2) and restarts the 
command(this is left up to the SW according to AHCI spec). In the case of 
SECURITY UNLOCK command if the password is incorrect device will set the abort 
bit in ERROR register and error bit in status register which will cause the 
controller to set the PxIS.TFES bit which will  trigger command retires. In 
case of security commands retries are particularly bad since every incorrect 
password unlock attempt is decreasing the retry counter on the device side 
which eventually leads to device locking itself and aborting further unlock 
commands(power cycle is required to recover).

My proposition for a fix is as follows:

1. The logic in AhciRecoverPortError seems to be largely correct. Even if the 
TFES error bit shouldn't necessarily mean that the command should be retried, 
according to AHCI spec when TFES is set controller will enter the ERR:Fatal 
state and PxCMD.CR needs to be restarted. The only thing that we need to change 
in this function is to return EFI_DEVICE_ERROR when the port restart failed(as 
discussed below).
2. We need to add a logic which will decide if the cmd should be retried. This 
logic should check following conditions:

a) If HBDS is set command should be attempted again. HBDS indicates that there 
was a CRC error during accessing the system memory during DMA transfer(AHCI 
spec 6.1.1).
b) If HBFS is set do not attempt to send the command again. HBFS indicates that 
the pointer given to the AHCI controller is incorrect. Retries won't solve 
that(AHCI spec 6.1.1)
c) If IFS or INFS is set further check PxSERR.ERR and PxSERR.DIAG(based on AHCI 
spec 6.1.2).
i) for PxSERR.DIAG.C(CRC error) -> restart command
ii) for PxSERR.DIAG.B(disparity error) -> CRC error should also be reported so 
it is covered by above
iii) for PxSERR.ERR.P(protocol error) -> do not restart. Protocol errors are 
unlikely to be transient although I am not sure if Internal buffer overflows 
can't be caused by transient conditions on the devices side.
iv) for PxSERR.DIAG.H(handshake errors) -> do not restart. Acording to spec 
handshake errors might be caused by transient conditions such as CRC errors but 
I hope in such case HBA will also signal CRC in PxSERR.DIAG.C(spec is not super 
clear on that point though)
d) If TFES is set further check PxTFD.ERR if PxTFD.ERR bit 7 is set(INTERFACE 
CRC according to ACS-3 spec). For others do not retry.

Hopefully this will catch majority of errors caused by transient conditions 
while also avoiding restarts on commands that are simply malformed. I will 
start working on a patch and if anyone has any suggestions/objections please 
respond.

I will also update bz with this data: 
https://bugzilla.tianocore.org/show_bug.cgi?id=4011 ( 
https://bugzilla.tianocore.org/show_bug.cgi?id=4011 )

Thanks,
Mateusz


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#97840): https://edk2.groups.io/g/devel/message/97840
Mute This Topic: https://groups.io/mt/94411389/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to