Hi Santosh, On 01/07/2019 19.28, santosh.shilim...@oracle.com wrote: >> > Below. All command timeouts are 60 seconds. > > enum { > MLX4_CMD_TIME_CLASS_A = 60000, > MLX4_CMD_TIME_CLASS_B = 60000, > MLX4_CMD_TIME_CLASS_C = 60000, > }; >
Thank you for the pointer. > But having said that, I re-looked the code you are patching > and thats actually only FRWR code which is purely work-request > based so this command timeout shouldn't matter. > Which brings us back full circle to the question of what the timeout ought to be? Please keep in mind that prior to this fix, the RDS code didn't wait at all: It simply posted those registration (IB_WR_REG_MR) and invalidation (IB_WR_LOCAL_INV) work-requests, with no regards to when the firmware would execute them. Arguably, waiting any amount time greater than zero for the operation to complete is better than not waiting at all. We can change the timeout to a high value, or even make it infinite by using "wait_event" instead of "wait_event_timeout". For the registration work-requests there is a benefit to wait a short amount of time only (the trade-off described in patch #1 of this series). For de-registration work-requests, it is beneficial to wait until they are truly done. But: Function "rds_ib_unreg_frmr" prior and post this change simply moves on after a failed de-registration attempt, and releases the pages owned by the memory region. This patch does _not_ change that behavior. > If the work request fails, then it will lead to flush errors and > MRs will be marked as STALE. So this wait may not be necessary > This wait is necessary to avoid the 2 scenarios described in the commit-log message: #1) Memory regions bouncing between "drop_list" and "clean_list" as items on the "clean_list" aren't really clean until their state transitions to "FRMR_IS_FREE". #2) Prevent an access error as "rds_ib_post_inv" is called just prior to de-referencing pages via "__rds_ib_teardown_mr". And you certainly don't want those pages populated in the HCA's memory-translation-table with full access, while the Linux kernel 'thinks' you gave them back already and starts re-purposing them. > RDS_GET_MR case is what actually showing the issue you saw > and the fix for that Avinash has it in production kernel. Actually, no: Socket option RDS_GET_MR wasn't even in the code-path of the tests I performed: It were there RDS_CMSG_RDMA_MAP / RDS_CMSG_RDMA_DEST control messages that ended up calling '__rds_rdma_map". > > I believe with that change, registration issue becomes non-issue > already. > Please explain how that is related to this fix-suggestion? I submitted this patch #3 and the others in this series in order to fix bugs in the RDS that is currently shipped with Linux. It may very well be the case that there are other changes that Avinash put into production kernels that would be better suited to fix this and other problems. But that should not eliminate the need to fix what is currently broken. Fixing what's broken does not preclude replacing the fixed code with newer or better versions of the same. > And as far as invalidation concerned with proxy qp, it not longer > races with data path qp. > I don't understand, please elaborate. > May be you can try those changes if not already to see if it > addresses the couple of cases where you ended up adding > timeouts. > I don't understand, please elaborate: a) Are you saying this issue should not be fixed? b) Or are you suggesting to replace this fix with a different fix? If it's the later, please point out what you have in mind. c) ??? Thanks, Gerd