Hi Santosh,

On 01/07/2019 13.41, santosh.shilim...@oracle.com wrote:
>> @@ -144,7 +146,29 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
>>  		if (printk_ratelimit())
>>  			pr_warn("RDS/IB: %s returned error(%d)\n",
>>  				__func__, ret);
>> +		goto out;
>> +	}
>> +
>> +	if (!frmr->fr_reg)
>> +		goto out;
>> +
>> +	/* Wait for the registration to complete in order to prevent an invalid
>> +	 * access error resulting from a race between the memory region already
>> +	 * being accessed while registration is still pending.
>> +	 */
>> +	wait_event_timeout(frmr->fr_reg_done, !frmr->fr_reg,
>> +			   msecs_to_jiffies(100));
>> +
> This arbitrary timeout in this patch as well as pacth 1/7 which
> Dave pointed out has any logic ?
>

It's empirical (see my response to David's question):
Memory registrations took longer than invalidations,
hence 100msec instead of 10msec.

> MR registration command issued to hardware can at times take as
> much as command timeout(e.g 60 seconds in CX3) and upto that its still
> legitimate operation and not necessary failure. We shouldn't add
> arbitrary time outs in ULPs.

Where did you find the 60 seconds for CX3 you are referring to?
Is there a "generic" upper-bound that is not tied to a specific vendor / HCA?
Can you provide a pointer?

Thanks,

  Gerd