On 7/1/19 1:55 PM, Gerd Rausch wrote:
Hi Santosh,
On 01/07/2019 13.41, santosh.shilim...@oracle.com wrote:
@@ -144,7 +146,29 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
if (printk_ratelimit())
pr_warn("RDS/IB: %s returned error(%d)\n",
__func__, ret);
+ goto out;
+ }
+
+ if (!frmr->fr_reg)
+ goto out;
+
+ /* Wait for the registration to complete in order to prevent an invalid
+ * access error resulting from a race between the memory region already
+ * being accessed while registration is still pending.
+ */
+ wait_event_timeout(frmr->fr_reg_done, !frmr->fr_reg,
+ msecs_to_jiffies(100));
+
Is there any logic behind the arbitrary timeout in this patch, as well as
the one in patch 1/7 which Dave pointed out?
It's empirical (see my response to David's question):
Memory registrations took longer than invalidations, hence 100msec instead of
10msec.
An MR registration command issued to the hardware can at times take as
long as the command timeout (e.g. 60 seconds in CX3), and up to that point
it is still a legitimate operation and not necessarily a failure. We
shouldn't add arbitrary timeouts in ULPs.
Where did you find the 60 seconds for CX3 you are referring to?
Is there a "generic" upper-bound that is not tied to a specific vendor / HCA?
Can you provide a pointer?
Look for the command timeout in the CX3 sources. 60 seconds is the upper
bound in CX3. It's not standardized in the specs (at least not that I know
of), though, and may vary from vendor to vendor.
Regards,
Santosh