On 7/1/19 2:06 PM, Gerd Rausch wrote:
Hi Santosh,

On 01/07/2019 14.00, santosh.shilim...@oracle.com wrote:

Look for the command timeout in the CX3 sources. 60 seconds is the upper
bound in CX3. It's not standard in the specs (at least not that I know
of), though, and may vary from vendor to vendor.


I am not seeing it. Can you point me to the right place?

Below. All command timeouts are 60 seconds.

enum {
        MLX4_CMD_TIME_CLASS_A   = 60000,
        MLX4_CMD_TIME_CLASS_B   = 60000,
        MLX4_CMD_TIME_CLASS_C   = 60000,
};

Having said that, I took another look at the code you are patching,
and that's actually only FRWR code, which is purely work-request
based, so this command timeout shouldn't matter.

If the work request fails, then it will lead to flush errors and
MRs will be marked as STALE. So this wait may not be necessary.
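
To illustrate the failure path described above, here is a minimal
user-space sketch in C. It is not the RDS code: the wc_status and
mr_state enums, struct mr, and handle_completion() are hypothetical
names made up for this example. It only models the idea that a failed
or flushed completion leaves the MR marked stale, so nothing has to
wait on a command timeout.

```c
#include <stdbool.h>

/* Hypothetical completion statuses, loosely modeled on work completions. */
enum wc_status { WC_SUCCESS, WC_GENERAL_ERR, WC_FLUSH_ERR };

/* Hypothetical MR states for this sketch. */
enum mr_state { MR_FREE, MR_INUSE, MR_STALE };

struct mr {
	enum mr_state state;
};

/*
 * Model of a completion handler: any non-success completion marks
 * the MR stale so that it is not reused as-is. Returns true when
 * the MR is still usable.
 */
static bool handle_completion(struct mr *mr, enum wc_status status)
{
	if (status != WC_SUCCESS) {
		mr->state = MR_STALE;
		return false;
	}
	mr->state = MR_INUSE;
	return true;
}
```

In this model a flush error is handled the same way as any other
failed completion: the handler returns immediately and the MR ends
up in MR_STALE.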

There is a socket call, RDS_GET_MR, which needs to be synchronous,
and Avinash has actually fixed that by making this MR registration
process synchronous. Inline registration is still kept async.
The RDS_GET_MR case is what is actually showing the issue you saw,
and Avinash has the fix for that in the production kernel.
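
The difference between the two registration paths can be sketched
roughly as follows. This is a user-space model under stated
assumptions, not Avinash's fix: struct mr_reg, post_reg_wr(),
complete_reg_wr() and register_mr() are hypothetical names, and the
"wait" is simulated by completing the request directly.

```c
#include <stdbool.h>

/* Hypothetical registration states for this sketch. */
enum reg_state { REG_IDLE, REG_PENDING, REG_DONE };

struct mr_reg {
	enum reg_state state;
};

/* Post the (simulated) registration work request; it completes later. */
static void post_reg_wr(struct mr_reg *r)
{
	r->state = REG_PENDING;
}

/* Simulated completion; real code would be driven by a CQ handler. */
static void complete_reg_wr(struct mr_reg *r)
{
	if (r->state == REG_PENDING)
		r->state = REG_DONE;
}

/*
 * Register an MR. With sync == true (the RDS_GET_MR-style case) the
 * call does not return until the registration has completed. With
 * sync == false (the inline case) it returns right after posting.
 */
static enum reg_state register_mr(struct mr_reg *r, bool sync)
{
	post_reg_wr(r);
	if (sync) {
		/* Stand-in for blocking until the completion arrives. */
		complete_reg_wr(r);
	}
	return r->state;
}
```

With sync set, the caller only ever sees REG_DONE; in the inline
case it sees REG_PENDING and the completion arrives later.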

I believe that with that change, the registration issue already
becomes a non-issue.

And as far as invalidation is concerned, with the proxy QP it no
longer races with the data path QP.

Maybe you can try those changes, if you have not already, to see if
they address the couple of cases where you ended up adding
timeouts.

Regards,
Santosh
