[ https://issues.apache.org/jira/browse/KUDU-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625692#comment-17625692 ]
Bakai Ádám commented on KUDU-1698:
----------------------------------

I tried to recreate the exact steps in a test, but it failed: the client did not rediscover the leader, and instead retried over and over until the session timeout expired. I talked with [~aserbin] and we decided to create a new issue for the missing rediscovery behaviour, and to test the session and RPC timeouts in another way. The new issue is KUDU-3414.

The new idea for testing the "separate entities" property:
* Make the tablet lookup artificially slow by adding latency.
* See that the RPC times out but is retried.
* Remove the artificial delay.
* Check that the operation was successful in the end, and that the tablet lookup happened twice.

> Kudu C++ client: add a new unit test to make sure default_rpc_timeout and session timeout are separate entities
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-1698
>                 URL: https://issues.apache.org/jira/browse/KUDU-1698
>             Project: Kudu
>          Issue Type: Task
>          Components: client, test
>            Reporter: Alexey Serbin
>            Assignee: Bakai Ádám
>            Priority: Minor
>              Labels: newbie
>
> We need a new unit test that makes sure there is a difference between the
> top-level operation timeout and the per-call RPC timeout in the Kudu C++
> client library. Prior to the change introduced in
> 5195ce573850653e0e53094cdd35a1da93d33444 they were the same (which was a bug).
> The test should:
> * set the per-call RPC timeout when creating the KuduClient object
> * set KuduSession::SetTimeoutMillis() for the target session: the value
>   should be about twice the per-call RPC timeout.
> * create a tablet with a replication factor of at least 2.
> * find the current tablet replica leader and pause it (send SIGSTOP)
> * make a write into the table
> * make sure the write operation was successful
> Prior to the change introduced in 5195ce573850653e0e53094cdd35a1da93d33444
> such a test would fail because the C++ client used the full operation
> deadline on every RPC call.
> I.e., it would wait until the call to the current leader timed out, and that
> would consume the time budget of the whole operation. Once the RPC timeout is
> less than the timeout for the whole write operation, the call to the frozen
> tablet server should time out, and the client should re-discover the new
> tablet replica leader and complete the write operation successfully.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
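
To illustrate the property the test is meant to check, here is a minimal sketch of a retry loop in which the per-call RPC timeout and the whole-operation timeout are separate budgets. It is a virtual-time model, not Kudu code: RunWithRetries, Outcome, SlowFirstAttempt and CheckExample are hypothetical names introduced only for this illustration, and the latencies are made-up numbers standing in for the artificially slow tablet lookup (or the frozen leader).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

using Ms = std::chrono::milliseconds;

// Result of one simulated top-level operation (hypothetical type).
struct Outcome {
  bool ok;       // did some attempt finish within its per-call timeout?
  int attempts;  // number of RPC attempts issued
  Ms elapsed;    // virtual time consumed by the whole operation
};

// Simulates a client that retries an RPC until the operation budget is spent.
// Each attempt is capped by min(rpc_timeout, remaining operation budget).
// latency_of_attempt(n) is the (assumed) time the n-th attempt would need.
Outcome RunWithRetries(Ms op_timeout, Ms rpc_timeout,
                       const std::function<Ms(int)>& latency_of_attempt) {
  Ms elapsed{0};
  int attempts = 0;
  // A non-positive per-call timeout could never make progress in this model.
  if (rpc_timeout <= Ms{0}) return {false, attempts, elapsed};
  while (elapsed < op_timeout) {
    const Ms per_call = std::min(rpc_timeout, op_timeout - elapsed);
    const Ms latency = latency_of_attempt(attempts);
    ++attempts;
    if (latency <= per_call) {
      return {true, attempts, elapsed + latency};  // attempt completed in time
    }
    elapsed += per_call;  // attempt timed out; retry with what is left
  }
  return {false, attempts, elapsed};  // operation budget exhausted
}

// Assumed scenario: the first attempt is slow, later attempts are fast.
Ms SlowFirstAttempt(int attempt) {
  return attempt == 0 ? Ms{5000} : Ms{10};
}

// Usage example: separate timeouts succeed on the retry; a shared timeout
// lets the first call consume the whole operation budget.
void CheckExample() {
  const Outcome separate = RunWithRetries(Ms{2000}, Ms{1000}, SlowFirstAttempt);
  assert(separate.ok && separate.attempts == 2);
  const Outcome shared = RunWithRetries(Ms{2000}, Ms{2000}, SlowFirstAttempt);
  assert(!shared.ok && shared.attempts == 1);
}
```

With a 1000 ms per-call timeout inside a 2000 ms operation budget, the first attempt times out and the second one completes the operation. When both timeouts are 2000 ms, the single slow call uses up the entire budget and the operation fails, which is the behaviour the issue describes as the old bug.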