[ 
https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079634#comment-18079634
 ] 

Quanlong Huang commented on IMPALA-12679:
-----------------------------------------

The console output of the test shows why the query needs that many RPCs:
{noformat}
select sleep(100) from functional.alltypes limit 25;

-- 2026-05-09 00:18:02,092 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting state
-- 2026-05-09 00:18:02,147 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting state
-- 2026-05-09 00:18:02,202 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting state
-- 2026-05-09 00:18:02,264 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting state
-- 2026-05-09 00:18:02,319 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting state
-- 2026-05-09 00:18:04,820 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: fetching all rows
-- 2026-05-09 00:18:07,348 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting log for operation
-- 2026-05-09 00:18:07,349 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting runtime profile operation
-- 2026-05-09 00:18:07,351 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: getting runtime profile operation
-- 2026-05-09 00:18:07,353 INFO     MainThread: 
9c42e3a09891333f:ff443f7a00000000: closing query for operation{noformat}
"getting state" is logged in [client.wait_for_impala_state(handle, FINISHED, 
30)|https://github.com/apache/impala/blob/9384d46e4ed0344f21e8832069e16e0ee1fefcf6/tests/query_test/test_fetch.py#L53].
 The query runs in parallel e2e tests so it could take more time (>100ms) in 
admission, which is shown in the profile:
{noformat}
    Query Timeline: 5s324ms
       - Query submitted: 41.077us (41.077us)
       - Planning finished: 58.851ms (58.809ms)
       - Submit for admission: 75.824ms (16.973ms)
       - Completed admission: 215.420ms (139.595ms)
       - Ready to start on 1 backends: 215.514ms (94.237us)
       - All 1 execution backends (1 fragment instances) started: 220.416ms 
(4.902ms)
       - Rows available: 241.462ms (21.045ms)
       - First row fetched: 5s316ms (5s075ms)
       - Last row fetched: 5s316ms (21.478us)
       - Released admission control resources: 5s319ms (2.451ms)
       - Unregister query: 5s324ms (5.717ms){noformat}
In my local run when the cluster is idle, admission takes < 1ms:
{noformat}
    Query Timeline: 5s522ms
       - Query submitted: 34.127us (34.127us)
       - Planning finished: 456.226ms (456.192ms)
       - Submit for admission: 456.533ms (306.318us)
       - Completed admission: 457.028ms (495.508us)
       - Ready to start on 1 backends: 457.106ms (78.049us)
       - All 1 execution backends (1 fragment instances) started: 467.273ms 
(10.166ms)
       - Rows available: 475.383ms (8.110ms)
       - First row fetched: 5s511ms (5s036ms)
       - Last row fetched: 5s512ms (78.754us)
       - Released admission control resources: 5s513ms (1.238ms)
       - Unregister query: 5s522ms (8.914ms){noformat}
So in the failed run, it takes more RPCs to check the state, thus having a 
larger rpc_count. To deflake the test, we can count the number of state checks 
plus the additional RPCs that fetch the results and profile. Use that as the 
expected rpc_count. There is also a duplicate getting runtime profile operation 
that we can avoid.

Uploaded a patch for review: [https://gerrit.cloudera.org:8080/24288]

> test_rows_sent_counters failed to match RPCCount
> ------------------------------------------------
>
>                 Key: IMPALA-12679
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12679
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Michael Smith
>            Assignee: Kurt Deschler
>            Priority: Major
>         Attachments: test_rows_sent_counters.profile.txt
>
>
> {code}
> query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> {code}
> failed with
> {code}
> query_test/test_fetch.py:69: in test_rows_sent_counters
>     assert re.search("RPCCount: [5-9]", runtime_profile)
> E   assert None
> E    +  where None = <function search at 0x7fbec6fa7578>('RPCCount: [5-9]', 
> 'Query (id=c8476e5c065757bf:b436769800000000):\n  DEBUG MODE WARNING: Query 
> profile created while running a DEBUG buil...: 0.000ns\n             - 
> WriteIoBytes: 0\n             - WriteIoOps: 0 (0)\n             - 
> WriteIoWaitTime: 0.000ns\n')
> E    +    where <function search at 0x7fbec6fa7578> = re.search
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to