[ 
https://issues.apache.org/jira/browse/IMPALA-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042827#comment-18042827
 ] 

Jason Fehr commented on IMPALA-14602:
-------------------------------------

This test failure is extremely rare considering this test has been run in ASAN 
builds for 3 months.  The "test_cancel_during_admission_control" test works by 
starting a long running query which takes the only available admission control 
execution slot followed by starting a second query that will fail since there 
are no admission control slots available.  Since the test failed with "expected 
child spans count: 5, actual: 6", that indicates the second query that should 
have been blocked at admission control actually did not get blocked.  If the 
query had been blocked, then only 5 spans would have been present in the trace. 
 Instead, the 6th QueryExecution span was present indicating the second query 
actually executed before the first query started.

The issue seems to be because the test does not wait for the first long-running 
query to start.  It async runs the first query followed by immediately starting 
the second query.

> test_cancel_during_admission_control is flaky
> ---------------------------------------------
>
>                 Key: IMPALA-14602
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14602
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: Impala 5.0.0
>            Reporter: Riza Suminto
>            Assignee: Jason Fehr
>            Priority: Major
>              Labels: broken-build
>
> custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::test_cancel_during_admission_control
>  is flaky in downstream build with following error.
> {code:java}
> Error Message
> AssertionError: Trace '4981634b898085aa804e80969f0e66c7' expected child spans 
> count: 5, actual: 6
> Stacktrace
> custom_cluster/test_otel_trace.py:511: in test_cancel_during_admission_control
>     self.assert_trace(
> custom_cluster/test_otel_trace.py:53: in assert_trace
>     assert_trace(self.build_log_path("impalad", "INFO"), self.trace_file_path,
> util/otel_trace.py:382: in assert_trace
>     __assert_trace_common(trace, expected_span_count)
> util/otel_trace.py:485: in __assert_trace_common
>     assert len(trace.child_spans) == expected_child_spans_count, \
> E   AssertionError: Trace '4981634b898085aa804e80969f0e66c7' expected child 
> spans count: 5, actual: 6
> Standard Error
> -- 2025-12-02 16:13:37,819 INFO     MainThread: Created temporary dir 
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impala_test_out_dir_traces_jkumhes1
> -- 2025-12-02 16:13:37,819 INFO     MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=1 --num_coordinators=1 
> --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--logbuflevel=-1 -v=2 
> --cluster_id=cancel_adm_ctl --default_pool_max_requests=1 
> --otel_trace_enabled=true --otel_trace_exporter=file 
> --otel_trace_exhaustive_dchecks --otel_file_flush_interval_ms=500 
> --otel_file_pattern=/data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impala_test_out_dir_traces_jkumhes1/export-trace.jsonl
>  ' '--state_store_args=--logbuflevel=-1  ' '--catalogd_args=--logbuflevel=-1  
> ' '--admissiond_args=--logbuflevel=-1  ' 
> --impalad_args=--default_query_options=
> 16:13:38 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 16:13:38 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 16:13:39 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 16:13:40 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 16:13:42 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> 16:13:42 MainThread: Waiting for Impalad webserver port 25000
> 16:13:43 MainThread: Waiting for Impalad webserver port 25000
> 16:13:43 MainThread: Waiting for Impalad webserver port 25000
> 16:13:44 MainThread: Waiting for Impalad webserver port 25000
> 16:13:46 MainThread: Waiting for coordinator client services - hs2 port: 
> 21050 hs2-http port: 28000 beeswax port: 21000
> 16:13:46 MainThread: Getting num_known_live_backends from 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:25000
> 16:13:46 MainThread: num_known_live_backends has reached value: 1
> 16:13:46 MainThread: Total wait: 3.65s
> 16:13:46 MainThread: Impala Cluster Running with 1 nodes (1 coordinators, 1 
> executors).
> -- 2025-12-02 16:13:46,212 DEBUG    MainThread: Found 1 impalad/1 
> statestored/1 catalogd process(es)
> -- 2025-12-02 16:13:46,212 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:25010
> -- 2025-12-02 16:13:46,218 INFO     MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 2. total_wait: 0s
> -- 2025-12-02 16:13:46,218 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:25000
> -- 2025-12-02 16:13:46,222 INFO     MainThread: num_known_live_backends has 
> reached value: 1
> -- 2025-12-02 16:13:46,223 INFO     MainThread: beeswax: 
> set 
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,223 INFO     MainThread: beeswax: connected to 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21000 with 
> beeswax
> -- 2025-12-02 16:13:46,223 INFO     MainThread: hs2: 
> set 
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,223 INFO     MainThread: hs2: connected to 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050 with 
> impyla hs2
> -- 2025-12-02 16:13:46,223 INFO     MainThread: hs2-http: 
> set 
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,225 INFO     MainThread: hs2-http: connected to 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:28000 with 
> impyla hs2-http
> -- 2025-12-02 16:13:46,225 INFO     MainThread: hs2-feng: 
> set 
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,225 INFO     MainThread: hs2-feng: connected to 
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:11050 with 
> impyla hs2-feng
> -- 2025-12-02 16:13:46,230 INFO     MainThread: hs2: executing against Impala 
> at impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050. 
> session: bd44d4109d8f25f7:1150444731f78e98 main_cursor: False user: None
> SELECT * FROM functional.alltypes WHERE id = SLEEP(5000);
> -- 2025-12-02 16:13:47,578 INFO     MainThread: hs2: executing against Impala 
> at impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050. 
> session: 2b4fa7e01448a74b:3d59c1baf1f27a87 main_cursor: False user: None
> SELECT * FROM functional.alltypes;
> -- 2025-12-02 16:13:47,608 INFO     MainThread: 
> cc4a90f7c66ca08f:4b6e0abb00000000: getting state
> -- 2025-12-02 16:13:47,609 INFO     MainThread: hs2: executing against Impala 
> at impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050. 
> session: 424e33f2c059b306:d5918c0e6270d6b4 main_cursor: True user: None
> KILL QUERY 'cc4a90f7c66ca08f:4b6e0abb00000000';
> -- 2025-12-02 16:13:47,624 INFO     MainThread: 
> d04417f3822cfb69:986d6a2c00000000: query started
> -- 2025-12-02 16:13:47,626 INFO     MainThread: 
> d04417f3822cfb69:986d6a2c00000000: getting log for operation
> -- 2025-12-02 16:13:47,627 INFO     MainThread: 
> d04417f3822cfb69:986d6a2c00000000: getting runtime profile operation
> -- 2025-12-02 16:13:47,627 INFO     MainThread: 
> d04417f3822cfb69:986d6a2c00000000: closing query for operation
> -- 2025-12-02 16:13:47,628 INFO     MainThread: 
> cc4a90f7c66ca08f:4b6e0abb00000000: getting state
> -- 2025-12-02 16:13:47,628 INFO     MainThread: 
> cc4a90f7c66ca08f:4b6e0abb00000000: getting runtime profile operation
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to