[ 
https://issues.apache.org/jira/browse/IMPALA-14164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985734#comment-17985734
 ] 

Joe McDonnell commented on IMPALA-14164:
----------------------------------------

This reproduces locally with a release build, but not with a debug build. It 
looks like the query fragment starts and completes while the test is sleeping 
for 1 second between samples of impala-server.num-fragments-in-flight, so the 
test never observes the value 2. Since this is timing-related, there are 
several options:
 # Use a shorter interval between samples of metrics (maybe default to 200ms?)
 # Make the query take longer
 # Only wait for impala-server.num-fragments-in-flight=1 rather than 2
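
As a rough illustration of the first option, here is a minimal sketch of a 
polling loop with a configurable sample interval. It is not the actual Impala 
test code: wait_for_value, FakeClock, make_metric and the 0.5s window are all 
hypothetical names and numbers chosen for the example. A fake clock stands in 
for real sleeping so the effect of the interval is deterministic.

```python
import time


def wait_for_value(get_value, expected, timeout_s=60.0, interval_s=0.2,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_value() until it returns expected or timeout_s elapses.

    Returns True if the expected value was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_value() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeClock:
    """Deterministic clock: sleep() just advances the current time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_metric(clock):
    # Hypothetical metric: reads 2 only for 0.3s <= t < 0.8s, otherwise 1.
    # This mimics a fragment that starts and finishes within one second.
    return lambda: 2 if 0.3 <= clock.time() < 0.8 else 1


def observed(interval_s):
    """Return whether polling at interval_s ever sees the metric equal 2."""
    clock = FakeClock()
    return wait_for_value(make_metric(clock), 2, timeout_s=5.0,
                          interval_s=interval_s,
                          clock=clock.time, sleep=clock.sleep)


if __name__ == "__main__":
    print("1s interval saw 2:", observed(1.0))
    print("200ms interval saw 2:", observed(0.2))
```

In this sketch the 1s interval samples on either side of the window and times 
out, while the 200ms interval lands inside it. A shorter interval only narrows 
the race, so it may still need to be combined with one of the other options.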

> TestScratchDir tests do not reach expected number of fragments in flight
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-14164
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14164
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Priority: Blocker
>              Labels: broken-build
>
> Some TestScratchDir tests are failing with these symptoms:
>  
> {noformat}
> custom_cluster/test_scratch_disk.py:238: in test_scratch_dirs_default_priority
>     verifier.wait_for_metric("impala-server.num-fragments-in-flight", 2)
> verifiers/metric_verifier.py:67: in wait_for_metric
>     self.impalad_service.wait_for_metric_value(metric_name, expected_value, timeout)
> common/impala_service.py:158: in wait_for_metric_value
>     self.__metric_timeout_assert(metric_name, expected_value, timeout, value)
> common/impala_service.py:227: in __metric_timeout_assert
>     assert 0, assert_string
> E   AssertionError: Metric impala-server.num-fragments-in-flight did not reach value 2 in 60s. Actual value was '1'.
> {noformat}
> The logs show it never reaches 2:
>  
>  
> {noformat}
> -- 2025-06-15 13:10:20,680 INFO     MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-ec2-redhat86-m6i-4xlarge-ondemand-1c73.vpc.cloudera.com:25000
> -- 2025-06-15 13:10:20,692 INFO     MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=2. Current value: 0. total_wait: 0s
> -- 2025-06-15 13:10:20,692 INFO     MainThread: Sleeping 1s before next retry.
> -- 2025-06-15 13:10:21,693 INFO     MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-ec2-redhat86-m6i-4xlarge-ondemand-1c73.vpc.cloudera.com:25000
> -- 2025-06-15 13:10:21,704 INFO     MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=2. Current value: 1. total_wait: 1.01228308678s
> -- 2025-06-15 13:10:21,704 INFO     MainThread: Sleeping 1s before next retry.
> ...
> -- 2025-06-15 13:11:20,955 INFO     MainThread: Metric impala-server.num-fragments-in-flight did not reach value 2 in 60s. Actual value was '1'. total_wait: 60.2740471363s. Failing...
> {noformat}
> This impacts these tests:
>  
>  
> {noformat}
> TestScratchDir.test_scratch_dirs_default_priority
> TestScratchDir.test_scratch_dirs_prioritized_spill
> TestScratchDir.test_scratch_dirs_mix_local_and_remote_dir_spill_local_only
> {noformat}
>  


