[ 
https://issues.apache.org/jira/browse/IMPALA-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930463#comment-17930463
 ] 

ASF subversion and git services commented on IMPALA-10338:
----------------------------------------------------------

Commit d682e226d5220e828ce473f7f9a7aa8898c20f32 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d682e226d ]

IMPALA-10338: Deflake test_queue_reasons_slots in ASAN build.

test_queue_reasons_slots has been flaky in ASAN build for not completing
all of its test queries within 60 seconds from submission time. In the
latest occurrence, the 4th out of 5 queries submitted in parallel does
not get admitted after 60 seconds. It then ends in ERROR state, while
the tests expect it to ends in FINISHED state.

This patch attempt to deflake the tests by increasing slots and MT_DOP
option from 4 to 8 in test_queue_reasons_slots. The test query is a
simple GROUP BY query that is expected to run faster with increased
degree of parallelism.

Testing:
- Loop and pass the test 100x in ASAN build.

Change-Id: I4546aa3ce66c480504685e842a4b610a9a7e01ee
Reviewed-on: http://gerrit.cloudera.org:8080/22530
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> TestAdmissionController.test_queue_reasons_slots flaky because of 
> slow/hanging fragment
> ---------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10338
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10338
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: Tim Armstrong
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: broken-build, flaky, hang
>         Attachments: failure-output.txt, impalad.ERROR, impalad.INFO
>
>
> This is on an s3 debug build, commit 5a00a4c06f8ec40a8867dcbc036cf5bb47b8a3be
> {noformat}
> custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
>  (from pytest)
> Failing for the past 1 build (Since Failed#673 )
> Took 1 min 58 sec.
> add description
> Error Message
> Timeout: query 504b3a2511f3cd0e:e27bec6b00000000 did not reach one of the 
> expected states [4], last known state 3
> Stacktrace
> custom_cluster/test_admission_controller.py:967: in test_queue_reasons_slots
>     TIMEOUT_S, config_options={"mt_dop": 4})
> custom_cluster/test_admission_controller.py:277: in 
> _execute_and_collect_profiles
>     state = self.wait_for_any_state(handle, expected_states, timeout_s)
> common/impala_test_suite.py:1081: in wait_for_any_state
>     actual_state))
> E   Timeout: query 504b3a2511f3cd0e:e27bec6b00000000 did not reach one of the 
> expected states [4], last known state 3
> {noformat}
> Those numbers are beeswax QUeryStates:
> {noformat}
> enum QueryState {
>   CREATED = 0
>   INITIALIZED = 1
>   COMPILED = 2
>   RUNNING = 3
>   FINISHED = 4
>   EXCEPTION = 5
> }
> {noformat}
> I.e. it appears to have run for > 60 seconds.
> The timing of that query:
> {noformat}
> I1116 13:25:08.449323 32665 impala-server.cc:1242] 
> 504b3a2511f3cd0e:e27bec6b00000000] Registered query 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 
> session_id=874c2c9eaf2ad730:0004c1bb0ba7b4a7
> I1116 13:25:08.449626 32665 Frontend.java:1532] 
> 504b3a2511f3cd0e:e27bec6b00000000] Analyzing query: select 
> min(ss_wholesale_cost) from tpcds_parquet.store_sales db: default
> ...
> I1116 13:25:08.567667   367 admission-controller.cc:1532] 
> 504b3a2511f3cd0e:e27bec6b00000000] Scheduling query 
> 504b3a2511f3cd0e:e27bec6b00000000 with membership version 2
> I1116 13:25:08.567767   367 admission-controller.cc:1590] 
> 504b3a2511f3cd0e:e27bec6b00000000] Scheduling for executor group: 
> default-pool-group1 with 3 executors
> I1116 13:25:08.643026   367 admission-controller.cc:1640] 
> 504b3a2511f3cd0e:e27bec6b00000000] Trying to admit query to pool default-pool 
> in executor group default-pool-group1 (3 executors)
> ...
> I1116 13:25:49.184185 32432 admission-controller.cc:1811] Admitting from 
> queue: query=504b3a2511f3cd0e:e27bec6b00000000
> I1116 13:25:49.184196 32432 admission-controller.cc:1903] For Query 
> 504b3a2511f3cd0e:e27bec6b00000000 per_backend_mem_limit set to: -1.00 B 
> per_backend_m
> em_to_admit set to: 114.02 MB coord_backend_mem_limit set to: -1.00 B 
> coord_backend_mem_to_admit set to: 114.02 MB
> I1116 13:25:49.184350   367 admission-controller.cc:1288] 
> 504b3a2511f3cd0e:e27bec6b00000000] Admitted queued query 
> id=504b3a2511f3cd0e:e27bec6b00000000
> I1116 13:25:49.184370   367 admission-controller.cc:1289] 
> 504b3a2511f3cd0e:e27bec6b00000000] Final: agg_num_running=1, 
> agg_num_queued=0, agg_mem_reserved
> =17.25 KB,  local_host(local_mem_admitted=342.05 MB, num_admitted_running=1, 
> num_queued=0, backend_mem_reserved=17.25 KB, topN_query_stats: queries=[0a42
> 32658ea48bc5:c015646900000000, 5b470ebea7782154:ea22bdb000000000, 
> 554d88d8f812e22d:efbaf75200000000, aa4a301189a4d144:fdc17ff700000000], 
> total_mem_consum
> ed=17.25 KB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, 
> min=0, max=17.25 KB, pool_total_mem=17.25 KB, average_per_query=4.31 KB)
> I1116 13:25:49.185214   367 impala-server.cc:2062] 
> 504b3a2511f3cd0e:e27bec6b00000000] Registering query locations
> I1116 13:25:49.185261   367 coordinator.cc:149] 
> 504b3a2511f3cd0e:e27bec6b00000000] Exec() 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 stmt=select min(ss_w
> holesale_cost) from tpcds_parquet.store_sales
> I1116 13:25:49.186172   367 coordinator.cc:473] 
> 504b3a2511f3cd0e:e27bec6b00000000] starting execution on 3 backends for 
> query_id=504b3a2511f3cd0e:e27bec6
> b00000000
> I1116 13:25:49.189028 32071 control-service.cc:142] 
> 504b3a2511f3cd0e:e27bec6b00000000] ExecQueryFInstances(): 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-018e.vpc.cloudera.com:27000 
> #instances=5
> I1116 13:25:49.192133   367 coordinator.cc:532] 
> 504b3a2511f3cd0e:e27bec6b00000000] started execution on 3 backends for 
> query_id=504b3a2511f3cd0e:e27bec6b
> 00000000
> ...
> I1116 13:26:49.278456 32665 impala-server.cc:1320] UnregisterQuery(): 
> query_id=504b3a2511f3cd0e:e27bec6b00000000
> I1116 13:26:49.278478 32665 coordinator.cc:672] ExecState: query 
> id=504b3a2511f3cd0e:e27bec6b00000000 execution cancelled
> I1116 13:26:49.278493 32665 coordinator-backend-state.cc:921] 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 target backend=127.0.0.1:27000: 
> Sending CancelQueryFInstances rpc
> ...
> {noformat}
> As far as when the instances finished - 13:25:55
> {noformat}
> tarmstrong@tarmstrong-box:~/tmplogs8/impala-asf-master-core-s3-DEBUG-673/logs/custom_cluster_tests/relevant$
>  grep -h -R 504b3a2511f3cd0e.*'Instance' *
> I1116 13:25:49.189028 32071 control-service.cc:142] 
> 504b3a2511f3cd0e:e27bec6b00000000] ExecQueryFInstances(): 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-018e.vpc.cloudera.com:27000 
> #instances=5
> I1116 13:25:55.125044   640 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000002] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000002 #in-flight=4 status=OK
> I1116 13:25:55.136926   642 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000004] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000004 #in-flight=3 status=OK
> I1116 13:25:55.139536   639 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000001] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000001 #in-flight=2 status=OK
> I1116 13:25:55.144346   641 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000003] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000003 #in-flight=1 status=OK
> I1116 13:26:49.278493 32665 coordinator-backend-state.cc:921] 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 target backend=127.0.0.1:27000: 
> Sending CancelQueryFInstances rpc
> I1116 13:26:49.278884 32665 coordinator-backend-state.cc:921] 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 target backend=127.0.0.1:27002: 
> Sending CancelQueryFInstances rpc
> I1116 13:26:49.278980   643 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000000] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000000 #in-flight=0 status=CANCELLED: 
> Cancelled
> I1116 13:25:49.190251 32005 control-service.cc:142] 
> 504b3a2511f3cd0e:e27bec6b00000000] ExecQueryFInstances(): 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-018e.vpc.cloudera.com:27000 
> #instances=4
> I1116 13:25:55.015192   645 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000006] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000006 #in-flight=3 status=OK
> I1116 13:25:55.015666   646 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000007] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000007 #in-flight=2 status=OK
> I1116 13:25:55.026870   644 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000005] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000005 #in-flight=1 status=OK
> I1116 13:25:55.034817   647 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000008] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000008 #in-flight=0 status=OK
> I1116 13:25:49.191587 31720 control-service.cc:142] 
> 504b3a2511f3cd0e:e27bec6b00000000] ExecQueryFInstances(): 
> query_id=504b3a2511f3cd0e:e27bec6b00000000 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-018e.vpc.cloudera.com:27000 
> #instances=4
> I1116 13:25:55.641626   648 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b00000009] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b00000009 #in-flight=3 status=OK
> I1116 13:25:55.646078   649 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b0000000a] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b0000000a #in-flight=2 status=OK
> I1116 13:25:55.650281   652 query-state.cc:906] 
> 504b3a2511f3cd0e:e27bec6b0000000b] Instance completed. 
> instance_id=504b3a2511f3cd0e:e27bec6b0000000b #in-flight=1 status=OK
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to