[ 
https://issues.apache.org/jira/browse/IMPALA-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923877#comment-17923877
 ] 

Quanlong Huang commented on IMPALA-13727:
-----------------------------------------

Reproduced this with more output on the profile:
{code:python}
query_test/test_scanners.py:903: in test_multiple_blocks_mt_dop
    assert ranges_per_host[host] == 2,\
E   AssertionError: ScanRangesComplete for 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27002): should be 
2 in profile:{code}

The problem is that every fragment instance except the last one in 
[^profile.txt] has a timing summary such as "(Total: 49.999ms, non-child: 
0.000ns, % non-child: 0.00%)" directly after the hostname part:
{noformat}
$ grep -E 'ScanRangesComplete|host=' profile.txt
      Instance d843c27e276bfa7a:a86450f600000000 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27000):(Total: 49.999ms, non-child: 0.000ns, % non-child: 0.00%)
         - ScanRangesComplete: 1 (1)
      Instance d843c27e276bfa7a:a86450f600000001 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27000):(Total: 19.999ms, non-child: 0.000ns, % non-child: 0.00%)
           - ScanRangesComplete: 1 (1)
      Instance d843c27e276bfa7a:a86450f600000002 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27000):(Total: 19.999ms, non-child: 0.000ns, % non-child: 0.00%)
           - ScanRangesComplete: 1 (1)
      Instance d843c27e276bfa7a:a86450f600000003 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27001):(Total: 19.999ms, non-child: 0.000ns, % non-child: 0.00%)
           - ScanRangesComplete: 1 (1)
      Instance d843c27e276bfa7a:a86450f600000004 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27001):(Total: 19.999ms, non-child: 0.000ns, % non-child: 0.00%)
           - ScanRangesComplete: 1 (1)
      Instance d843c27e276bfa7a:a86450f600000006 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27002):(Total: 19.999ms, non-child: 0.000ns, % non-child: 0.00%)
           - ScanRangesComplete: 1 (1)
      Instance d843c27e276bfa7a:a86450f600000005 (host=impala-ec2-rhel88-m7g-4xlarge-ondemand-1b29.vpc.cloudera.com:27002):
           - ScanRangesComplete: 1 (1){noformat}

The trailing "(Total:" is also captured by the regex used to extract the hosts:
{code:python}
host_list = re.findall(r'host=(\S+:[0-9]*)', result.runtime_profile)
{code}
https://github.com/apache/impala/blob/9b93ab8b55901fabd0db3dfca5fb5209122c0e34/tests/query_test/test_scanners.py#L885
Note that "\S" matches any non-whitespace character, including ")", ":" and 
"(", so the greedy "\S+" runs on to the last ":" before the next whitespace. 
We should change this regex to match only the hostname and port.

> TestParquet.test_multiple_blocks_mt_dop failed by unexpected ranges_per_host
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-13727
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13727
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: profile.txt
>
>
> The test can fail under any exec vector, e.g.
> {code}
> query_test.test_scanners.TestParquet.test_multiple_blocks_mt_dop[protocol: 
> beeswax | table_format: parquet/none | exec_option: {'test_replan': 1, 
> 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:[email protected]', 
> 'exec_single_node_rows_threshold': 0}] {code}
> Stacktrace
> {code:python}
> query_test/test_scanners.py:903: in test_multiple_blocks_mt_dop
>     assert ranges_per_host[host] == 2
> E   assert 1 == 2{code}
> Standard Error
> {noformat}
> SET 
> client_identifier=query_test/test_scanners.py::TestParquet::()::test_multiple_blocks_mt_dop[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'de;
> SET test_replan=1;
> SET mt_dop=2;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=True;
> SET abort_on_error=1;
> SET debug_action=-1:OPEN:[email protected];
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> select count(l_orderkey) from functional_parquet.lineitem_sixblocks;
> -- 2025-02-01 20:42:19,750 INFO     MainThread: Started query 
> 0347a7702c366f22:89a1448a00000000
> SET 
> client_identifier=query_test/test_scanners.py::TestParquet::()::test_multiple_blocks_mt_dop[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'de;{noformat}


