Joe McDonnell created IMPALA-13887:
--------------------------------------

             Summary: TestParquet.test_resolution_by_name fails with tuple 
caching enabled
                 Key: IMPALA-13887
                 URL: https://issues.apache.org/jira/browse/IMPALA-13887
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


When running TestParquet.test_resolution_by_name with tuple caching enabled, it 
fails with a correctness issue:
{noformat}
 TestParquet.test_resolution_by_name[protocol: beeswax | table_format: 
parquet/none | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 
1, 'debug_action': 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
'exec_single_node_rows_threshold': 0}] 
[gw0] linux2 -- Python 2.7.16 
/home/joemcdonnell/upstream/Impala/bin/../infra/python/env-gcc10.4.0/bin/python
query_test/test_scanners.py:1052: in test_resolution_by_name
    use_db=unique_database)
common/impala_test_suite.py:904: in run_test_case
    self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:737: in __verify_results_and_errors
    replace_filenames_with_placeholder)
common/test_result_verifier.py:523: in verify_raw_results
    VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:305: in verify_query_result_is_equal
    assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' != 'aaa'
E     'NULL' != 'aaa'
E     'NULL' != 'bbb'
E     'NULL' != 'bbb'
E     'NULL' != 'c'
E     'NULL' != 'c'
E     'NULL' != 'nonnullable'
{noformat}
The test renames a column with ALTER TABLE. With 
parquet_fallback_schema_resolution=name, the rename changes which Parquet 
field the query resolves to, so the meaning of the statement changes. The 
cache key does not contain the actual column names, so the two queries below 
can map to the same cache entry. These are the statements:
{noformat}
select tmp.f from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
# Renames 'f' to 'renamed'
alter table nested_resolution_by_name_test change nested_struct nested_struct
struct<b: array<int>, a: int, c: struct<d: array<array<struct<renamed: 
string>>>>>;

select tmp.renamed from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
{noformat}
The cache key should incorporate the column/field names.
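
To illustrate the collision, here is a minimal Python sketch. It is not Impala's tuple cache code: scan_cache_key, the include_names flag, and the position numbers are all hypothetical and exist only for this example. It shows that a key built from positions alone is identical before and after the rename, while a key that also hashes the field names differs.

```python
import hashlib


def scan_cache_key(table, path, include_names):
    """Hypothetical key builder: hash the table name and the scanned field path.

    path is a sequence of (position, name) pairs. The names are hashed only
    when include_names is True.
    """
    parts = [table]
    for position, name in path:
        parts.append(str(position))
        if include_names:
            parts.append(name)
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


TABLE = "nested_resolution_by_name_test"

# Positions are illustrative only. The two paths differ solely in the last name.
before_rename = [(0, "nested_struct"), (2, "c"), (0, "d"), (0, "item"), (0, "f")]
after_rename = [(0, "nested_struct"), (2, "c"), (0, "d"), (0, "item"), (0, "renamed")]

# Positions only: both queries produce the same key, so they would share an entry.
collides = (scan_cache_key(TABLE, before_rename, False)
            == scan_cache_key(TABLE, after_rename, False))

# Positions plus names: the rename produces a different key.
distinct = (scan_cache_key(TABLE, before_rename, True)
            != scan_cache_key(TABLE, after_rename, True))

print(collides, distinct)
```

Under these assumptions, hashing the names is what keeps the select on tmp.renamed from matching an entry populated by the select on tmp.f.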


