Joe McDonnell created IMPALA-13887:
--------------------------------------

             Summary: TestParquet.test_resolution_by_name fails with tuple caching enabled
                 Key: IMPALA-13887
                 URL: https://issues.apache.org/jira/browse/IMPALA-13887
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell

TestParquet.test_resolution_by_name fails with a correctness issue when run with tuple caching enabled:
{noformat}
TestParquet.test_resolution_by_name[protocol: beeswax | table_format: parquet/none | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 'exec_single_node_rows_threshold': 0}]
[gw0] linux2 -- Python 2.7.16 /home/joemcdonnell/upstream/Impala/bin/../infra/python/env-gcc10.4.0/bin/python
query_test/test_scanners.py:1052: in test_resolution_by_name
    use_db=unique_database)
common/impala_test_suite.py:904: in run_test_case
    self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:737: in __verify_results_and_errors
    replace_filenames_with_placeholder)
common/test_result_verifier.py:523: in verify_raw_results
    VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:305: in verify_query_result_is_equal
    assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' == 'NULL'
E     'NULL' != 'aaa'
E     'NULL' != 'aaa'
E     'NULL' != 'bbb'
E     'NULL' != 'bbb'
E     'NULL' != 'c'
E     'NULL' != 'c'
E     'NULL' != 'nonnullable'
{noformat}
The test alters a table to rename a column, which changes the meaning of the statement when parquet_fallback_schema_resolution=name is in effect. The issue is that the cache key doesn't contain the actual column names.

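To illustrate why a key that omits names cannot tell the two queries apart, here is a minimal Python sketch. It is not Impala's actual key computation: the cache_key helper and its parameters are hypothetical, and the scan is reduced to a path plus a list of field types purely for illustration.

```python
import hashlib


def cache_key(scan_path, field_types, field_names=None):
    """Hash a simplified scan description into a hex cache key.

    Hypothetical helper for illustration only, not Impala's real key
    computation. When field_names is None, only the path and the field
    types are hashed.
    """
    h = hashlib.sha256()
    h.update(scan_path.encode("utf-8"))
    for t in field_types:
        h.update(b"\x00type:" + t.encode("utf-8"))
    if field_names is not None:
        for n in field_names:
            h.update(b"\x00name:" + n.encode("utf-8"))
    return h.hexdigest()


path = "nested_resolution_by_name_test.nested_struct.c.d.item"

# Key without names: identical before and after the rename, so both
# queries would map to the same cache entry.
before = cache_key(path, ["string"])
after = cache_key(path, ["string"])
collides = before == after

# Key with names: renaming 'f' to 'renamed' produces a different key.
before_named = cache_key(path, ["string"], ["f"])
after_named = cache_key(path, ["string"], ["renamed"])
distinct = before_named != after_named
```

In this sketch, collides and distinct are both true: only the name-aware key separates the query on 'f' from the query on 'renamed'.
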
These are the SQL statements:
{noformat}
select tmp.f from nested_resolution_by_name_test.nested_struct.c.d.item tmp;

# Renames 'f' to 'renamed'
alter table nested_resolution_by_name_test change nested_struct nested_struct struct<b: array<int>, a: int, c: struct<d: array<array<struct<renamed: string>>>>>;

select tmp.renamed from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
{noformat}
The cache key should incorporate the column/field names.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)