[ https://issues.apache.org/jira/browse/IMPALA-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-13887 started by Joe McDonnell.
----------------------------------------------
> TestParquet.test_resolution_by_name fails with tuple caching enabled
> --------------------------------------------------------------------
>
> Key: IMPALA-13887
> URL: https://issues.apache.org/jira/browse/IMPALA-13887
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Critical
>
> When running TestParquet.test_resolution_by_name with tuple caching enabled,
> it fails with a correctness issue:
> {noformat}
> TestParquet.test_resolution_by_name[protocol: beeswax | table_format:
> parquet/none | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes':
> 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True,
> 'abort_on_error': 1, 'debug_action':
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:[email protected]',
> 'exec_single_node_rows_threshold': 0}]
> [gw0] linux2 -- Python 2.7.16
> /home/joemcdonnell/upstream/Impala/bin/../infra/python/env-gcc10.4.0/bin/python
> query_test/test_scanners.py:1052: in test_resolution_by_name
> use_db=unique_database)
> common/impala_test_suite.py:904: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:737: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:523: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:305: in verify_query_result_is_equal
> assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' != 'aaa'
> E 'NULL' != 'aaa'
> E 'NULL' != 'bbb'
> E 'NULL' != 'bbb'
> E 'NULL' != 'c'
> E 'NULL' != 'c'
> E 'NULL' != 'nonnullable'
> {noformat}
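> The test alters the table to rename a struct field ('f' becomes 'renamed').
> With parquet_fallback_schema_resolution=name, this changes the meaning of the
> second query: fields are matched against the Parquet file schema by name, and
> the file has no field called 'renamed', so the expected result is NULL. The
> issue is that the tuple cache key doesn't contain the actual column/field
> names, so the second query can hit the cache entry produced by the first one
> and return its values ('aaa', 'bbb', ...) instead of NULL. These are the SQL
> statements: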
> {noformat}
> select tmp.f from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
> # Renames 'f' to 'renamed'
> alter table nested_resolution_by_name_test change nested_struct nested_struct struct<b: array<int>, a: int, c: struct<d: array<array<struct<renamed: string>>>>>;
> select tmp.renamed from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
> {noformat}
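To make the failure mode concrete, here is a toy Python model of the sequence above. It is not Impala code: `scan_by_name`, `buggy_key` and `run` are hypothetical, the column values are illustrative (loosely taken from the failure output), and the key is assumed to identify the field only by table, type and position, i.e. everything except its name.

```python
# Illustrative file contents: the Parquet file was written with field 'f'.
FILE_COLUMNS = {"f": ["aaa", "bbb", "c"]}
NUM_ROWS = 3

def scan_by_name(field_name):
    # Name-based resolution: a table field with no matching file field reads as NULL.
    return FILE_COLUMNS.get(field_name, [None] * NUM_ROWS)

def buggy_key(table, field_type, position):
    # Hypothetical cache key that omits the field name.
    return (table, field_type, position)

cache = {}

def run(table, field_name, field_type, position):
    key = buggy_key(table, field_type, position)
    if key not in cache:
        cache[key] = scan_by_name(field_name)
    return cache[key]

first = run("nested_resolution_by_name_test", "f", "string", 0)
# After the rename, type and position are unchanged, so the key is identical.
second = run("nested_resolution_by_name_test", "renamed", "string", 0)
print(first)   # ['aaa', 'bbb', 'c']
print(second)  # ['aaa', 'bbb', 'c'] (stale; a fresh scan would give [None, None, None])
```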
> The cache key should incorporate the column/field names.
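A minimal sketch of that idea, assuming a key computed by hashing the inputs that determine the scan result. `cache_key` and its parameters are hypothetical, not Impala's actual tuple cache API; the field path simply lists the names referenced by the queries above.

```python
import hashlib

def cache_key(table, field_path, field_type):
    # Hypothetical key: hash an unambiguous serialization of the table name,
    # the field names along the referenced path, and the field type.
    material = repr((table, tuple(field_path), field_type))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

TABLE = "nested_resolution_by_name_test"
before = cache_key(TABLE, ["nested_struct", "c", "d", "item", "f"], "string")
after = cache_key(TABLE, ["nested_struct", "c", "d", "item", "renamed"], "string")
print(before != after)  # True: the rename yields a new key, so no stale cache hit
```

Serializing a tuple with `repr` rather than joining names with a separator avoids collisions between paths such as `a.b` + `c` and `a` + `b.c`.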
--
This message was sent by Atlassian Jira
(v8.20.10#820010)