[ https://issues.apache.org/jira/browse/IMPALA-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986595#comment-17986595 ]
ASF subversion and git services commented on IMPALA-13887:
----------------------------------------------------------
Commit 7b25a7b070ba69e3b153b9f2b40748ee7681d84f in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7b25a7b07 ]
IMPALA-13887: Incorporate column/field information into cache key
The correctness verification for the tuple cache found an issue
with TestParquet::test_resolution_by_name(). The test creates a
table, selects from it, alters the table to change a column name,
and selects again. With parquet_fallback_schema_resolution=NAME,
the column names determine which data is read. The tuple cache key
did not include the column names, so the query after the rename
produced the same key as the query before it and returned the
stale cached result.
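A minimal sketch of the failure mode, assuming a heavily simplified cache: the `Slot` class, `key_without_names()` and `run()` are hypothetical illustrations, not Impala's actual TSlotDescriptor or cache-key code. The key hashes only structural information (table, position, type), so renaming the field leaves it unchanged and the second lookup hits the first entry.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical, simplified scan slot (not Impala's TSlotDescriptor)."""
    table: str
    position: int     # illustrative positional path into the nested type
    type_name: str
    field_name: str   # user-visible column/field name


def key_without_names(slots):
    """Buggy-style key: hashes table, position and type, but not the name."""
    h = hashlib.sha256()
    for s in slots:
        h.update(repr((s.table, s.position, s.type_name)).encode())
    return h.hexdigest()


cache = {}


def run(slots, scan):
    """Return cached rows on a key hit; otherwise run the scan and cache it."""
    key = key_without_names(slots)
    if key not in cache:
        cache[key] = scan()
    return cache[key]


before = [Slot("nested_resolution_by_name_test", 0, "STRING", "f")]
after = [Slot("nested_resolution_by_name_test", 0, "STRING", "renamed")]

first = run(before, lambda: ["aaa", "bbb", "c"])
# By name, 'renamed' is absent from the data file, so a fresh scan would
# return NULLs. The key is unchanged, though, so the stale rows come back.
second = run(after, lambda: [None, None, None])
print(second)  # ['aaa', 'bbb', 'c']
```

This mirrors the failure output in the issue, where NULL was expected but the old values ('aaa', 'bbb', 'c', ...) were returned.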
This change adds information about the column/field name to the
TSlotDescriptor so that it is incorporated into the tuple cache key.
This information is only needed when producing the tuple cache key,
so it is omitted in all other cases.
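The approach described above can be sketched as follows, assuming a dict-based stand-in for the Thrift struct. `SlotDesc`, `to_thrift_like()`, its `for_cache_key` flag and `tuple_cache_key()` are hypothetical names for illustration only, not Impala's real API. The field name is serialized only when the output feeds the cache key, so other consumers of the descriptor are unaffected.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotDesc:
    """Hypothetical simplified slot descriptor (not the real TSlotDescriptor)."""
    slot_id: int
    type_name: str
    field_name: str

    def to_thrift_like(self, for_cache_key: bool = False) -> dict:
        out = {"id": self.slot_id, "type": self.type_name}
        if for_cache_key:
            # Only the cache key needs the name, so it is added only here.
            out["field_name"] = self.field_name
        return out


def tuple_cache_key(slots) -> str:
    """Hash the cache-key serialization of every slot, names included."""
    payload = json.dumps(
        [s.to_thrift_like(for_cache_key=True) for s in slots], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


before = [SlotDesc(0, "STRING", "f")]
after = [SlotDesc(0, "STRING", "renamed")]

print(tuple_cache_key(before) != tuple_cache_key(after))  # True
print(before[0].to_thrift_like())  # {'id': 0, 'type': 'STRING'}
```

Keeping the name behind a flag keeps the regular serialization unchanged while still making the key sensitive to renames.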
Testing:
- Ran TestParquet::test_resolution_by_name() with correctness
verification
- Added a custom cluster test that runs the test_resolution_by_name()
  test case with tuple caching enabled. It fails without this change.
Change-Id: Iebfa777452daf66851b86383651d35e1b0a5f262
Reviewed-on: http://gerrit.cloudera.org:8080/23073
Reviewed-by: Yida Wu
Tested-by: Impala Public Jenkins
> TestParquet.test_resolution_by_name fails with tuple caching enabled
> --------------------------------------------------------------------
>
> Key: IMPALA-13887
> URL: https://issues.apache.org/jira/browse/IMPALA-13887
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Critical
>
> When running TestParquet.test_resolution_by_name with tuple caching enabled,
> it fails with a correctness issue:
> {noformat}
> TestParquet.test_resolution_by_name[protocol: beeswax | table_format:
> parquet/none | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes':
> 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True,
> 'abort_on_error': 1, 'debug_action':
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:[email protected]',
> 'exec_single_node_rows_threshold': 0}]
> [gw0] linux2 -- Python 2.7.16
> /home/joemcdonnell/upstream/Impala/bin/../infra/python/env-gcc10.4.0/bin/python
> query_test/test_scanners.py:1052: in test_resolution_by_name
> use_db=unique_database)
> common/impala_test_suite.py:904: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:737: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:523: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:305: in verify_query_result_is_equal
> assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' == 'NULL'
> E 'NULL' != 'aaa'
> E 'NULL' != 'aaa'
> E 'NULL' != 'bbb'
> E 'NULL' != 'bbb'
> E 'NULL' != 'c'
> E 'NULL' != 'c'
> E 'NULL' != 'nonnullable'
> {noformat}
> The test alters a table to change the name of a column, which changes
> the meaning of the statement when using
> parquet_fallback_schema_resolution=name. The issue is that the cache key
> doesn't contain the actual column names. These are the SQL statements:
> {noformat}
> select tmp.f from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
> # Renames 'f' to 'renamed'
> alter table nested_resolution_by_name_test change nested_struct nested_struct
>   struct<b: array<int>, a: int, c: struct<d: array<array<struct<renamed: string>>>>>;
> select tmp.renamed from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
> {noformat}
> The cache key should incorporate the column/field names.
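Why the rename changes the result only under name-based resolution can be sketched with a toy model. It assumes the data file still stores the field as 'f' (it was written before the ALTER) and that a field missing from the file reads as NULL, which matches the expected NULLs in the failure output. `FILE_FIELDS`, `resolve_by_name()` and `resolve_by_position()` are hypothetical, not Impala's scanner code.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one nested struct as written in the Parquet file.
# The file predates the ALTER, so its field is still named 'f'.
FILE_FIELDS: List[Tuple[str, List[Optional[str]]]] = [("f", ["aaa", "bbb", "c"])]
NUM_ROWS = 3


def resolve_by_name(file_fields, name):
    """NAME resolution: match by field name; a missing field reads as NULL."""
    for field_name, values in file_fields:
        if field_name == name:
            return values
    return [None] * NUM_ROWS


def resolve_by_position(file_fields, index):
    """POSITION resolution: the table's field name is ignored."""
    return file_fields[index][1]


print(resolve_by_name(FILE_FIELDS, "f"))        # ['aaa', 'bbb', 'c']
print(resolve_by_name(FILE_FIELDS, "renamed"))  # [None, None, None]
print(resolve_by_position(FILE_FIELDS, 0))      # ['aaa', 'bbb', 'c']
```

Under position-based resolution the two queries read the same data, so a name-free key would be harmless. Under name-based resolution they do not, which is why the key must carry the names.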
--
This message was sent by Atlassian Jira
(v8.20.10#820010)