Joe McDonnell created IMPALA-14181:
--------------------------------------
Summary: DCHECK in Sorter::Run::ConvertValueOffsetsToPtrs() with low memory
Key: IMPALA-14181
URL: https://issues.apache.org/jira/browse/IMPALA-14181
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
When testing with tuple caching, queries start to use more memory. I hit this
assert while running test_sort.py's TestArraySort:
{noformat}
F20250623 14:59:49.492782 1002052 sorter.cc:829] d84fbaa341de8fbb:027dfece00000000] Check failed: page_offset == 0 (2311 vs. 0)
{noformat}
From here:
{noformat}
if (page_index > var_len_pages_index_) {
  // We've reached the page boundary for the current var-len page.
  // This tuple will be returned in the next call to GetNext().
  DCHECK_GE(page_index, 0);
  DCHECK_LE(page_index, var_len_pages_.size());
  DCHECK_EQ(page_index, var_len_pages_index_ + 1);
  DCHECK_EQ(page_offset, 0);  // <-------------- HERE
  // The data is the first thing in the next page.
  // This must be the first slot with var len data for the
  // tuple. Var len data for tuple shouldn't be split
  // across blocks.
  DCHECK(AllPrevSlotsAreNullsOrSmall<ValueType>(tuple, slots, idx));
  return false;
}
{noformat}
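To make the failing condition concrete, here is a minimal standalone sketch of the invariant those checks express. It is not the code in sorter.cc. It assumes, purely for illustration, that a var-len value's offset packs a page index in the upper 32 bits and a byte offset within that page in the lower 32 bits. The names PackOffset, UnpackOffset, ClassifySlot and SlotCheck are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of a packed var-len offset (assumed encoding,
// not taken from sorter.cc).
struct DecodedOffset {
  uint32_t page_index;   // which var-len page holds the data
  uint32_t page_offset;  // byte offset of the data within that page
};

// Assumption: page index in the upper 32 bits, page offset in the lower 32.
inline uint64_t PackOffset(uint32_t page_index, uint32_t page_offset) {
  return (static_cast<uint64_t>(page_index) << 32) | page_offset;
}

inline DecodedOffset UnpackOffset(uint64_t packed) {
  return {static_cast<uint32_t>(packed >> 32),
          static_cast<uint32_t>(packed & 0xFFFFFFFFu)};
}

enum class SlotCheck {
  kInCurrentPage,     // data lives in the var-len page being processed
  kStartsNextPage,    // data is the first thing in the next page
  kInvariantViolated  // anything else, e.g. mid-way into the next page
};

// Classifies a slot's packed offset relative to the current var-len page.
// The reported failure corresponds to the last case: the slot points into
// the next page, but at a non-zero offset.
inline SlotCheck ClassifySlot(uint64_t packed, uint32_t current_page_index) {
  const DecodedOffset d = UnpackOffset(packed);
  if (d.page_index == current_page_index) return SlotCheck::kInCurrentPage;
  if (d.page_index == current_page_index + 1 && d.page_offset == 0) {
    return SlotCheck::kStartsNextPage;
  }
  return SlotCheck::kInvariantViolated;
}
```

Under these assumptions, a slot at offset 2311 of the next page (the value in the log line above) is classified as kInvariantViolated, while a slot at offset 0 of the next page is the expected page-boundary case.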
This is easy to reproduce by using a slightly tighter memory limit. On my
machine, the following triggers the DCHECK:
{noformat}
set max_sort_run_size=2;
set num_nodes=1;
-- ordinarily buffer_pool_limit=44m, but use a slightly tighter value
set buffer_pool_limit=41m;
select string_col, int_array, double_map, string_array, mixed
from functional_parquet.arrays_big
order by string_col;
{noformat}