Joe McDonnell created IMPALA-13894:
--------------------------------------

             Summary: Tuple cache correctness verification should proceed past file size differences
                 Key: IMPALA-13894
                 URL: https://issues.apache.org/jira/browse/IMPALA-13894
             Project: IMPALA
          Issue Type: Task
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell

Tuple cache correctness verification first runs a fast check to see whether the two files are identical. If they are not identical, it can proceed to a slow check that corrects for order differences. The fast check compares the file sizes and, if they differ, returns a not-OK status:

{noformat}
if (file1_length != file2_length || file1_length == TUPLE_TEXT_FILE_SIZE_ERROR) {
  return Status(TErrorCode::TUPLE_CACHE_INCONSISTENCY,
      Substitute("Size of file '$0' (size: $1) and '$2' (size: $3) are different",
          path_a + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file1_length,
          path_b + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file2_length));
}
{noformat}

Returning a not-OK status causes the calling code to skip the slow check, which could give more detail about what is different. We should change this to set *passed = false and let the slow check go forward so that it produces a more informative error message. It's also unclear whether the same rows in a different order would always have the same size.
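
A rough sketch of the proposed control flow follows. It is not Impala code: SketchStatus, FastCompare, SlowCompare and VerifySketch are hypothetical names, and in-memory strings stand in for the two files. The point it illustrates is that a size mismatch in the fast check only sets *passed = false and returns OK, so the caller still reaches the slow, order-insensitive check.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a status object; not Impala's Status class.
struct SketchStatus {
  bool ok;
  std::string msg;
};
inline SketchStatus SketchOk() { return SketchStatus{true, ""}; }
inline SketchStatus SketchError(const std::string& m) { return SketchStatus{false, m}; }

// Fast check: the contents must be byte-for-byte identical. A mismatch,
// including a size mismatch, is reported through *passed and the returned
// status stays OK so that the caller can fall through to the slow check.
SketchStatus FastCompare(const std::string& a, const std::string& b, bool* passed) {
  *passed = (a.size() == b.size()) && (a == b);
  return SketchOk();
}

// Split the contents into rows (one per line) and sort them.
std::vector<std::string> SortedRows(const std::string& contents) {
  std::vector<std::string> rows;
  std::istringstream in(contents);
  std::string row;
  while (std::getline(in, row)) rows.push_back(row);
  std::sort(rows.begin(), rows.end());
  return rows;
}

// Slow check: compare the rows after sorting, so order differences are
// ignored. On a mismatch, the error message carries some detail.
SketchStatus SlowCompare(const std::string& a, const std::string& b, bool* passed) {
  std::vector<std::string> rows_a = SortedRows(a);
  std::vector<std::string> rows_b = SortedRows(b);
  *passed = (rows_a == rows_b);
  if (*passed) return SketchOk();
  std::size_t n = std::min(rows_a.size(), rows_b.size());
  std::size_t i = 0;
  while (i < n && rows_a[i] == rows_b[i]) ++i;
  std::ostringstream msg;
  msg << "Row counts " << rows_a.size() << " vs " << rows_b.size()
      << "; first differing sorted row index: " << i;
  return SketchError(msg.str());
}

// Caller: a not-OK status from the fast check would skip the slow check,
// which is why the fast check above never returns one for a mismatch.
SketchStatus VerifySketch(const std::string& a, const std::string& b) {
  bool passed = false;
  SketchStatus s = FastCompare(a, b, &passed);
  if (!s.ok) return s;
  if (passed) return SketchOk();
  return SlowCompare(a, b, &passed);
}
```

With this shape, inputs of different sizes produce the slow check's row-level message instead of a bare size-mismatch error.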