Joe McDonnell created IMPALA-13894:
--------------------------------------

             Summary: Tuple cache correctness verification should proceed past file size differences
                 Key: IMPALA-13894
                 URL: https://issues.apache.org/jira/browse/IMPALA-13894
             Project: IMPALA
          Issue Type: Task
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


Tuple cache correctness verification does a fast check to see if the two files 
are identical. If it determines that they are not identical, then it can 
proceed to a slow check that corrects for order differences.

This fast check looks at the file sizes and if they are not the same, it 
returns a not-OK status:
{noformat}
  if (file1_length != file2_length || file1_length == TUPLE_TEXT_FILE_SIZE_ERROR) {
    return Status(TErrorCode::TUPLE_CACHE_INCONSISTENCY,
        Substitute("Size of file '$0' (size: $1) and '$2' (size: $3) are different",
            path_a + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file1_length,
            path_b + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file2_length));
  }
{noformat}
Returning a not-OK status causes the calling code to skip the slow check, 
which could give more detail about what is different. We should change this to 
set *passed = false and let the slower check go forward so that it produces a 
more informative error message. It's also unclear whether the same rows in a 
different order would always have the same size, so a size mismatch by itself 
may not be a reliable sign of a real inconsistency.
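
A minimal sketch of the proposed control flow, for illustration only. It does not use Impala's actual classes: SimpleStatus, FastCompare, SlowCompare, Verify and JoinRows are hypothetical names, and rows are modeled as in-memory strings rather than files. The point is that the fast check reports a mismatch through *passed and still returns OK, so the caller goes on to the order-insensitive slow check:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; this is not Impala's Status class.
struct SimpleStatus {
  bool ok;
  std::string msg;
  static SimpleStatus OK() { return {true, ""}; }
  static SimpleStatus Error(std::string m) { return {false, std::move(m)}; }
};

// Serializes rows one per line, standing in for the contents of a text file.
std::string JoinRows(const std::vector<std::string>& rows) {
  std::string out;
  for (const std::string& r : rows) out += r + "\n";
  return out;
}

// Fast check: sizes first, then a byte-for-byte comparison. A mismatch only
// sets *passed = false; the status stays OK so the caller can continue.
SimpleStatus FastCompare(const std::string& a, const std::string& b, bool* passed) {
  *passed = (a.size() == b.size()) && (a == b);
  return SimpleStatus::OK();
}

// Slow check: sorts both row sets so ordering differences are ignored, then
// reports the first row that differs (or a row-count difference).
SimpleStatus SlowCompare(std::vector<std::string> a, std::vector<std::string> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  if (a == b) return SimpleStatus::OK();
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) {
      return SimpleStatus::Error(
          "First differing row after sorting: '" + a[i] + "' vs '" + b[i] + "'");
    }
  }
  return SimpleStatus::Error("Row counts differ: " + std::to_string(a.size()) +
      " vs " + std::to_string(b.size()));
}

// Caller: a not-OK status from the fast check would return early and skip the
// slow check, which is why the fast check signals a mismatch via *passed.
SimpleStatus Verify(const std::vector<std::string>& rows_a,
    const std::vector<std::string>& rows_b) {
  bool passed = false;
  SimpleStatus status = FastCompare(JoinRows(rows_a), JoinRows(rows_b), &passed);
  if (!status.ok) return status;
  if (passed) return SimpleStatus::OK();
  return SlowCompare(rows_a, rows_b);
}
```

With this shape, two inputs of different sizes still reach SlowCompare, so the error names a specific differing row instead of only the two sizes.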



--
This message was sent by Atlassian Jira
(v8.20.10#820010)