Laszlo Gaal created IMPALA-14100:
------------------------------------

             Summary: critique-gerrit-review.py crashes with a codec exception 
when reviewing a diff containing data with non-UTF-8 encoding
                 Key: IMPALA-14100
                 URL: https://issues.apache.org/jira/browse/IMPALA-14100
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
            Reporter: Laszlo Gaal


The precommit checker script {{bin/jenkins/critique-gerrit-review.py}} can 
crash with the following Python traceback when the change diff contains data 
that is not valid UTF-8. This can happen when prebuilt data files 
are supplied with a patch, as happened with 
https://gerrit.cloudera.org/c/22049/ for example.
{code}
10:34:47.030720 git.c:439               trace: built-in: git diff -U0 
HEAD^..HEAD
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py",
 line 491, in <module>
    merge_comments(comments, get_misc_comments(base_revision, revision, 
args.dryrun))
  File 
"/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py",
 line 209, in get_misc_comments
    diff = check_output(["git", "diff", "-U0", "{0}..{1}".format(base_revision, 
revision)],
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 495, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 1015, in communicate
    stdout = self.stdout.read()
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 34006: 
invalid start byte
{code}
Excluding the problematic file(s) in 
https://github.com/apache/impala/blob/f4e75510948bdb72f2d5206161fee12e5b6d0888/bin/jenkins/critique-gerrit-review.py#L68-L77
 does not help, as the crash happens when processing the output of {{git 
diff}}, which returns a single output stream containing all the changes in all 
the files.
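
One possible direction, offered only as a sketch and not as the actual fix: the traceback suggests the {{git diff}} output is decoded as strict UTF-8 inside {{subprocess}}, so the script could read raw bytes instead and decode them with a lenient error handler. The helper names {{decode_diff_output}} and {{git_diff_text}} below are hypothetical and do not exist in the script; the sketch also assumes the checks only need the text of the diff and can tolerate replacement characters in binary hunks.

```python
import subprocess


def decode_diff_output(raw):
    """Decode raw 'git diff' bytes; undecodable bytes become U+FFFD."""
    # errors="replace" substitutes the replacement character instead of
    # raising UnicodeDecodeError on bytes such as 0xc0.
    return raw.decode("utf-8", errors="replace")


def git_diff_text(base_revision, revision):
    """Hypothetical helper: return the diff between two revisions as text."""
    # Text mode is not requested here, so check_output returns bytes and
    # the decoding policy is chosen explicitly by decode_diff_output().
    raw = subprocess.check_output(
        ["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)])
    return decode_diff_output(raw)
```

With this approach a stray byte in one file no longer aborts processing of the whole stream. Note that decoding bytes by hand does not translate line endings the way text mode does, so that would need to be checked against the existing parsing code.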



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
