[
https://issues.apache.org/jira/browse/IMPALA-14100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954328#comment-17954328
]
Laszlo Gaal commented on IMPALA-14100:
--------------------------------------
FYI [~mszjat]
> critique-gerrit-review.py crashes with a codec exception when reviewing a
> diff containing data with non-UTF-8 encoding
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-14100
> URL: https://issues.apache.org/jira/browse/IMPALA-14100
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Laszlo Gaal
> Priority: Major
>
> The precommit checker script {{bin/jenkins/critique-gerrit-review.py}} can
> crash with the following Python traceback when the change diff contains data
> with an encoding other than UTF-8. This can happen when prebuilt data
> files are supplied with a patch, as happened with
> https://gerrit.cloudera.org/c/22049/ for example.
> {code}
> 10:34:47.030720 git.c:439 trace: built-in: git diff -U0 HEAD^..HEAD
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py", line 491, in <module>
>     merge_comments(comments, get_misc_comments(base_revision, revision, args.dryrun))
>   File "/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py", line 209, in get_misc_comments
>     diff = check_output(["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)],
>   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "/usr/lib/python3.8/subprocess.py", line 495, in run
>     stdout, stderr = process.communicate(input, timeout=timeout)
>   File "/usr/lib/python3.8/subprocess.py", line 1015, in communicate
>     stdout = self.stdout.read()
>   File "/usr/lib/python3.8/codecs.py", line 322, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 34006: invalid start byte
> {code}
> Excluding the problematic file(s) in
> https://github.com/apache/impala/blob/f4e75510948bdb72f2d5206161fee12e5b6d0888/bin/jenkins/critique-gerrit-review.py#L68-L77
> does not help, as the crash happens when processing the output of {{git
> diff}}, which returns a single output stream containing all the changes in all
> the files.
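
The failure mode described above can be reproduced without a repository. The sketch below is one possible mitigation, not the fix chosen for this issue. It assumes the diff is captured as raw bytes and then decoded with a lenient error handler. The helper names decode_diff and get_diff_text and the sample bytes are hypothetical and do not come from critique-gerrit-review.py.

```python
import subprocess


def decode_diff(raw, errors="replace"):
    """Decode raw diff bytes as UTF-8 without raising on invalid bytes.

    With errors="replace", each undecodable byte becomes U+FFFD.
    """
    return raw.decode("utf-8", errors=errors)


def get_diff_text(base_revision, revision):
    """Hypothetical helper: capture `git diff` output as bytes, decode leniently.

    No text mode is requested, so check_output returns bytes and the
    decoding step stays under the caller's control.
    """
    raw = subprocess.check_output(
        ["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)])
    return decode_diff(raw)


# Made-up bytes standing in for diff output that embeds non-UTF-8 data.
sample = b"+ascii line\n+binary \xc0 payload\n"

# Strict decoding fails on the 0xc0 byte, as in the reported traceback.
try:
    sample.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding keeps the text and marks the bad byte.
lenient = decode_diff(sample)
```

The replacement character only marks where the undecodable bytes were, which is likely acceptable if the checks only inspect text lines. On Python 3.6 and later, subprocess.check_output also accepts an errors argument alongside text mode, which may be a smaller change. Whether either approach suits the script's supported Python versions is not established here.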
--
This message was sent by Atlassian Jira
(v8.20.10#820010)