[ 
https://issues.apache.org/jira/browse/IMPALA-14100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954328#comment-17954328
 ] 

Laszlo Gaal commented on IMPALA-14100:
--------------------------------------

FYI [~mszjat]

> critique-gerrit-review.py crashes with a codec exception when reviewing a 
> diff containing data with non-UTF-8 encoding
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14100
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14100
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>            Reporter: Laszlo Gaal
>            Priority: Major
>
> The precommit checker script {{bin/jenkins/critique-gerrit-review.py}} can 
> crash with the following Python traceback when the change diff contains data 
> that is not valid UTF-8. This can happen when prebuilt data files are 
> supplied with a patch, as happened with 
> https://gerrit.cloudera.org/c/22049/, for example.
> {code}
> 10:34:47.030720 git.c:439               trace: built-in: git diff -U0 HEAD^..HEAD
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py", line 491, in <module>
>     merge_comments(comments, get_misc_comments(base_revision, revision, args.dryrun))
>   File "/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py", line 209, in get_misc_comments
>     diff = check_output(["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)],
>   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "/usr/lib/python3.8/subprocess.py", line 495, in run
>     stdout, stderr = process.communicate(input, timeout=timeout)
>   File "/usr/lib/python3.8/subprocess.py", line 1015, in communicate
>     stdout = self.stdout.read()
>   File "/usr/lib/python3.8/codecs.py", line 322, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 34006: invalid start byte
> {code}
> Excluding the problematic file(s) in 
> https://github.com/apache/impala/blob/f4e75510948bdb72f2d5206161fee12e5b6d0888/bin/jenkins/critique-gerrit-review.py#L68-L77
>  does not help, as the crash happens when processing the output of {{git 
> diff}}, which returns a single output stream containing all the changes in all 
> the files.
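
One possible way to avoid the crash, sketched here as an illustration and not as the fix adopted for this issue: the traceback shows the decode error being raised while subprocess reads the output of {{git diff}}, which suggests the output is decoded as strict UTF-8 at that point. Capturing the output as raw bytes and decoding it leniently would keep a stray byte such as 0xc0 from aborting the run. The helper names {{decode_diff}} and {{get_diff_text}} below are hypothetical and do not exist in the script.

```python
import subprocess


def decode_diff(raw):
    """Decode raw `git diff` bytes as UTF-8 without raising on bad input.

    Invalid byte sequences are replaced with U+FFFD (the Unicode
    replacement character), so later line-based checks can still run.
    """
    return raw.decode("utf-8", errors="replace")


def get_diff_text(base_revision, revision):
    """Hypothetical helper: run `git diff` and return leniently decoded text.

    No text-mode arguments are passed to check_output(), so it returns
    bytes and the decoding step is fully under our control.
    """
    raw = subprocess.check_output(
        ["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)])
    return decode_diff(raw)
```

With this approach the non-UTF-8 parts of the diff show up as replacement characters in the decoded text. That should be harmless if the affected files are excluded from the checks anyway, but it is an assumption worth verifying against the checks the script performs.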



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
