mailer.py cannot handle utf-8 path in Subject correctly (Re: mailer.py can produce subject header violates RFC 5321/5322 if truncate_subject is not set)

Yasuhito FUTATSUKI Tue, 07 Jan 2020 07:29:00 -0800

On 2020/01/07 9:41, Yasuhito FUTATSUKI wrote:
> On 2020/01/07 6:52, Yasuhito FUTATSUKI wrote:
>> By the way, there seems to be another issue with truncate_subject: the
>> current implementation may break a UTF-8 multi-byte character sequence.
>> I haven't reproduced it, though (because I always use ASCII characters
>> only for file names...).


I was able to reproduce this problem with the following shell script:
[[[
#!/bin/sh

# LC_CTYPE should be a valid UTF-8 locale. Please change it if the value
# below is not appropriate.
export LC_CTYPE=en_US.UTF-8

# This assumes that 'svnadmin', 'svn', 'sed', 'chmod', 'cp', 'python2', and
# 'cat' are in the command search path, that the Python bindings are installed
# correctly, and that subversion_wc points to an appropriate checkout path.
subversion_wc='/path/to/subversion/trunk/working/copy'

# set up new repo for mailer.py testing
svnadmin create newrepo
cp ${subversion_wc}/tools/hook-scripts/mailer/mailer.py newrepo/hooks
sed -e 's/^#truncate_subject = 200/truncate_subject = 78/' \
  -e 's/^#mail_command.*/mail_command = cat -/' \
  ${subversion_wc}/tools/hook-scripts/mailer/mailer.conf.example \
  > newrepo/hooks/mailer.conf
sed -e 's/^\(mailer\.py.*\)\/path\/to\/\(mailer\.conf\)/env python2 "\$REPOS"\/hooks\/\1 "\$REPOS"\/hooks\/\2/' \
  newrepo/hooks/post-commit.tmpl > newrepo/hooks/post-commit
chmod +x newrepo/hooks/post-commit

svn checkout file:///`pwd`/newrepo wd && cd wd
svn mkdir '〇〇〇一' '〇〇〇二' '〇〇〇三' '〇〇〇四' '〇〇〇五' '〇〇〇六'
svn commit -m 'test for mailer.py'
]]]

The result is:
[[[
Checked out revision 0.
A         〇〇〇一
A         〇〇〇二
A         〇〇〇三
A         〇〇〇四
A         〇〇〇五
A         〇〇〇六
Adding         〇〇〇一
Adding         〇〇〇三
Adding         〇〇〇二
Adding         〇〇〇五
Adding         〇〇〇六
Adding         〇〇〇四
Committing transaction...
Committed revision 1.

Warning: post-commit hook failed (exit code 1) with output:
Traceback (most recent call last):
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 1534, in <module>
    sys.argv[3:3+expected_args])
  File "/usr/local/lib/python2.7/site-packages/svn/core.py", line 310, in run_app
    return func(application_pool, *args, **kw)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 126, in main
    return messenger.generate()
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 489, in generate
    self.output.start(group, params)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 394, in start
    self.write(self.mail_headers(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 251, in mail_headers
    subject  = self._rfc2047_encode(self.make_subject(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 246, in _rfc2047_encode
    return ' '.join(map(_maybe_encode_header, hdr.split()))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 244, in _maybe_encode_header
    return Header(hdr_token, 'utf-8').encode()
  File "/usr/local/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/local/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-4: invalid continuation byte
cat: -f: No such file or directory
cat: inva...@example.com: No such file or directory
cat: inva...@example.com: No such file or directory

]]]
(The lines starting with 'cat:' are the expected output of
 "cat - -f inva...@example.com inva...@example.com")
 
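The failure can be shown in isolation with a few lines of Python. This is only a minimal sketch: it uses Python 3 bytes to stand in for the Python 2 byte string that mailer.py handles, and naive_truncate is a hypothetical helper that mimics the subject[:(truncate_subject - 3)] + "..." slice, not the actual mailer.py code.

```python
# -*- coding: utf-8 -*-
# Sketch (Python 3): slicing a UTF-8 byte string at an arbitrary byte
# offset can cut a multi-byte character in half.

def naive_truncate(subject, limit):
    """Hypothetical stand-in for the byte-oriented slice in mailer.py."""
    if limit and len(subject) > limit:
        return subject[:limit - 3] + b"..."
    return subject

# 'r1 - ' is 5 bytes; each directory name is 4 characters of 3 bytes each.
subject = "r1 - 〇〇〇一 〇〇〇二".encode("utf-8")

# Keeping 13 bytes leaves 'r1 - ', two whole characters, and the first
# two bytes of a third character, followed by '...'.
truncated = naive_truncate(subject, 16)

try:
    truncated.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    # The incomplete sequence is followed by '.', which is not a valid
    # continuation byte.
    decode_failed = True
```

Here decode_failed ends up True, for the same reason as the UnicodeDecodeError in the traceback above.
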
> Probably it needs something like this (but it doesn't support combining
> characters, and I haven't tested it...):
> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1872398)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -159,7 +159,13 @@
>        truncate_subject = 0
>  
>      if truncate_subject and len(subject) > truncate_subject:
> -      subject = subject[:(truncate_subject - 3)] + "..."
> +      # To avoid breaking a utf-8 multi-byte character sequence, we should
> +      # search for the start of the sequence if the byte at the truncation
> +      # point is the second or a later byte of a multi-byte sequence.
> +      idx = truncate_subject - 3
> +      while  0x80 <= ord(subject[idx]) <= 0xbf:
> +        idx -= 1
> +      subject = subject[:idx] + "..."
>      return subject
>  
>    def start(self, group, params):
> ]]]

With this patch applied, the script above runs without error.
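
The idea of the patch can be tried outside mailer.py as well. The following is a rough sketch in Python 3, not the patch itself: truncate_utf8 is a hypothetical name, indexing bytes yields integers there (so no ord() is needed), and the idx > 0 guard is an addition of this sketch. Like the patch, it does not take combining characters into account.

```python
# -*- coding: utf-8 -*-
# Sketch (Python 3): truncate a UTF-8 byte string without splitting a
# multi-byte character, by backing up to the start of the sequence.

def truncate_utf8(subject, limit):
    """Hypothetical helper; 'subject' is UTF-8 encoded bytes."""
    if not limit or len(subject) <= limit:
        return subject
    idx = limit - 3
    # Bytes 0x80..0xbf are continuation bytes. If the cut point is one of
    # them, move back until it is the first byte of a character.
    while idx > 0 and 0x80 <= subject[idx] <= 0xbf:
        idx -= 1
    return subject[:idx] + b"..."

subject = "r1 - 〇〇〇一 〇〇〇二".encode("utf-8")
# The cut point (byte 13) is inside the third character, so the result
# keeps only the two complete characters before it.
result = truncate_utf8(subject, 16)
text = result.decode("utf-8")
```

The result may be shorter than limit, because the partial character is dropped rather than kept.
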

However, it produces the Subject line below.

[[[
Subject: r1 - =?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?= 
=?utf-8?b?44CH44CH44CH5LqM?= =?utf-8?b?44CH44CH44CH5LqU?= 
=?utf-8?b?44CH44CH44CH5YWt?= =?utf-8?b?44CHLi4u?=^M
]]]

and the decoded result is

"Subject: r1 - 〇〇〇一〇〇〇三〇〇〇二〇〇〇五〇〇〇六〇..."

because whitespace between adjacent encoded words is ignored when the header
is decoded (RFC 2047). I think this is not what we want.
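
The whitespace behaviour can be checked with the standard email.header module. The sketch below assumes Python 3; decode_subject is a hypothetical helper, and the two encoded words are the first two from the Subject line above. For contrast, it also encodes the whole value as a single Header, which keeps the space inside the encoded word. That is only one possible direction, not something settled here.

```python
# -*- coding: utf-8 -*-
# Sketch (Python 3): whitespace between adjacent RFC 2047 encoded words
# is dropped when the header is decoded.
from email.header import Header, decode_header

def decode_subject(raw):
    """Hypothetical helper: decode an RFC 2047 header value into text."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)

# Two names encoded token by token and joined with a space.
per_token = "r1 - =?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?="
decoded_per_token = decode_subject(per_token)

# The same text encoded as one value: the space is part of the payload.
whole = Header("r1 - 〇〇〇一 〇〇〇三", "utf-8").encode()
decoded_whole = decode_subject(whole)
```

With these inputs, decoded_per_token has the two names run together, while decoded_whole still has the space between them.
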

Cheers,
-- 
Yasuhito FUTATSUKI <futat...@yf.bsdclub.org> / <futat...@poem.co.jp>
