On 2020/01/07 9:41, Yasuhito FUTATSUKI wrote:
> On 2020/01/07 6:52, Yasuhito FUTATSUKI wrote:
>> By the way, it seems another issue about truncate_subject that current
>> implementation of truncate_subject may break utf-8 multi-bytes character
>> sequence, but I didn't reproduce it (because I always use ascii
>> characters only for file names...).

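To illustrate the breakage described in the quoted text: slicing a UTF-8 byte string at a fixed byte offset can cut a multi-byte character in half. The sketch below is only an illustration, written for Python 3 with explicit bytes (mailer.py in this thread runs under Python 2, where str is already a byte string). The name truncate_utf8 and its parameters are hypothetical and not part of mailer.py, and like the patch discussed in this thread it does not handle combining characters.

```python
# -*- coding: utf-8 -*-
# Illustration only: truncate_utf8 is a hypothetical helper, not mailer.py code.

def truncate_utf8(subject, limit, suffix=b"..."):
    """Cut UTF-8 encoded bytes to at most `limit` bytes, never inside a character."""
    if len(subject) <= limit:
        return subject
    idx = max(limit - len(suffix), 0)
    # UTF-8 continuation bytes are 0x80..0xBF. If the cut point is on one,
    # step back until it sits on the first byte of that character.
    while idx > 0 and 0x80 <= subject[idx] <= 0xBF:
        idx -= 1
    return subject[:idx] + suffix


subject = "r1 - 〇〇〇一 〇〇〇二".encode("utf-8")

# Naive byte slicing keeps only 2 of the 3 bytes of the first '〇'.
naive = subject[:10 - 3] + b"..."
try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

safe = truncate_utf8(subject, 10)
print(naive_is_valid)        # the naive result is not valid UTF-8
print(safe.decode("utf-8"))  # the boundary-aware result still decodes
```

Backing up to a character boundary can make the result shorter than the limit, which seems preferable to emitting an undecodable byte sequence.
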
I could reproduce this problem with the following shell script:

[[[
#!/bin/sh
# LC_CTYPE should be valid utf-8 locale. Please change if below is not
# appropriate
export LC_CTYPE=en_US.UTF-8

# assuming 'svnadmin', 'sed', 'chmod', 'cp', 'mkdir', 'python2', and 'cat'
# in command search path, Python bindings installed correctly,
# and subversion_wc pointing appropriate checkout path
subversion_wc='/path/to/subversion/trunk/working/copy'

# set up new repo for mailer.py testing
svnadmin create newrepo
cp ${subversion_wc}/tools/hook-scripts/mailer/mailer.py newrepo/hooks
sed -e 's/^#truncate_subject = 200/truncate_subject = 78/' \
    -e 's/^#mail_command.*/mail_command = cat -/' \
    ${subversion_wc}/tools/hook-scripts/mailer/mailer.conf.example \
    > newrepo/hooks/mailer.conf
sed -e 's/^\(mailer\.py.*\)\/path\/to\/\(mailer\.conf\)/env python2 "\$REPOS"\/hooks\/\1 "\$REPOS"\/hooks\/\2/' \
    newrepo/hooks/post-commit.tmpl > newrepo/hooks/post-commit
chmod +x newrepo/hooks/post-commit
svn checkout file:///`pwd`/newrepo wd && cd wd
svn mkdir '〇〇〇一' '〇〇〇二' '〇〇〇三' '〇〇〇四' '〇〇〇五' '〇〇〇六'
svn commit -m 'test for mailer.py'
]]]

The result is:

[[[
Checked out revision 0.
A         〇〇〇一
A         〇〇〇二
A         〇〇〇三
A         〇〇〇四
A         〇〇〇五
A         〇〇〇六
Adding         〇〇〇一
Adding         〇〇〇三
Adding         〇〇〇二
Adding         〇〇〇五
Adding         〇〇〇六
Adding         〇〇〇四
Committing transaction...
Committed revision 1.
Warning: post-commit hook failed (exit code 1) with output:
Traceback (most recent call last):
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 1534, in <module>
    sys.argv[3:3+expected_args])
  File "/usr/local/lib/python2.7/site-packages/svn/core.py", line 310, in run_app
    return func(application_pool, *args, **kw)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 126, in main
    return messenger.generate()
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 489, in generate
    self.output.start(group, params)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 394, in start
    self.write(self.mail_headers(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 251, in mail_headers
    subject = self._rfc2047_encode(self.make_subject(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 246, in _rfc2047_encode
    return ' '.join(map(_maybe_encode_header, hdr.split()))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 244, in _maybe_encode_header
    return Header(hdr_token, 'utf-8').encode()
  File "/usr/local/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/local/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-4: invalid continuation byte
cat: -f: No such file or directory
cat: inva...@example.com: No such file or directory
cat: inva...@example.com: No such file or directory
]]]

(Lines starting with 'cat:' are the expected output of
"cat - -f inva...@example.com inva...@example.com".)

> Probably it needs something like this (but it doesn't support combining
> characters, and I didn't any test...):
> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1872398)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -159,7 +159,13 @@
>        truncate_subject = 0
>
>      if truncate_subject and len(subject) > truncate_subject:
> -      subject = subject[:(truncate_subject - 3)] + "..."
> +      # To avoid breaking utf-8 multi-bytes character sequence, we should
> +      # search the top of the sequence if the byte of the truncate point is
> +      # second or later part of multi-bytes character sequence.
> +      idx = truncate_subject - 3
> +      while 0x80 <= ord(subject[idx]) <= 0xbf:
> +        idx -= 1
> +      subject = subject[:idx] + "..."
>      return subject
>
>    def start(self, group, params):
> ]]]

With this patch applied, the script above runs without error.
However, it produces the Subject line below:

[[[
Subject: r1 - =?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?= =?utf-8?b?44CH44CH44CH5LqM?= =?utf-8?b?44CH44CH44CH5LqU?= =?utf-8?b?44CH44CH44CH5YWt?= =?utf-8?b?44CHLi4u?=^M
]]]

The decoded result is "Subject: r1 - 〇〇〇一〇〇〇三〇〇〇二〇〇〇五〇〇〇六〇..."
because whitespace between adjacent encoded words is ignored when the
header is decoded. I think this is not what we want.

Cheers,
-- 
Yasuhito FUTATSUKI <futat...@yf.bsdclub.org> / <futat...@poem.co.jp>
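
P.S. The whitespace effect described above can be checked with the standard email.header module. This is a minimal sketch for Python 3, not mailer.py code: decode_subject is a hypothetical helper, and the per-token join only imitates what the traceback shows _rfc2047_encode doing. The contrast with encoding the whole string as one Header is an assumption about one possible direction, not something tested against mailer.py.

```python
# -*- coding: utf-8 -*-
# Illustration only: decode_subject is a hypothetical helper name.
from email.header import Header, decode_header


def decode_subject(value):
    """Decode an RFC 2047 encoded header value back to text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


text = "〇〇〇一 〇〇〇三"

# Each whitespace-separated token becomes its own encoded word, joined by a
# space. On decoding, whitespace between adjacent encoded words is dropped.
per_token = " ".join(Header(tok, "utf-8").encode() for tok in text.split())

# Encoding the whole string at once puts the space inside the encoded data,
# so it survives decoding.
whole = Header(text, "utf-8").encode()

print(decode_subject(per_token))  # the space between the names is lost
print(decode_subject(whole))      # the space is preserved
```
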