s...@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> Log:
> Make mailer.py work properly with Python 3, and drop Python 2 support.
>
> Most of the changes deal with the handling binary data vs Python strings.
>
> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> Environments using other encodings may not work as expected, but those will
> be problematic for hook scripts in general.

Correct me if I'm wrong, but it sounds like you haven't ruled out the
possibility that this commit will constitute a regression for anyone who
runs mailer.py in a non-UTF-8 environment and will upgrade to this
commit.

I suppose it's fair to classify non-UTF-8 environments as "patches
welcome", following the precedent of libmagic support in the Windows
build, but:

1. Can we detect non-UTF-8 environments and warn or error out hard upon
them?  «locale.getlocale()[1]» seems promising?

2. A change that hasn't been confirmed *not* to constitute a regression
merits a release notes entry.  Would you do the honours?

Cheers,

Daniel

> SVN repositories store internal
> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> or decoding of such data, and thus need to work with raw UTF-8 strings, not
> Python strings.
>
> The encoding of file and property contents is not guaranteed to be UTF-8.
> This was already a problem before this change. This hook script sends email
> with a content type header specifying the UTF-8 encoding. Diffs which contain
> non-UTF-8 text will most likely not render properly when viewed in an email
> reader. At least this problem is now obvious in mailer.py's implementation,
> since all unidiff text is now written out directly as binary data.
>
> As an additional fix, iterate file groups in sorted order. This results in
> stable output and makes test cases in our tests/ subdirectory reproducible.
>
> Tested with Python 3.7.5 which is the version I use in my SVN development
> setup at present. Tests with newer versions are welcome.
>
> * tools/hook-scripts/mailer/mailer.py:
> Drop Python2-specific includes. Adjust includes as per 2to3.
> (main): Decode arguments from UTF-8 to string.
> (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
> OutputBase implementations now need to provide a self.write_binary
> member which implements a write() method for binary data.
> (MailedOutput): email.Header package is gone, use email.header instead,
> and likewise replace use of email.Utils with email.utils
> (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
> We cannot use StringIO since diffs may contain data in arbitrary encodings.
> (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
> (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
> (Commit): Decode log message and paths from UTF-8 to string, and iterate
> path groups from mailer.conf in sorted order.
> (Lock): Decode directory entries from UTF-8 to string. Encode paths back
> to UTF-8 when we ask libsvn_fs for a lock on a path.
> Iterate path groups from mailer.conf in sorted order.
> (DiffGenerator): Decode repository paths from UTF-8 to string.
> (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
> string. Write diff data via write_binary, bypassing the re-encoding step.
> (Config): Decode paths from UTF-8 to string before matching them against
> regular expressions. Also decode the repository directory path from UTF-8.
>
> * tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
> File groups are now provided in stable sorted order. This should fix
> spurious test failures in the future.
>
> * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
> integers and pass binary data instead of strings into libsvn_fs.
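
To make point 1 of the reply concrete, here is a rough sketch of what a
locale check along the lines of «locale.getlocale()[1]» might look like.
It is not part of r1884427 or of mailer.py: the names is_utf8() and
check_utf8_environment() are made up for illustration, and it assumes the
LC_CTYPE encoding is a good enough signal for "non-UTF-8 environment",
which may not hold for every hook setup.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name):
    """Return True if encoding_name is a known alias of UTF-8."""
    if not encoding_name:
        return False
    try:
        # codecs.lookup() normalizes aliases such as "utf8" or "UTF-8".
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False


def check_utf8_environment(strict=False, stream=sys.stderr):
    """Warn, or exit if strict, when the locale encoding is not UTF-8.

    Hypothetical helper; returns True when the environment looks like UTF-8.
    """
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_CTYPE, "")
        encoding = locale.getlocale()[1]
    except (locale.Error, ValueError):
        # Unsupported or unparsable locale: treat it as "not UTF-8".
        encoding = None
    if is_utf8(encoding):
        return True
    message = "mailer.py: locale encoding is %r, not UTF-8\n" % (encoding,)
    if strict:
        sys.exit(message)  # "error out hard"
    stream.write(message)  # "warn"
    return False
```

Calling check_utf8_environment() early in main() would cover the "warn"
variant, and check_utf8_environment(strict=True) the "error out hard"
variant; which of the two is appropriate is the open question above.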