Control: tags -1 pending

On 2014-02-12 08:56:03, Felix Dreissig wrote:
> Package: monkeysign
> Version: 2.x
> Severity: normal
>
> I wanted to build the manpage only for Monkeysign’s CLI version, so I removed
> `monkeyscan:monkeysign.gtkui:MonkeysignScanUi.parser` from ‘setup.cfg' and
> ran `setup.py build_manpage`.
> That failed with:
>
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 55:
>> ordinal not in range(128)
>
> An encoding problem didn’t make any sense to me, so I tried to track the
> issue down. Turns out it doesn’t occur when PyGTK is imported into the build
> process, either directly through 'gtkui.py' or via 'msg_exception.py'.
> The explanation for this behaviour is that PyGTK sets Python’s default
> encoding to UTF-8. This is GNOME bug 132040 from back in 2004:
> https://bugzilla.gnome.org/show_bug.cgi?id=132040
>
> So what exactly causes the above error?
> It is the accent in your surname, anarcat, that causes manpage writing to
> fail with ASCII encoding ;-). The best way to fix this would in my opinion be
> using an unicode string for `author` in 'setup.py', but Disutils seem not to
> respect that.

damn french. ;) i agree that author should be unicode, no idea why
distutils is dropping that to the floor. oh well.

> I used the following patch, which works:
>
>> --- a/monkeysign/documentation.py
>> +++ b/monkeysign/documentation.py
>> @@ -84,7 +84,7 @@ class build_manpage(Command):
>>      def _write_footer(self, parser):
>>          ret = []
>>          appname = self.distribution.get_name()
>> -        author = '%s <%s>' % (self.distribution.get_author(),
>> +        author = '%s <%s>' % (self.distribution.get_author().decode('utf-8'),
>>                                self.distribution.get_author_email())
>>          ret.append(('.SH AUTHORS\n.B %s\nwas written by %s.\n'
>>                      % (self._markup(appname), self._markup(author))))
>> @@ -109,7 +109,7 @@ class build_manpage(Command):
>>          path = os.path.join(self.output, parser.prog + '.1')
>>          self.announce('writing man page to %s' % path, 2)
>>          stream = open(path, 'w')
>> -        stream.write(''.join(manpage))
>> +        stream.write(''.join(manpage).encode('utf-8'))
>>          stream.close()

I used a slight variation: i decode in the ret.append() call so that the
email can also contain accents, which may be illegal, but I don't care:
i'm not going to go enforcing standards here, i want to avoid crashes at
build time. :)

> It might, however, not be the most comprehensive way to deal with the issue:
> The whole process of generating manpages uses a mixture of ordinary and
> unicode strings and might need some review with respect to encoding issues.

true. this was messy in the first place, although I am not sure i want
to pursue this much further. :P

thanks for all the patches and help!

a.

-- 
That's the kind of society I want to build. I want a guarantee - with
physics and mathematics, not with laws - that we can give ourselves
real privacy of personal communications.
                        - John Gilmore
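
For reference, here is a minimal sketch of the decode-on-input, encode-on-output approach discussed above. It is written for Python 3, whereas the quoted patch targets Python 2, so the byte strings that distutils returned there are simulated with `bytes`. The author name, the email address and the helper names `build_footer` and `write_manpage` are hypothetical and do not come from monkeysign.

```python
import io

# Hypothetical metadata. On Python 2, distutils handed the author back as a
# UTF-8 encoded byte string; bytes objects simulate that here.
AUTHOR = "Zo\u00e9 B\u00e9langer".encode("utf-8")
AUTHOR_EMAIL = b"zoe@example.org"


def build_footer(appname, author, email):
    """Decode both metadata fields once, then work with text only."""
    line = "%s <%s>" % (author.decode("utf-8"), email.decode("utf-8"))
    return ".SH AUTHORS\n.B %s\nwas written by %s.\n" % (appname, line)


def write_manpage(stream, parts):
    """Encode explicitly on output instead of relying on a default codec."""
    stream.write("".join(parts).encode("utf-8"))


# The failure mode from the report: the ASCII codec cannot decode byte 0xc3,
# which is the first byte of an accented letter such as U+00E9 in UTF-8.
try:
    AUTHOR.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

footer = build_footer("monkeysign", AUTHOR, AUTHOR_EMAIL)
buffer = io.BytesIO()
write_manpage(buffer, [footer])
manpage_bytes = buffer.getvalue()
```

Decoding the email as well as the name mirrors the variation described in the reply, so an accent in either field does not crash the build.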