Bhuvaneswaran A wrote: > On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote: > > Branko Čibej wrote: > > > Bhuvaneswaran A wrote: > > > > The failure message for few tests contain special characters, ex: > > > > What do you mean by "special" characters? Unprintable characters? > > Non-UTF8 characters? Invalid XML characters? Characters that are XML > > syntax characters such as "<"? > > When I mean special characters, I mean control characters, ex "^H". > Refer to the attachment in issue 3541 for a sample character. > > > > > prop_tests.py. As a result, it creates an invalid xml file and not being > > > > displayed in Hudson. > > > > > > > > This commit fixes this issue, also tracked in issue 3541. With this fix, > > > > the test results are displayed in Hudson, especially the results > > > > specific to 1.6.x solaris build. > > > > http://subversion.tigris.org/issues/show_bug.cgi?id=3541 > > > > > > > > Index: tools/dev/gen_junit_report.py > > > > ======================================= > > > > --- tools/dev/gen_junit_report.py (revision 886204) > > > > +++ tools/dev/gen_junit_report.py (working copy) > > > > @@ -46,6 +46,16 @@ > > > > data = data.replace(char, encode[char]) > > > > return data > > > > +def remove_special_characters(data): > > > > + """remove special characters in test failure reasons""" > > > > + if not data: > > > > + return data > > > > + chars_table = "".join([chr(n) for n in xrange(256)]) > > > > + # remove all special characters upto ascii value 31, except line > > > > feed (10) > > > > + # and carriage return (13) > > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] + > > > > chars_table[14:31] > > > > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32]. > > As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ... > [11:12] ... [14:31] works for me.
In array slicing, the range start is inclusive but the end is exclusive, so [11:12] just gives the single character [11], whereas [11:13] would give [11] and [12] which is what you want. > > > Also, wouldn't it be more proper to find out why the tests put control > > > characters in the failure description than to just blindly throw them > > > away? > > > > Or just escape the "special" characters. > > Good point. I used to encode using utf-8, but it doesn't seem to > detect/encode these characters, resulting in unchanged behaviour. I used > something like: > > reason = u'%s'.encode('utf-8') % reason > reason = unicode(reason, 'utf-8') Those statements are just converting from UTF-8 to internal Unicode representation, not doing any escaping. Control characters (and all characters) are valid in both UTF-8 and internal Unicode representations, so no escaping is needed for the conversion. I searched on the web and didn't find a really really simple way to escape a set of characters. I think something like for c in chars_to_remove: data = data.replace(c, '%%%0x' % ord(c)) would do it. - Julian