Please find attached the revised patch. I incorporated following feedback: a) Fix the array slicing part b) Escape using ord() instead of removing those characters c) Handle "]]>" in CDATA section d) Define the ascii table globally (once) and re-use
I also verified this fix by generating the junit files for tests having special characters and simulating a test that has "]]>" in failure text. With this patch, it generates valid junit file. [[ Fix for issue 3541. When generating junit reports, escape special characters, if any, from test failure messages. * tools/dev/gen_junit_report.py (escape_special_characters): New method to escape special characters. (junit_testcase_fail, junit_testcase_xfail): Use escape_special_characters() method to escape special characters in test failure messages. ]] On Fri, 2009-12-04 at 11:19 +0000, Julian Foad wrote: > Bhuvaneswaran A wrote: > > On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote: > > > Branko Čibej wrote: > > > > Bhuvaneswaran A wrote: > > > > > The failure message for few tests contain special characters, ex: > > > > > > What do you mean by "special" characters? Unprintable characters? > > > Non-UTF8 characters? Invalid XML characters? Characters that are XML > > > syntax characters such as "<"? > > > > When I mean special characters, I mean control characters, ex "^H". > > Refer to the attachment in issue 3541 for a sample character. > > > > > > > prop_tests.py. As a result, it creates an invalid xml file and not > > > > > being > > > > > displayed in Hudson. > > > > > > > > > > This commit fixes this issue, also tracked in issue 3541. With this > > > > > fix, > > > > > the test results are displayed in Hudson, especially the results > > > > > specific to 1.6.x solaris build. > > > > > http://subversion.tigris.org/issues/show_bug.cgi?id=3541 > > > > > > > > > > Index: tools/dev/gen_junit_report.py > > > > > ======================================= > > > > > --- tools/dev/gen_junit_report.py (revision 886204) > > > > > +++ tools/dev/gen_junit_report.py (working copy) > > > > > @@ -46,6 +46,16 @@ > > > > > data = data.replace(char, encode[char]) > > > > > return data > > > > > +def remove_special_characters(data): > > > > > + """remove special characters in test failure reasons""" > > > > > + if not data: > > > > > + return data > > > > > + chars_table = "".join([chr(n) for n in xrange(256)]) > > > > > + # remove all special characters upto ascii value 31, except line > > > > > feed (10) > > > > > + # and carriage return (13) > > > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] + > > > > > chars_table[14:31] > > > > > > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32]. > > > > As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ... > > [11:12] ... [14:31] works for me. > > In array slicing, the range start is inclusive but the end is exclusive, > so [11:12] just gives the single character [11], whereas [11:13] would > give [11] and [12] which is what you want. > > > > > Also, wouldn't it be more proper to find out why the tests put control > > > > characters in the failure description than to just blindly throw them > > > > away? > > > > > > Or just escape the "special" characters. > > > > Good point. I used to encode using utf-8, but it doesn't seem to > > detect/encode these characters, resulting in unchanged behaviour. I used > > something like: > > > > reason = u'%s'.encode('utf-8') % reason > > reason = unicode(reason, 'utf-8') > > Those statements are just converting from UTF-8 to internal Unicode > representation, not doing any escaping. Control characters (and all > characters) are valid in both UTF-8 and internal Unicode > representations, so no escaping is needed for the conversion. > > I searched on the web and didn't find a really really simple way to > escape a set of characters. I think something like > > for c in chars_to_remove: > data = data.replace(c, '%%%0x' % ord(c)) > > would do it. > > - Julian > > -- Bhuvaneswaran A CollabNet Software P Ltd. | www.collab.net
Index: tools/dev/gen_junit_report.py =================================================================== --- tools/dev/gen_junit_report.py (revision 886204) +++ tools/dev/gen_junit_report.py (working copy) @@ -33,6 +33,8 @@ import os import getopt +ASCII_TABLE = "".join([chr(n) for n in xrange(256)]) + def xml_encode(data): """encode the xml characters in the data""" encode = { @@ -46,6 +48,18 @@ data = data.replace(char, encode[char]) return data +def escape_special_characters(data): + """remove special characters in test failure reasons""" + if not data: + return data + # remove all special characters upto ascii value 31, except line feed (10) + # and carriage return (13) + chars_to_remove = ASCII_TABLE[0:10] + ASCII_TABLE[11:13] + ASCII_TABLE[14:32] + for char in chars_to_remove: + data = data.replace(char, '%%%0x' % ord(char)) + data = data.replace(']]>', ']]]]><![CDATA[>') + return data + def start_junit(): """define the beginning of xml document""" head = """<?xml version="1.0" encoding="UTF-8"?>""" @@ -72,6 +86,7 @@ """mark the test case as FAILED""" casename = xml_encode(casename) sub_test_name = test_name.replace('.', '-') + reason = escape_special_characters(reason) case = """<testcase time="ELAPSED_CASE_%s" name="%s" classname="%s"> <failure type="Failed"><![CDATA[%s]]></failure> </testcase>""" % (test_name, casename, sub_test_name, reason) @@ -81,6 +96,7 @@ """mark the test case as XFAILED""" casename = xml_encode(casename) sub_test_name = test_name.replace('.', '-') + reason = escape_special_characters(reason) case = """<testcase time="ELAPSED_CASE_%s" name="%s" classname="%s"> <system-out><![CDATA[%s]]></system-out> </testcase>""" % (test_name, casename, sub_test_name, reason)
signature.asc
Description: This is a digitally signed message part