Please find attached the revised patch. I incorporated following
feedback:
  a) Fix the array slicing part
  b) Escape using ord() instead of removing those characters
  c) Handle "]]>" in CDATA section
  d) Define the ascii table globally (once) and re-use

I also verified this fix by generating the junit files for tests having
special characters and simulating a test that has "]]>" in failure text.
With this patch, it generates valid junit file.

[[
Fix for issue 3541. When generating junit reports, escape special
characters, if any, from test failure messages.

* tools/dev/gen_junit_report.py
  (escape_special_characters): New method to escape special characters.
  (junit_testcase_fail, junit_testcase_xfail): Use 
  escape_special_characters() method to escape special characters in 
  test failure messages.
]]
On Fri, 2009-12-04 at 11:19 +0000, Julian Foad wrote:
> Bhuvaneswaran A wrote:
> > On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote:
> > > Branko Čibej wrote:
> > > > Bhuvaneswaran A wrote:
> > > > > The failure message for few tests contain special characters, ex:
> > > 
> > > What do you mean by "special" characters? Unprintable characters?
> > > Non-UTF8 characters? Invalid XML characters? Characters that are XML
> > > syntax characters such as "<"?
> > 
> > When I mean special characters, I mean control characters, ex "^H".
> > Refer to the attachment in issue 3541 for a sample character.
> > 
> > > > > prop_tests.py. As a result, it creates an invalid xml file and not 
> > > > > being
> > > > > displayed in Hudson.
> > > > >
> > > > > This commit fixes this issue, also tracked in issue 3541. With this 
> > > > > fix,
> > > > > the test results are displayed in Hudson, especially the results
> > > > > specific to 1.6.x solaris build.
> > > > >   http://subversion.tigris.org/issues/show_bug.cgi?id=3541
> > > > >   
> > > > > Index: tools/dev/gen_junit_report.py
> > > > > =======================================
> > > > > --- tools/dev/gen_junit_report.py     (revision 886204)
> > > > > +++ tools/dev/gen_junit_report.py     (working copy)
> > > > > @@ -46,6 +46,16 @@
> > > > > data = data.replace(char, encode[char])
> > > > > return data
> > > > > +def remove_special_characters(data):
> > > > > + """remove special characters in test failure reasons"""
> > > > > + if not data:
> > > > > + return data
> > > > > + chars_table = "".join([chr(n) for n in xrange(256)])
> > > > > + # remove all special characters upto ascii value 31, except line
> > > > > feed (10)
> > > > > + # and carriage return (13)
> > > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] +
> > > > > chars_table[14:31]
> > > 
> > > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32].
> > 
> > As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ...
> > [11:12] ... [14:31] works for me.
> 
> In array slicing, the range start is inclusive but the end is exclusive,
> so [11:12] just gives the single character [11], whereas [11:13] would
> give [11] and [12] which is what you want.
> 
> > > > Also, wouldn't it be more proper to find out why the tests put control
> > > > characters in the failure description than to just blindly throw them 
> > > > away?
> > > 
> > > Or just escape the "special" characters.
> > 
> > Good point. I used to encode using utf-8, but it doesn't seem to
> > detect/encode these characters, resulting in unchanged behaviour. I used
> > something like:
> > 
> > reason = u'%s'.encode('utf-8') % reason
> > reason = unicode(reason, 'utf-8')
> 
> Those statements are just converting from UTF-8 to internal Unicode
> representation, not doing any escaping. Control characters (and all
> characters) are valid in both UTF-8 and internal Unicode
> representations, so no escaping is needed for the conversion.
> 
> I searched on the web and didn't find a really really simple way to
> escape a set of characters. I think something like
> 
>   for c in chars_to_remove:
>     data = data.replace(c, '%%%0x' % ord(c))
> 
> would do it.
> 
> - Julian
> 
> 


-- 
Bhuvaneswaran A    
CollabNet Software P Ltd.  |  www.collab.net
Index: tools/dev/gen_junit_report.py
===================================================================
--- tools/dev/gen_junit_report.py	(revision 886204)
+++ tools/dev/gen_junit_report.py	(working copy)
@@ -33,6 +33,8 @@
 import os
 import getopt
 
+ASCII_TABLE = "".join([chr(n) for n in xrange(256)])
+
 def xml_encode(data):
     """encode the xml characters in the data"""
     encode = {
@@ -46,6 +48,18 @@
         data = data.replace(char, encode[char])
     return data
 
+def escape_special_characters(data):
+    """remove special characters in test failure reasons"""
+    if not data:
+        return data
+    # remove all special characters upto ascii value 31, except line feed (10) 
+    # and carriage return (13)
+    chars_to_remove = ASCII_TABLE[0:10] + ASCII_TABLE[11:13] + ASCII_TABLE[14:32]
+    for char in chars_to_remove:
+        data = data.replace(char, '%%%0x' % ord(char))
+    data = data.replace(']]>', ']]]]><![CDATA[>')
+    return data
+
 def start_junit():
     """define the beginning of xml document"""
     head = """<?xml version="1.0" encoding="UTF-8"?>"""
@@ -72,6 +86,7 @@
     """mark the test case as FAILED"""
     casename = xml_encode(casename)
     sub_test_name = test_name.replace('.', '-')
+    reason = escape_special_characters(reason)
     case = """<testcase time="ELAPSED_CASE_%s" name="%s" classname="%s">
       <failure type="Failed"><![CDATA[%s]]></failure>
     </testcase>""" % (test_name, casename, sub_test_name, reason)
@@ -81,6 +96,7 @@
     """mark the test case as XFAILED"""
     casename = xml_encode(casename)
     sub_test_name = test_name.replace('.', '-')
+    reason = escape_special_characters(reason)
     case = """<testcase time="ELAPSED_CASE_%s" name="%s" classname="%s">
       <system-out><![CDATA[%s]]></system-out>
     </testcase>""" % (test_name, casename, sub_test_name, reason)

Attachment: signature.asc
Description: This is a digitally signed message part

Reply via email to