[issue7615] unicode_escape codec does not escape quotes

Richard Hansen Thu, 07 Jan 2010 14:44:20 -0800

Richard Hansen <rhan...@bbn.com> added the comment:

> We'll need a patch that implements single and double quote escaping 
> for unicode_escape and a \uXXXX style escaping of quotes for the 
> raw_unicode_escape encoder.


OK, I'll remove unicode_escape_single_quotes.patch and update 
unicode_escape_reorg.patch.

> Other changes are not necessary.

Would you please clarify?  While writing unicode_escape_reorg.patch I found 
a few other (minor) bugs that I think should be fixed:
  * the UTF-16 surrogate pair decoding logic could read past the end of the 
    provided Py_UNICODE character array if the last character is a high 
    surrogate (0xD800 through 0xDBFF)
  * _PyString_Resize() will be called on an empty string if the size 
    argument of unicodeescape_string() is 0.  This raises a SystemError 
    because _PyString_Resize() can only be called if the object's ref count 
    is 1 (even if no resizing is to take place), yet 
    PyString_FromStringAndSize() returns a shared empty string instance if 
    size is 0.
  * it is unclear what unicodeescape_string() should do if size < 0
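
To illustrate the first item, here is a minimal sketch of a bounds-checked surrogate pair loop.  It is written in Python rather than the C of the actual encoder, and the function name combine_surrogates is hypothetical; it only shows the shape of the check, not the real implementation.  As a side effect, an empty input simply yields an empty result.

```python
def combine_surrogates(units):
    """Combine UTF-16 code units into code points.

    Hypothetical helper: it peeks at the next unit only when one
    exists, so a trailing high surrogate is passed through unchanged.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        ch = units[i]
        i += 1
        # High surrogate: look ahead only if another unit is available.
        if 0xD800 <= ch <= 0xDBFF and i < n:
            ch2 = units[i]
            if 0xDC00 <= ch2 <= 0xDFFF:
                i += 1
                out.append(0x10000 + ((ch - 0xD800) << 10) + (ch2 - 0xDC00))
                continue
        # Lone surrogate or ordinary code unit: emit as is.
        out.append(ch)
    return out
```

The "i < n" test is the guard that the bullet above says is missing.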

Beyond those issues, I'm worried about maintainability given the amount of 
code duplication.  If a bug is found in one of those encoding functions, the 
other two will likely need the same fix.

> The pickle copy of the codec can be left untouched (both cPickle.c 
> and pickle.py) - it doesn't matter whether quotes are escaped or not 
> in the pickle data stream.

Unfortunately, pickle.py must be modified because it does its own backslash 
escaping before encoding with the raw_unicode_escape codec.  This means that 
backslashes would become double escaped and the decoded value would differ 
(confirmed by running the pickle unit tests).
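
The double escaping can be sketched as follows.  This is an illustration in Python 3 under stated assumptions: pickle_pre_escape mimics the pre-escaping described above, and hypothetical_patched_encode is an invented stand-in for an encoder that itself escapes backslashes and quotes as \uXXXX; neither is the actual patch.

```python
def pickle_pre_escape(text):
    # Mimics the pre-escaping pickle.py applies before encoding
    # (backslash and newline become \uXXXX sequences).
    return text.replace("\\", "\\u005c").replace("\n", "\\u000a")


def hypothetical_patched_encode(text):
    # ASSUMPTION: a patched raw_unicode_escape encoder that escapes
    # backslashes and both quote characters as \uXXXX by itself.
    for ch in ("\\", "'", '"'):  # backslash must be handled first
        text = text.replace(ch, "\\u%04x" % ord(ch))
    return text.encode("raw_unicode_escape")


original = "a\\b"  # three characters: a, backslash, b

# Current behaviour: pre-escape, then encode with the stock codec.
current = pickle_pre_escape(original).encode("raw_unicode_escape")

# Pre-escape followed by the assumed patched encoder: the backslash
# that pickle introduced is escaped a second time.
patched = hypothetical_patched_encode(pickle_pre_escape(original))

print(current.decode("raw_unicode_escape") == original)
print(patched.decode("raw_unicode_escape") == original)
```

With the stock codec the value round-trips; with the assumed patched encoder the decoded text still contains a literal \u005c sequence, which is why the pre-escaping in pickle.py would have to go.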

The (minor) bugs in PyUnicode_EncodeRawUnicodeEscape() are also present in 
cPickle.c, so they should probably be fixed as well.

> The codecs' encode direction is not defined anywhere in the 
> documentation, AFAIK, and basically an implementation detail.

I read the escape codec documentation (see the original post) as implying that 
the encoders can generate eval-able string literals.  I'll add some clarifying 
statements.
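
A small sketch of the eval-ability problem, run under Python 3 (where unicode_escape likewise leaves quotes alone).  The helper escape_for_literal is hypothetical and only approximates what a quote-escaping encoder might produce.

```python
import ast


def escape_for_literal(text):
    # Hypothetical helper: encode with unicode_escape, then escape
    # both quote characters so the result can sit inside a literal.
    escaped = text.encode("unicode_escape").decode("ascii")
    return escaped.replace("'", "\\'").replace('"', '\\"')


text = "it's a \"test\"\n"

# Stock encoder output: the newline is escaped, the quotes are not.
raw = text.encode("unicode_escape").decode("ascii")

try:
    ast.literal_eval("'" + raw + "'")
    raw_is_evalable = True
except SyntaxError:
    # The unescaped single quote ends the literal early.
    raw_is_evalable = False

# With the quotes escaped, the wrapped string evaluates back to text.
round_tripped = ast.literal_eval("'" + escape_for_literal(text) + "'")
```

Escaping the quotes after encoding is safe here because the codec has already doubled every backslash in the input.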

Thanks for the feedback!

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7615>
_______________________________________