[issue18679] include a codec to handle escaping only control characters but not any others

Derek Wilson Thu, 08 Aug 2013 08:24:29 -0700

Derek Wilson added the comment:

Using repr(x)[1:-1] is not safe for my use case, as I need this for encoding and 
decoding data. The "deserialization" of repr would be eval, and aside from the 
security issues with that, if I strip the quotes off I can't reliably eval the 
result and get back the original. On top of that, repr's quote escaping makes 
the output non-portable to other languages/tools that do understand control 
character escapes. Consider:


>>> s = """Α""\t'''Ω"""
>>> print(s)
Α""     '''Ω
>>> e = repr(s)[1:-1]
>>> print(e)
Α""\t\'\'\'Ω

How do I know what to quote e with before I eval it to get back the value? I 
can't even try all the quoting options and stop when I don't get a syntax error, 
because more than one can parse cleanly and still give me a bad result:

>>> d = eval('"{}"'.format(e))
>>> d == s
False
>>> print(d)
Α       '''Ω
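
A rough sketch of that ambiguity, assuming Python 3 and using ast.literal_eval in place of eval so that only literals are evaluated. The names results and wrong are just for illustration. It wraps e in each of the four quoting styles and records which ones parse and which ones actually round-trip:

```python
import ast

s = 'Α""\t\'\'\'Ω'          # same value as the triple-quoted s above
e = repr(s)[1:-1]           # repr with the outer quotes stripped

results = {}
for quote in ("'", '"', "'''", '"""'):
    try:
        # Adjacent string literals are concatenated by the parser,
        # which is how a "wrong" quoting can still parse cleanly.
        results[quote] = ast.literal_eval(quote + e + quote)
    except SyntaxError:
        results[quote] = None

# Quoting styles that parsed without error but did not give back s.
wrong = [q for q, v in results.items() if v is not None and v != s]
print(wrong)
```

None of the four raises a syntax error here, so "first one that parses" is not a usable rule: which quoting is correct depends on the content of the data.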

Aside from Python not being able to handle the repr(x)[1:-1] case itself, the 
goal is to use output generated by common tools, from cut to Hadoop, where tab 
is a field separator (aside: wouldn't adoption of ASCII 0x1f as a common unit 
separator be great?). Sometimes it is also useful to distinguish newlines inside 
the data from the literal newlines that end records in formats (again like 
Hadoop or Unix utilities) that treat lines as records (and here again ASCII 
0x1e would have been a nice solution).

But we have to work with what we've got, and there are many tools that care 
about tab-separated fields and per-line records. In these cases, the right tool 
for the interoperability job is a codec that simply backslash-escapes control 
characters and nothing else.
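
To make the request concrete, here is a minimal sketch of what such an escaping could look like. This is not an existing stdlib codec; the function names escape_controls and unescape_controls are hypothetical, and the sketch assumes that "control character" means Unicode category Cc and that the backslash itself must also be escaped so decoding is unambiguous. Quotes and non-ASCII text are left alone.

```python
import re
import unicodedata

# Short escapes; the backslash is included so decoding is unambiguous.
_NAMED = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
# Reverse table keyed by the character after the backslash.
_UNNAMED = {v[1]: k for k, v in _NAMED.items()}

def escape_controls(text):
    """Backslash-escape control characters (category Cc) and backslash only."""
    out = []
    for ch in text:
        if ch in _NAMED:
            out.append(_NAMED[ch])
        elif unicodedata.category(ch) == "Cc":
            # All Cc code points are <= 0x9f, so two hex digits suffice.
            out.append("\\x{:02x}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)

# A backslash followed by xHH, by any single character, or by nothing.
_ESCAPE_RE = re.compile(r"\\(?:x([0-9a-fA-F]{2})|(.?))", re.DOTALL)

def unescape_controls(text):
    """Inverse of escape_controls; rejects escapes it did not produce."""
    def replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        ch = match.group(2)
        if ch in _UNNAMED:
            return _UNNAMED[ch]
        raise ValueError("unknown or truncated escape: \\" + ch)
    return _ESCAPE_RE.sub(replace, text)

s = 'Α""\t\'\'\'Ω'
e = escape_controls(s)
print(e)
print(unescape_controls(e) == s)
```

With this scheme the escaped form never contains a raw tab or newline, so it can sit in a tab-separated, line-per-record file, and decoding needs neither eval nor a guess about quoting.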

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18679>
_______________________________________
_______________________________________________
Python-bugs-list mailing list