rzed wrote:
utabintarbo <utabinta...@gmail.com> wrote in
news:adc6c455-5616-471a-8b39-d7fdad217...@m33g2000vbi.googlegroups.com:
I have a log file with full Windows paths on a line, e.g.:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416
As I try to pull in the line and process it, Python changes the
"\10" to a "\x08". This happens before I can do anything with it. Is
there a way to specify that incoming lines (say, when using
.readlines()) should be treated as raw strings?
TIA
Despite all the ragging you're getting, it is a pretty flakey thing
Since the OP specified readlines(), which does *not* behave this way, he
probably deserved what you call "ragging." Backslash escapes are
processed only in string literals, which live in code, not in data files.
In any case, there's a big difference between surprising (to you), and
flakey.
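To make that concrete, here is a minimal sketch (written for Python 3, though the behaviour is the same in 2.6). The temporary file is only a hypothetical stand-in for the OP's real log, and the names sample and log_path are mine, not his. It writes the path to a file, reads it back with readlines(), and checks that no backspace character appears:

```python
import os
import tempfile

# The sample path from the original post, written as a raw literal so that
# this source file itself does not turn "\10" into an escape.
sample = r"K:\A\B\C\10xx\somerandomfilename.ext"

# Write it to a throwaway file (a stand-in for the real log).
fd, log_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as f:
    f.write(sample + "\n")

# Read it back the way the OP described.
with open(log_path) as f:
    lines = f.readlines()
os.remove(log_path)

line = lines[0].rstrip("\n")
print(line == sample)    # True: the text survived the round trip unchanged
print("\x08" in line)    # False: no backspace character was introduced
```

If the "\x08" really does show up, the likely cause is that the line was pasted into the code as an ordinary string literal somewhere along the way.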
that Python does in this context:
(from a python shell)
>>> x = '\1'
>>> x
'\x01'
>>> x = '\10'
>>> x
'\x08'
If you are pasting your string as a literal, then maybe it does the
same. It still seems weird to me. I can accept that '\1' means x01,
but \10 seems to be expanded to \010 and then translated from octal
to get to x08. That's just strange. I'm sure it's documented
somewhere, but it's not easy to search for.
Check in the help for "escape Strings". It's documented (in version 2.6,
anyway) in a nice chart: a backslash followed by one to three octal
digits is interpreted as an octal character code. I don't like it much
either, but it's inherited from C, which has worked that way for 30+ years.
Online, see
http://www.python.org/doc/2.6.4/reference/lexical_analysis.html, and
look in section 2.4.1 for the chart.
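As a rough illustration of that chart, the sketch below compares ordinary and raw string literals. The path 'C:\10xx' is just a made-up example, and the variable names are mine:

```python
# Octal escapes: a backslash followed by one to three octal digits.
assert '\1' == '\x01'
assert '\10' == '\x08'     # octal 10 == decimal 8
assert '\010' == '\x08'    # same value, written with three digits

# Two ways to keep a literal backslash when a path appears in code.
escaped = 'C:\\10xx'       # double the backslash
raw = r'C:\10xx'           # raw string literal: no escape processing
print(escaped == raw)      # True

# The ordinary literal collapses "\10" into one character; the raw one does not.
octal_len = len('\10xx')
raw_len = len(r'\10xx')
print(octal_len, raw_len)  # 3 5
```

Either spelling only matters for literals in source code; strings that arrive from a file or from user input are never re-interpreted this way.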
Oh, and this:
>>> '\7'
'\x07'
>>> '\70'
'8'
... is really odd.
Octal 70 is hex 38 (or decimal 56), which is the character '8'.
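The arithmetic can be checked in a few lines (a small sketch; the name code is just for illustration):

```python
code = int('70', 8)    # parse "70" as a base-8 number
print(code)            # 56
print(hex(code))       # 0x38
print(chr(code))       # 8
print('\70' == '8')    # True
```

The repr of a string shows printable characters as themselves, which is why '\70' displays as '8' while '\7' (a control character) displays as '\x07'.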
DaveA