Sorry for breaking threading by replying to a reply, but I don't seem to have the original post.
On Wed, 2008-03-12 at 15:29 -0500, Michael Wieher wrote:
> Hey all,
>
> I have these annoying textfiles that are delimited by the ASCII char
> for << (only it's a single character) and >> (again a single character)
>
> Their codes are 174 and 175, respectively.
>
> My datafiles are in the moronic form
>
> X<<Y>>Z

The glyph that looks like "<<" is a left quote in some European countries (and a right quote in others, sigh...), and similarly for ">>". They are usually known as left and right "angle quotation marks", chevrons or guillemets.

And yes, that certainly looks like a moronic form for a data file. But whatever the characters are, we can work with them as normal, if you don't mind ignoring that they don't display properly everywhere:

>>> lq = chr(174)
>>> rq = chr(175)
>>> s = "x" + lq + "y" + rq + "z"
>>> print s
x�y�z
>>> s.split(lq)
['x', 'y\xafz']
>>> s.split(rq)
['x\xaey', 'z']

And you can use regular expressions as well. Assuming that the quotes are never nested:

>>> import re
>>> r = re.compile(lq + '(.*?)' + rq)
>>> r.search(s).group(1)
'y'

If you want to treat both characters the same:

>>> s = s.replace(lq, rq)
>>> s.split(rq)
['x', 'y', 'z']

--
Steven
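
P.S. The session above relies on Python 2, where chr(174) is a one-byte string. For what it's worth, here is a rough sketch of the same two ideas (replace-then-split, and a non-greedy regex) written for Python 3. It assumes the data is kept as raw bytes so that no encoding has to be guessed; the names split_record and between are made up for illustration and are not from the original post.

```python
import re

# Assumption: records are handled as raw bytes, so byte values 174 and 175
# can be used directly without knowing the file's encoding.
LQ = bytes([174])  # the single-character "<<" delimiter
RQ = bytes([175])  # the single-character ">>" delimiter

# Non-greedy match of whatever sits between the two delimiters.
# Like the original regex, this assumes the quotes are never nested.
BETWEEN = re.compile(re.escape(LQ) + b"(.*?)" + re.escape(RQ))


def split_record(record):
    """Hypothetical helper: treat both delimiters the same and split on them."""
    return record.replace(LQ, RQ).split(RQ)


def between(record):
    """Hypothetical helper: return the bytes between LQ and RQ, or None."""
    match = BETWEEN.search(record)
    return match.group(1) if match else None


sample = b"x" + LQ + b"y" + RQ + b"z"
print(split_record(sample))
print(between(sample))
```

If the fields need to be text rather than bytes, each piece can be decoded afterwards with whatever encoding the files really use. Code page 437 is one encoding in which bytes 174 and 175 are the two guillemets, but that is only a guess about these particular files.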