0.219 usec per loop
(regular) = 100 loops, best of 3: 0.231 usec per loop
Python 2.7.2
(slots) = 100 loops, best of 3: 0.244 usec per loop
(regular) = 100 loops, best of 3: 0.285 usec per loop
Python 3.2
(slots) = 100 loops, best of 3: 0.193 usec per loop
(regular) = 100 loops, best of 3: 0.224 usec per loop
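The benchmark code behind these numbers isn't included here. Below is a minimal Python 3 sketch of how a comparable measurement could be made, assuming the test timed a plain attribute read (`obj.x`) on a class with and without `__slots__`. The class names `Regular` and `Slotted` and the helper `best_usec` are hypothetical, and absolute timings will vary by machine and interpreter version.

```python
import timeit

class Regular:
    # Ordinary class: attributes live in a per-instance __dict__.
    def __init__(self):
        self.x = 1

class Slotted:
    # __slots__ class: attributes stored in fixed slots, no per-instance __dict__.
    __slots__ = ("x",)

    def __init__(self):
        self.x = 1

def best_usec(stmt, obj, number=100, repeat=3):
    """Hypothetical helper: best-of-`repeat` time per loop, in microseconds."""
    times = timeit.repeat(stmt, globals={"obj": obj}, number=number, repeat=repeat)
    return min(times) / number * 1e6

if __name__ == "__main__":
    print("(slots)   = %.3f usec per loop" % best_usec("obj.x", Slotted()))
    print("(regular) = %.3f usec per loop" % best_usec("obj.x", Regular()))
```

The `number=100, repeat=3` arguments mirror the "100 loops, best of 3" wording above, which is what the `-n 100 -r 3` options of `python -m timeit` would report.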
-- John-John Tedro
--
http://mail.python.org/mailman/listinfo/python-list
st.txt
#> file test.txt
test.txt: UTF-8 Unicode text
#> iconv test.txt -f utf-8 -t latin1 > test.l1.txt
#> file test.l1.txt
test.l1.txt: ISO-8859 text
Note: I use latin1 (iso-8859-1) because it can describe the characters 'å',
'ä', 'ö'. Your encoding might be different depending on what system you are
using.
The gist is that if you declare the correct encoding with the "coding" comment
mentioned above, your program will most likely run as intended.
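As a rough illustration of what the iconv step does, here is a small Python 3 sketch that re-encodes bytes the same way and shows what happens when bytes are decoded with the wrong codec. The `convert` helper is hypothetical; only the standard `bytes.decode` and `str.encode` methods are used. (The "coding" comment itself, e.g. `# -*- coding: latin-1 -*-`, only tells the interpreter how the source file is encoded, per PEP 263.)

```python
def convert(data: bytes, src: str = "utf-8", dst: str = "latin-1") -> bytes:
    """Hypothetical helper: re-encode bytes, roughly like `iconv -f src -t dst`."""
    return data.decode(src).encode(dst)

utf8_bytes = "åäö".encode("utf-8")      # two bytes per character in UTF-8
latin1_bytes = convert(utf8_bytes)      # one byte per character in Latin-1
print(len(utf8_bytes), len(latin1_bytes))  # 6 3
print(latin1_bytes)                         # b'\xe5\xe4\xf6'

# Decoding Latin-1 bytes as UTF-8 fails: the declared encoding must match the bytes.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```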
-- John-John Tedro
makes "\w, \W, \b, \B,
\s and \S dependent on the current locale".
That probably does not take the special rules mentioned above into account, but
I could be wrong. Make sure that your locale is set correctly and test again.
If that doesn't work, I don't see a 'Turkic flag' being introduced into the
re module any time soon, given the following line from PEP 20:
"Special cases aren't special enough to break the rules."
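To make the "special rules" concrete: Python 3's `str` case mapping follows the default Unicode rules and ignores locale, so it never produces the Turkish dotless 'ı' from 'I'. A minimal sketch, assuming Turkish/Azeri rules are handled in application code rather than by a flag; `turkic_lower` is a hypothetical helper, not part of the standard library.

```python
def turkic_lower(s: str) -> str:
    """Hypothetical helper: lowercase using Turkish rules for I and İ."""
    # Map the two special capitals first, then apply the default rules.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

print("I".lower())                   # i  (default Unicode rule)
print(turkic_lower("I"))             # ı  (Turkish rule)
print(turkic_lower("DIYARBAKIR"))    # dıyarbakır
print(turkic_lower("\u0130STANBUL")) # istanbul
print(len("\u0130".lower()))         # 2: default rule gives 'i' + U+0307 COMBINING DOT ABOVE
```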
Cheers,
-- John-John Tedro
even
Yeah, it's more likely that language conventions and functions will grow
around characters that look right.
No one except developers cares which specific code point a character has, so
you would soon end up with a mish-mash of special rules converting between
each special case.
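As an illustration of "characters that look right" having different code points, the sketch below compares two visually identical strings and reconciles them with Unicode normalization from the standard `unicodedata` module. Normalization only covers characters Unicode declares equivalent; lookalikes from different scripts (e.g. Cyrillic 'а' vs Latin 'a') stay distinct.

```python
import unicodedata

angstrom_sign = "\u212b"  # ANGSTROM SIGN
a_ring = "\u00c5"         # LATIN CAPITAL LETTER A WITH RING ABOVE
print(angstrom_sign == a_ring)                                # False: different code points
print(unicodedata.normalize("NFC", angstrom_sign) == a_ring)  # True: canonically equivalent

decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT
print(len(decomposed), len(unicodedata.normalize("NFC", decomposed)))  # 2 1
```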
P.S. Sorry Steven, I missed clicking "reply to all".
-- John-John Tedro