On 20/12/2010 16:08, Alec Battles wrote:
> Unicode interoperability is a pain, though, and I find it depressing to work with in Python 2.x, because it never seems to behave predictably. I still have no idea why tokenizing Hungarian text and tokenizing German text are not fundamentally the same operation.
I have no idea why they're not:

<code - untested>
import codecs

with codecs.open("german.txt", "rb", encoding="utf8") as f:
    german_text = f.read()

with codecs.open("hungarian.txt", "rb", encoding="utf8") as f:
    hungarian_text = f.read()

# do_stuff_with(german_text)
# do_stuff_with(hungarian_text)
</code>

Of course, I'm assuming that you know what encoding has been used to serialise the text, but if you don't then it's not Python's fault ;)

TJG

_______________________________________________
python-uk mailing list
python-uk@python.org
http://mail.python.org/mailman/listinfo/python-uk
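[Editor's note: to illustrate the point above, here is a minimal sketch of why tokenizing German and Hungarian becomes the same operation once the bytes are decoded to unicode. The `tokenize` helper and the sample strings are invented for illustration, not from the original message.]

```python
# -*- coding: utf-8 -*-
import re

def tokenize(text):
    # Once bytes have been decoded to a unicode string, \w matches
    # accented letters too, so the same pattern works for any
    # Latin-script language (re.UNICODE is the default in Python 3).
    return re.findall(r"\w+", text, re.UNICODE)

# One function, two languages, identical behaviour:
print(tokenize(u"Über die schöne Brücke"))  # German
print(tokenize(u"Öt szép szűcs"))           # Hungarian
```

The language-specific part is entirely in the decoding step (knowing the file's encoding); after that, tokenization sees only unicode code points.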