Rehceb Rotkiv wrote:
> Hello,
>
> I have this little grep-like program:
>
> ++++++++++snip++++++++++
> #!/usr/bin/python
>
> import sys
> import re
>
> pattern = sys.argv[1]
> inputfile = file(sys.argv[2], 'r')
>
> for line in inputfile:
>     matches = re.findall(pattern, line)
>     if matches:
>         print matches
> ++++++++++snip++++++++++
>
> Like this, the program prints some characters as strange escape
> sequences, which is due to the input file being encoded in utf-8

As Paul said, your terminal is likely set to an ISO-8859 encoding, which is
why it doesn't display UTF-8 correctly. The above program produces correct
UTF-8 output.

What you could do is:

1. read the file in as unicode
2. print the unicode to the terminal (will use the terminal encoding), or
   convert the unicode to strings with an explicit encoding before printing

codecs.open() is very helpful for step 1, BTW.

Georg
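
The two steps could look roughly like the sketch below. It is only an illustration under a few assumptions: it targets Python 3 (where command-line arguments are already text), the input file is UTF-8, and the pattern has at most one group, so re.findall() returns plain strings. The names grep_unicode and write_encoded are made up for this example. It writes each match on its own line instead of printing the whole list, because printing a list shows the repr of its elements.

```python
import codecs
import re
import sys


def grep_unicode(pattern, path, encoding="utf-8"):
    """Yield every match of pattern in the file, as unicode text.

    Hypothetical helper for illustration; assumes the pattern has at
    most one group, so findall() returns strings rather than tuples.
    """
    regex = re.compile(pattern, re.UNICODE)
    # Step 1: codecs.open() decodes the bytes to unicode while reading.
    with codecs.open(path, "r", encoding=encoding) as inputfile:
        for line in inputfile:
            for match in regex.findall(line):
                yield match


def write_encoded(matches, output_encoding="utf-8"):
    """Step 2 (explicit variant): encode each match before writing it."""
    # On Python 3 the byte stream lives at sys.stdout.buffer.
    out = getattr(sys.stdout, "buffer", sys.stdout)
    for match in matches:
        # "replace" substitutes "?" for characters the target encoding lacks.
        out.write(match.encode(output_encoding, "replace") + b"\n")
```

Calling write_encoded(grep_unicode(sys.argv[1], sys.argv[2]), "iso-8859-1") would emit the matches in the encoding an ISO-8859-1 terminal expects. On Python 3 the built-in open(path, encoding="utf-8") does the same job as codecs.open() for step 1.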