Re: Unicode list

Georg Brandl Sun, 01 Apr 2007 03:21:35 -0700

Rehceb Rotkiv wrote:
> Hello,
> 
> I have this little grep-like program:
> 
> ++++++++++snip++++++++++
> #!/usr/bin/python
> 
> import sys
> import re
> 
> pattern = sys.argv[1]
> inputfile = file(sys.argv[2], 'r')
> 
> for line in inputfile:
>     matches = re.findall(pattern, line)
>     if matches:
>         print matches
> ++++++++++snip++++++++++
> 
> Like this, the program prints some characters as strange escape 
> sequences, which is due to the input file being encoded in utf-8


As Paul said, your terminal is likely set to iso-8859 encoding, which
is why it doesn't display UTF-8 correctly. The above program produces
correct UTF-8 output.
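
To illustrate the kind of garbling such a mismatch produces, here is a
minimal sketch. The sample character and the variable names are only
illustrative. It encodes one character as UTF-8 and then decodes the
bytes as ISO-8859-1, which is roughly what a Latin-1 terminal does with
UTF-8 output:

```python
# Illustrative only: U+00FC (u with diaeresis) is an arbitrary sample character.
text = u"\u00fc"

# UTF-8 represents this character as two bytes.
utf8_bytes = text.encode("utf-8")

# Reading those two bytes as ISO-8859-1 yields two unrelated characters,
# which is what a terminal set to that encoding would show.
misread = utf8_bytes.decode("iso-8859-1")

print(repr(utf8_bytes))
print(len(text), len(misread))
```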

What you could do is:
1. read the file in as Unicode
2. print the Unicode strings to the terminal (print will use the terminal
    encoding), or encode them to byte strings with an explicit encoding
    before printing

codecs.open() is very helpful for step 1, BTW.
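
A rough sketch of both steps, assuming the input file really is UTF-8.
The function names find_matches and format_matches and the encoding
defaults are my own choices for illustration, not part of your program:

```python
import codecs
import re


def find_matches(pattern, path, encoding="utf-8"):
    """Step 1: read the file as Unicode and collect the matches per line."""
    results = []
    # codecs.open() decodes the file's bytes to Unicode while reading.
    # The "utf-8" default is an assumption about the input file.
    with codecs.open(path, "r", encoding=encoding) as inputfile:
        for line in inputfile:
            matches = re.findall(pattern, line)
            if matches:
                results.append(matches)
    return results


def format_matches(matches, encoding="utf-8"):
    """Step 2 (explicit variant): join the matches and encode them to bytes."""
    return u" ".join(matches).encode(encoding)
```

Joining the matches before output, rather than printing the list itself,
also avoids the escaped repr() form that Python uses for list elements.
If you would rather rely on the terminal encoding, print the joined
Unicode string directly instead of calling format_matches().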

Georg

-- 
http://mail.python.org/mailman/listinfo/python-list
