UnicodeEncodeError in Windows

2007-09-17 Thread geoff_ness
Hello - and apologies in advance for the length of this post.

I am having a hard time understanding the errors being generated by a
program I've written. The code is intended to parse text files which
are copied and pasted from web pages from an online game. The encoding
of the pages is ISO-8859-1, but the text that gets copied contains
characters from character sets other than latin-1.
For instance, one of the lines I need to be able to read is:
196679  Daimyo 石 Druid  145 27  12/09/07 21:40:04   [ Expel ]

I start with the file 'citizen_list' and use this function to read it
and return a list of names (for instance, Daimyo 石 Druid) and ID
numbers:

# builds the list of names from the citizens list
def getNames(f):
    """Builds a list from the town list of names

    Returns a list"""
    newlist = []
    for line in f:
        namewords = line.rstrip('[Expel]\n\t ')\
            .rstrip(':/0123456789 ').rstrip('\t ').rstrip('0123456789 ')\
            .rstrip('\t ').rstrip('0123456789 ').rstrip('\t ').split()
        entry = ";".join([namewords[0], " ".join(namewords[1:len(namewords)])])
        newlist.append(entry)
    return newlist

import codecs  # needed for codecs.open below

citizens = codecs.open('citizen_list', 'r', 'utf-8', 'strict')
listNames = getNames(citizens)
citizens.close()

I've specified 'utf-8' as the encoding as this seemed to be the best
candidate for picking up all the names in the list. I use the names in
other functions - for example:

def getdamage(warrior, rpt):
    """reads each line of war report

    returns damage and number of kills for citizen name"""
    for line in rpt:
        if (line.startswith(warrior.name) or \
            line.startswith('A blue aura surrounds ' + warrior.name))\
            and line.find('weapon') > 0:
            warrior.addDamage(int(line[line.find('caused ')+7:line.find(' damage')]))
            if rpt.next().find('is dead') >0:
                warrior.addKill()
        elif line.startswith(warrior.name+' is dead'):
            warrior.dies()
            break
        elif line.startswith('Starting round'):
            warrior.addRound()

for cit in listNames:
    c = Warrior(cit.split(';')[0], cit.split(';')[1])
    totalnum += 1
    report = codecs.open('war_report','r', 'utf-8', 'strict')
    getdamage(c, report)
    report.close()
--[snip]--

def buildString(warrior):
    """Build a string from a warrior's stats

    Returns string for output to warStat."""
    return "!tr!!td!!id!"+str(warrior.ID)+"!/id!!/td!"+\
        "!td!"+str(warrior.damage)+"!/td!!td!"+str(warrior.kills)+\
        "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"

This code runs fine on my Linux machine, but when I sent the code to a
friend with Python running on Windows, he got the following error:

Traceback (most recent call last):
  File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\parser_1.0.py", line 63, in <module>
    "".join(["%s" % buildString(c) for c in citlistS[:100]])+"!/table!")
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\iotp_alt2.py", line 169, in buildString
    "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

As I understand it the error is related to the ascii codec being
unable to cope with the unicode string u'\ufeff'.
The issue I have is that this error doesn't show up for me - ascii is
the default encoding for me also. Any thoughts or assistance would be
welcomed.

Cheers

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: UnicodeEncodeError in Windows

2007-09-18 Thread geoff_ness
On Sep 18, 7:36 am, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
> On Mon, 17 Sep 2007 07:38:16 -0300, geoff_ness <[EMAIL PROTECTED]>
> wrote:
>
> > def buildString(warrior):
> >     """Build a string from a warrior's stats
> >
> >     Returns string for output to warStat."""
> >     return "!tr!!td!!id!"+str(warrior.ID)+"!/id!!/td!"+\
> >         "!td!"+str(warrior.damage)+"!/td!!td!"+str(warrior.kills)+\
> >         "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
>
> > This code runs fine on my linux machine, but when I sent the code to a
> > friend with python running on windows, he got the following error:
>
> > Traceback (most recent call last):
> >   File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\iotp_alt2.py", line 169, in buildString
> >     "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
>
> > As I understand it the error is related to the ascii codec being
> > unable to cope with the unicode string u'\ufeff'.
> > The issue I have is that this error doesn't show up for me - ascii is
> > the default encoding for me also. Any thoughts or assistance would be
> > welcomed.
>
> One of those `warrior` attributes is a Unicode object that contains
> characters outside ASCII. str(x) tries to convert it to a byte string
> using the default encoding (ASCII), and fails. This can happen on both
> Windows and Linux, depending on the data.
> I see that you use codecs.open: you should write Unicode objects to
> the file, not byte strings, and that will work fine.
> Look for some recent posts about this same problem.
>
> --
> Gabriel Genellina
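
The point above can be sketched as follows. This is only an illustration under stated assumptions: the Warrior class is a hypothetical stand-in (the post does not show it), and build_string is one possible rewrite, not the function the original poster ended up with. In Python 2, str() on a Unicode object implicitly encodes it with the default ASCII codec; the explicit .encode("ascii") call reproduces that step so the snippet behaves the same on Python 2 and on current Python 3.

```python
# Hypothetical stand-in for the Warrior class, which the post does not show.
class Warrior(object):
    def __init__(self, ID, name):
        self.ID = ID          # text read via codecs.open, so a Unicode object
        self.name = name
        self.damage = 0       # the remaining attributes are plain ints
        self.kills = 0
        self.survived = 0


def build_string(warrior):
    """Build the output row as Unicode text without calling str()."""
    # A u"..." template keeps the whole result Unicode; %s accepts both
    # the text attribute and the int attributes.
    template = (u"!tr!!td!!id!%s!/id!!/td!"
                u"!td!%s!/td!!td!%s!/td!!td!%s!/td!!/tr!")
    return template % (warrior.ID, warrior.damage,
                       warrior.kills, warrior.survived)


# An ID that picked up a leading U+FEFF, as reported in the traceback.
w = Warrior(u"\ufeff196679", u"Daimyo \u77f3 Druid")

# What Python 2's str() attempts implicitly: encode with the ASCII codec.
try:
    w.ID.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Building the row as Unicode works regardless of the characters involved.
row = build_string(w)
```

The resulting Unicode row can be written directly to a file opened with codecs.open, which does the encoding itself.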

Thanks Gabriel, I hadn't thought about the str() function that way. I
had initially used it to coerce the attributes of type int to type str
so that I could write them to the output file. I've now rewritten the
buildString() function so that the unicode objects don't get fed to
str(), and apparently Windows copes OK with that. I'm still puzzled as
to why Python at my end had no problem with it...
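
One possible explanation for the difference, offered as a guess rather than something confirmed in the thread: u'\ufeff' at position 0 is the Unicode byte order mark (BOM). If the input file on the Windows machine was saved by an editor that writes a UTF-8 BOM, the 'utf-8' codec keeps it as the first character of the first line, so it would end up in the first ID. The 'utf-8-sig' codec (added in Python 2.5) drops a leading BOM. The sketch below uses a temporary file and sample line made up for the demonstration.

```python
import codecs
import os
import tempfile

# Create a small UTF-8 file that starts with a BOM. That some Windows
# editors save files this way is the assumption being illustrated.
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, "wb")
f.write(codecs.BOM_UTF8 + u"196679\tDaimyo \u77f3 Druid\n".encode("utf-8"))
f.close()

# Plain 'utf-8' keeps the BOM as the first character of the first line.
f = codecs.open(path, "r", "utf-8")
with_bom = f.readline()
f.close()

# 'utf-8-sig' skips a leading BOM if one is present.
f = codecs.open(path, "r", "utf-8-sig")
without_bom = f.readline()
f.close()

os.remove(path)

first_char = with_bom[0]       # the BOM character, U+FEFF
clean_start = without_bom[:6]  # the ID digits, with no BOM in front
```

A file without a BOM reads identically under both codecs, so opening the lists with 'utf-8-sig' should be harmless on either machine.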
