Bugs item #1224047, was opened at 2005-06-20 12:52
Message generated for change (Comment added) made by henrikwj
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1224047&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Henrik Winther Jensen (henrikwj)
Assigned to: Michael Hudson (mwh)
Summary: Len too large with national characters

Initial Comment:
It looks as if len returns the length of a UTF-8 string even if the
string only contains ascii characters and the default encoding is
ascii. This means that if you insert, for example, one Danish ø in a
string, len will return a value of 2.

i.e.

a='ø'
print len(a)

gives:
2

----------------------------------------------------------------------

>Comment By: Henrik Winther Jensen (henrikwj)
Date: 2005-06-20 15:41

Message:
Logged In: YES
user_id=1299770

Yes, you are right, the problem is that the console-thingy converts
my iso8859 characters to utf-8. Thanks for the explanation.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2005-06-20 15:12

Message:
Logged In: YES
user_id=6656

Well, what encoding is the file in? I suspect that it's in utf-8, so
when you open the file and call read() you get utf-8 data and thus
your Danish character is represented as two bytes.

You might want to do

import codecs
fileobj = codecs.open('filename.txt', encoding='utf-8')

and then fileobj.read() will return a unicode string of the length
you're expecting.

At any rate, I see no evidence of a Python bug here, so closing.

----------------------------------------------------------------------

Comment By: Henrik Winther Jensen (henrikwj)
Date: 2005-06-20 15:06

Message:
Logged In: YES
user_id=1299770

Actually the problem persists whether I am reading from a file or
inputting from a keyboard. I am using python from the command line in
a linux shell. I don't know what console that is. But it is able to
show the Danish characters on the screen as well as read them from
the keyboard.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2005-06-20 14:12

Message:
Logged In: YES
user_id=6656

How are you getting your Danish character into the string? If it's by
typing it into a console, is your console in utf-8 mode?

----------------------------------------------------------------------
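
For reference, the behaviour discussed in this thread can be reproduced
with the minimal sketch below. It is an illustration, not part of the
original report, and it makes some assumptions. It is written for a
recent Python rather than the 2.4 release under discussion, so the
character is spelled as an explicit unicode escape instead of relying on
the console encoding. The temporary file stands in for 'filename.txt',
and the names char, utf8_bytes, latin1_bytes, lengths and text are
made up for the example.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# The Danish letter "ø" as a single unicode code point (U+00F8).
char = u'\xf8'

# The same character as raw bytes in the two encodings from the thread.
utf8_bytes = char.encode('utf-8')         # b'\xc3\xb8'
latin1_bytes = char.encode('iso-8859-1')  # b'\xf8'

# len() counts code points for unicode strings and bytes for byte strings.
lengths = {
    'unicode': len(char),
    'utf-8 bytes': len(utf8_bytes),
    'iso-8859-1 bytes': len(latin1_bytes),
}

# Write the UTF-8 bytes to a file, then decode on read with codecs.open
# as suggested above. The temporary file is a hypothetical stand-in for
# 'filename.txt'.
fd, path = tempfile.mkstemp(suffix='.txt')
try:
    with os.fdopen(fd, 'wb') as raw:
        raw.write(utf8_bytes)
    fileobj = codecs.open(path, encoding='utf-8')
    try:
        text = fileobj.read()
    finally:
        fileobj.close()
finally:
    os.remove(path)

print(lengths)
print(len(text))
```

The byte string produced by a utf-8 console has length 2, while the
decoded unicode string read back through codecs.open has length 1, which
is consistent with the explanation given in the comments above.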