+ Andreas Tille (Fri, 22 May 2009 16:00:48 +0200):

> Hi,
> I observed encoding problems when reading descriptions from
> UDD if they do contain non-ASCII characters and I wonder
> what I might do wrong.  Here is a little test program which
> queries for some descriptions I found to be problematic:

UDD just has the descriptions from Packages.gz, which supposedly are in
UTF-8. If your destination (a file, terminal, whatever) should be
receiving UTF-8, you can just pass them unmodified, e.g.:

    for row in curs.fetchall():
        print "%s: %s (%s)\n%s\n" % (pkg, row[0], row[2], row[1])

That works for me. If, for some reason, you need unicode() and not str()
objects, then you should specify that the string is in UTF-8, otherwise
it will default to ASCII:

    for row in curs.fetchall():
        string = unicode(row[1], 'utf-8')

So, your test program is not of much help. If you're still stuck, you
should probably say what you are really trying to do, with details. But
I don't think it's going to be a problem in UDD.

P.S.: If doing `unicode(row[1], 'utf-8')` raises an exception, that
would be because a package contains non-UTF-8 data in a description.
Your program should be robust against that, and you can do:

    try:
        string = unicode(row[1], 'utf-8')
    except UnicodeDecodeError:
        string = unicode(row[1], 'latin1')

[And file a bug against the package as well.]

HTH,

-- 
- Are you sure we're good?
- Always.
        -- Rory and Lorelai
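
For anyone reading this on Python 3: the snippets in this message are Python 2, where unicode(s, 'utf-8') does the decoding. A rough Python 3 equivalent of the UTF-8-then-Latin-1 fallback could look like the sketch below. It assumes the description arrives as a bytes object (a database driver may already hand you str, in which case none of this is needed), and decode_description is just a made-up helper name, not anything from UDD or a library.

```python
def decode_description(raw):
    """Decode a package description given as bytes.

    Hypothetical helper: tries UTF-8 first and falls back to Latin-1,
    mirroring the try/except shown in the message above.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this
        # fallback cannot raise; it may still produce the wrong
        # characters if the data was in some other encoding.
        return raw.decode("latin1")


# Sample inputs (assumptions for illustration, not real UDD data):
utf8_bytes = b"caf\xc3\xa9"   # "café" encoded as UTF-8
latin1_bytes = b"caf\xe9"     # "café" encoded as Latin-1

decoded_utf8 = decode_description(utf8_bytes)
decoded_latin1 = decode_description(latin1_bytes)
```

The order matters: Latin-1 never fails, so it has to be the last resort, otherwise valid UTF-8 would be silently misread as two-character garbage.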