Thank you very much. That solved it, gave me all the information I needed to not make the same mistake again, and taught me a quick way to check the encoding of strings in Python.
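The quick check RM mentions is to hand the same raw bytes to each candidate codec and see which one accepts them. A minimal Python 3 sketch of that idea follows. The thread itself runs Python 2.5, where `unicode(s, 'iso8859-1')` does what `s.decode('iso8859-1')` does here. `try_decodings` is a hypothetical helper, not part of Python or Django.

```python
def try_decodings(raw, encodings=("utf-8", "iso8859-1")):
    """Try each candidate codec on the same bytes (hypothetical helper).

    Returns {encoding: decoded str, or None if that codec rejects the bytes}.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # the bytes are not valid in this encoding
    return results


# The bytestring from the DjangoUnicodeDecodeError quoted further down.
sample = b"A paper mach\xe9 letter rack"
for enc, text in try_decodings(sample).items():
    print(enc, "->", text if text is not None else "decode failed")
```

One caveat: iso8859-1 assigns a character to every byte value, so it never rejects input. A successful latin-1 decode only shows the text is plausible, not that the guess is right, which is why confirming the encoding with whatever produces the file still matters.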
As it happens, in this case the script that generates the external file is some commercial software, so I can't touch it. It all seems to work, though.

Thanks again,
RM

On Feb 8, 12:57 am, Karen Tracey <kmtra...@gmail.com> wrote:
> On Sat, Feb 7, 2009 at 7:27 PM, redmonkey
> <michele.mem...@googlemail.com>wrote:
>
> > Sure, here's a bit more info.
>
> > The external data is generated by a script and it describes a
> > catalogue of lot items for an auction site I'm building. The format
> > includes a lot number, a brief description of the lot for sale, and an
> > estimate for the item. Each lot is separated in the file by a '$' with
> > some whitespace. Here's a snippet:
>
> > $
> > 292 A collection of wine bottle and trinket boxes
> > Est. 30-60
> > $
> > 293 A paper maché letter rack with painted foliate decoration and a
> > C19th papier mache side chair and one other (a/f)
> > Est. 20-30
> > $
> > 294 A wall mirror with bevelled plate within gilt frame
> > Est. 40-60
>
> And this file is encoded in...? It doesn't appear to be utf-8. It may be
> iso8859-1.
>
> [snip]
>
> > And here's that handle_data_upload function (it's passed the uploaded
> > file object):
>
> > def handle_data_upload(f, cat):
> >     """
> >     Creates and Adds lots to catalogue.
> >
> >     """
> >
> >     lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)
> > \s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
> >     iterator = lot.finditer(f.read())
> >     f.close()
> >
> >     for item in iterator:
> >         if not item.group('description') == "end":
> >             Lot.objects.create(
> >                 lot_number=int(item.group('lot_number')),
> >                 description=item.group('description').strip(),
>
> Here you are setting description to a bytestring read from your file. When
> you don't pass Unicode to Django, Django will convert to unicode assuming a
> utf-8 encoding, which will cause the error you are getting if the file is
> not in fact using utf-8 as the encoding.
I suspect your file is encoded in > iso8859-1, in which case changing this line to: > > description=unicode(item.group('description').strip(), 'iso8859-1') > > Will probably fix the problem. But, you should verify that that is the > encoding used by whatever is creating the file, and if possible you might > want to change whatever is creating the file to use utf-8 for the encoding, > if possible (and if these files aren't fed into other processes that might > get confused by changing their encoding). > > [snip] > > File "/Library/Python/2.5/site-packages/django/utils/encoding.py" in > > > force_unicode > > 70. raise DjangoUnicodeDecodeError(s, *e.args) > > > Exception Type: DjangoUnicodeDecodeError at /admin/catalogue/catalogue/ > > add/ > > Exception Value: 'utf8' codec can't decode bytes in position 12-14: > > invalid data. You passed in 'A paper mach\xe9 letter rack with painted > > foliate decoration and a C19th papier mache side chair and one other > > (a/f)' (<type 'str'>) > > This is why I think your file is using iso889-1: > > Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40) > [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information.>>> s = > 'A paper mach\xe9 letter rack' > >>> print unicode(s, 'utf-8') > > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 12-14: > invalid data>>> print unicode(s, 'iso8859-1') > > A paper maché letter rack > > > > The one that causes the error is what Django does when handed a bytestring, > and matches what you are seeing. Using iso8859-1 as the encoding makes the > value convert and print properly (plus it's a popular encoding). > > > > > I hope that clears a few things up. > > > Is this an admin thing? 
> > (http://www.factory-h.com/blog/?p=56)
>
> No, in that blog post the user had a broken __unicode__ method in their
> model, it wasn't actually an admin problem.
>
> Karen

You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
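Putting the thread's conclusion together, the upload handling could decode the raw bytes once, before any regex work, so that only text ever reaches the model. The following is a rough Python 3 sketch of that approach, not RM's actual code: `parse_lots`, `SEPARATOR`, `LOT`, and returning plain dicts instead of calling `Lot.objects.create` are illustrative assumptions, and the default `iso8859-1` is Karen's educated guess rather than a confirmed encoding. In the thread's Python 2.5 the decode step would be `unicode(raw, 'iso8859-1')`.

```python
import re

# Assumed separator: a line containing only "$" (plus optional whitespace).
SEPARATOR = re.compile(r"^\s*\$\s*$", re.MULTILINE)

# Hypothetical lot pattern: number, description (DOTALL lets it span lines,
# in case a description wraps as it appears to in the quoted snippet),
# then "Est. min-max".
LOT = re.compile(
    r"(?P<lot_number>\d+)\s+(?P<description>.*?)\s*"
    r"Est\.\s*(?P<min_estimate>\d+)-(?P<max_estimate>\d+)",
    re.DOTALL,
)


def parse_lots(raw, encoding="iso8859-1"):
    """Decode the uploaded bytes once, then split and parse each lot.

    The iso8859-1 default is an assumption taken from the thread.
    Returns plain dicts rather than creating Django model instances.
    """
    text = raw.decode(encoding)  # str from here on; no bytestrings downstream
    lots = []
    for chunk in SEPARATOR.split(text):
        match = LOT.match(chunk.strip())
        if match is None:
            continue  # empty chunk (e.g. before the first "$") or not a lot
        lots.append({
            "lot_number": int(match.group("lot_number")),
            # Collapse internal line breaks and runs of spaces.
            "description": " ".join(match.group("description").split()),
            "min_estimate": int(match.group("min_estimate")),
            "max_estimate": int(match.group("max_estimate")),
        })
    return lots


# The snippet from the thread, encoded the way Karen suspects the file is.
SAMPLE = (
    "$\n"
    "292 A collection of wine bottle and trinket boxes\n"
    "Est. 30-60\n"
    "$\n"
    "293 A paper maché letter rack with painted foliate decoration and a\n"
    "C19th papier mache side chair and one other (a/f)\n"
    "Est. 20-30\n"
    "$\n"
    "294 A wall mirror with bevelled plate within gilt frame\n"
    "Est. 40-60\n"
).encode("iso8859-1")

for lot in parse_lots(SAMPLE):
    print(lot)
```

Decoding at the boundary, rather than field by field, means every later step works with text; if the real encoding turns out to be something else, only the `encoding` argument needs to change.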