On Sat, Feb 7, 2009 at 7:27 PM, redmonkey <michele.mem...@googlemail.com> wrote:
>
> Sure, here's a bit more info.
>
> The external data is generated by a script and it describes a
> catalogue of lot items for an auction site I'm building. The format
> includes a lot number, a brief description of the lot for sale, and an
> estimate for the item. Each lot is separated in the file by a '$' with
> some whitespace. Here's a snippet:
>
> $
> 292 A collection of wine bottle and trinket boxes
> Est. 30-60
> $
> 293 A paper maché letter rack with painted foliate decoration and a
> C19th papier mache side chair and one other (a/f)
> Est. 20-30
> $
> 294 A wall mirror with bevelled plate within gilt frame
> Est. 40-60
>

And this file is encoded in...? It doesn't appear to be utf-8. It may be
iso8859-1.

[snip]

>
> And here's that handle_data_upload function (it's passed the uploaded
> file object):
>
> def handle_data_upload(f, cat):
>     """
>     Creates and Adds lots to catalogue.
>
>     """
>
>     lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)
> \s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
>     iterator = lot.finditer(f.read())
>     f.close()
>
>     for item in iterator:
>         if not item.group('description') == "end":
>             Lot.objects.create(
>                 lot_number=int(item.group('lot_number')),
>                 description=item.group('description').strip(),

Here you are setting description to a bytestring read from your file.
When you don't pass Unicode to Django, Django will convert to unicode
assuming a utf-8 encoding, which will cause the error you are getting if
the file is not in fact using utf-8 as the encoding. I suspect your file
is encoded in iso8859-1, in which case changing this line to:

    description=unicode(item.group('description').strip(), 'iso8859-1')

will probably fix the problem.
But you should verify that this is the encoding used by whatever is
creating the file. If possible, you might also want to change whatever
is creating the file to use utf-8 (provided these files aren't fed into
other processes that might get confused by the change of encoding).

[snip]

> File "/Library/Python/2.5/site-packages/django/utils/encoding.py" in
> force_unicode
>   70. raise DjangoUnicodeDecodeError(s, *e.args)
>
> Exception Type: DjangoUnicodeDecodeError at /admin/catalogue/catalogue/
> add/
> Exception Value: 'utf8' codec can't decode bytes in position 12-14:
> invalid data. You passed in 'A paper mach\xe9 letter rack with painted
> foliate decoration and a C19th papier mache side chair and one other
> (a/f)' (<type 'str'>)

This is why I think your file is using iso8859-1:

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'A paper mach\xe9 letter rack'
>>> print unicode(s, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 12-14: invalid data
>>> print unicode(s, 'iso8859-1')
A paper maché letter rack
>>>

The utf-8 decode, which raises the error, is what Django does when
handed a bytestring, and it matches what you are seeing. Using iso8859-1
as the encoding makes the value convert and print properly (plus it's a
popular encoding).

>
> I hope that clears a few things up.
>
> Is this an admin thing? (http://www.factory-h.com/blog/?p=56)
>

No, in that blog post the user had a broken __unicode__ method in their
model; it wasn't actually an admin problem.

Karen
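P.S. If it helps, here is a minimal sketch of the same check written as
a script rather than an interactive session. It uses Python 3 syntax
(bytes.decode) instead of the Python 2.5 unicode() call shown earlier,
and the helper names decode_description and is_valid_utf8 are made up
for illustration; they are not part of Django. The iso8859-1 default is
still only my guess about your file, so treat it as an assumption until
you have confirmed what the generating script writes.

```python
# Python 3 sketch. In Python 2.5 the equivalent of raw.decode(encoding)
# is unicode(raw, encoding).

# The start of the bytestring reported in the traceback.
SAMPLE = b'A paper mach\xe9 letter rack'


def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def decode_description(raw, encoding='iso8859-1'):
    """Decode raw bytes with an explicit encoding, then strip whitespace.

    The default encoding is an assumption about the uploaded file,
    not something detected from the data.
    """
    return raw.decode(encoding).strip()


if __name__ == '__main__':
    # The lone 0xE9 byte is not a complete UTF-8 sequence.
    print(is_valid_utf8(SAMPLE))
    # Decoding as iso8859-1 maps 0xE9 to U+00E9 (e with acute accent).
    print(decode_description(SAMPLE))
```

Decoding explicitly before calling Lot.objects.create means Django
receives text and never has to guess an encoding for you.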