Thank you very much. That solved it, gave me all the information I
needed to not make the same mistake again, and taught me a quick way
to check the encoding of strings in Python.

As it happens, in this case, the script that generates the external
file is some commercial software, so I can't touch it. It all seems to
work though.

Thanks again,

RM

On Feb 8, 12:57 am, Karen Tracey <kmtra...@gmail.com> wrote:
> On Sat, Feb 7, 2009 at 7:27 PM, redmonkey
> <michele.mem...@googlemail.com> wrote:
>
> > Sure, here's a bit more info.
>
> > The external data is generated by a script and it describes a
> > catalogue of lot items for an auction site I'm building. The format
> > includes a lot number, a brief description of the lot for sale, and an
> > estimate for the item. Each lot is separated in the file by a '$' with
> > some whitespace. Here's a snippet:
>
> > $
> >  292 A collection of wine bottle and trinket boxes
> >     Est. 30-60
> > $
> >  293 A paper maché letter rack with painted foliate decoration and a
> > C19th papier mache side chair and one other (a/f)
> >     Est. 20-30
> > $
> >  294 A wall mirror with bevelled plate within gilt frame
> >     Est. 40-60
>
> And this file is encoded in...?  It doesn't appear to be utf-8.  It may be
> iso8859-1.
>
>  [snip]
>
> > And here's that handle_data_upload function (it's passed the uploaded
> > file object):
>
> > def handle_data_upload(f, cat):
> >    """
> >    Creates and Adds lots to catalogue.
>
> >    """
>
> >    lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)'
> >                     r'\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
> >    iterator = lot.finditer(f.read())
> >    f.close()
>
> >    for item in iterator:
> >        if not item.group('description') == "end":
> >            Lot.objects.create(
> >                lot_number=int(item.group('lot_number')),
> >                description=item.group('description').strip(),
>
> Here you are setting description to a bytestring read from your file.  When
> you don't pass Unicode to Django, Django will convert to unicode assuming a
> utf-8 encoding, which will cause the error you are getting if the file is
> not in fact using utf-8 as the encoding.  I suspect your file is encoded in
> iso8859-1, in which case changing this line to:
>
> description=unicode(item.group('description').strip(), 'iso8859-1')
>
> will probably fix the problem.  But you should verify that this is the
> encoding used by whatever is creating the file and, if possible, you might
> want to change whatever is creating the file to use utf-8 for the encoding
> (provided these files aren't fed into other processes that might get
> confused by the change of encoding).
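
One way to apply this is to decode the whole upload once, before the regex
runs, so that every captured group is already Unicode. The following is only
a sketch under stated assumptions: it is written for Python 3 (where
bytes.decode() takes the place of unicode(s, encoding)), parse_lots and
LOT_RE are hypothetical names, the pattern is a simplified variant of the one
quoted above, and iso8859-1 is still the unverified guess from this thread.

```python
import re

# Simplified variant of the pattern quoted above (hypothetical name).
LOT_RE = re.compile(
    r'\s*(?P<lot_number>\d+) (?P<description>.*)'
    r'\s*Est\. (?P<min_estimate>\d+)-(?P<max_estimate>\d+)'
)


def parse_lots(raw, encoding='iso8859-1'):
    """Decode the raw upload bytes once, then extract lots as dicts."""
    text = raw.decode(encoding)  # bytes -> str, done once up front
    lots = []
    for match in LOT_RE.finditer(text):
        lots.append({
            'lot_number': int(match.group('lot_number')),
            'description': match.group('description').strip(),
            'min_estimate': int(match.group('min_estimate')),
            'max_estimate': int(match.group('max_estimate')),
        })
    return lots


# Sample bytes modelled on the snippet earlier in the thread.
sample = (b"$\n 292 A collection of wine bottle and trinket boxes\n"
          b"    Est. 30-60\n$\n 293 A paper mach\xe9 letter rack\n"
          b"    Est. 20-30\n")
lots = parse_lots(sample)
```

Each resulting dict could then be handed to Lot.objects.create(...) as in the
original function, with no bytestrings left for Django to guess at.
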
>
> [snip]
>
> > File "/Library/Python/2.5/site-packages/django/utils/encoding.py" in
> > force_unicode
> >  70.         raise DjangoUnicodeDecodeError(s, *e.args)
>
> > Exception Type: DjangoUnicodeDecodeError at /admin/catalogue/catalogue/
> > add/
> > Exception Value: 'utf8' codec can't decode bytes in position 12-14:
> > invalid data. You passed in 'A paper mach\xe9 letter rack with painted
> > foliate decoration and a C19th papier mache side chair and one other
> > (a/f)' (<type 'str'>)
>
> This is why I think your file is using iso8859-1:
>
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> s = 'A paper mach\xe9 letter rack'
> >>> print unicode(s, 'utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 12-14: invalid data
> >>> print unicode(s, 'iso8859-1')
> A paper maché letter rack
>
> The one that causes the error is what Django does when handed a bytestring,
> and matches what you are seeing.  Using iso8859-1 as the encoding makes the
> value convert and print properly (plus it's a popular encoding).
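
The same check can be wrapped in a small helper that tries several candidate
encodings on a sample of bytes. This is a sketch, assuming Python 3;
try_decodings is a hypothetical name. Note that iso8859-1 maps every possible
byte to a character, so a successful decode with it shows only that the
result is plausible, not that the guess is right.

```python
def try_decodings(raw, candidates=('utf-8', 'iso8859-1')):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes as given.
            results[encoding] = None
    return results


s = b'A paper mach\xe9 letter rack'
results = try_decodings(s)
for encoding, text in results.items():
    print(encoding, '->', text)
```
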
>
> > I hope that clears a few things up.
>
> > Is this an admin thing? (http://www.factory-h.com/blog/?p=56)
>
> No, in that blog post the user had a broken __unicode__ method in their
> model, it wasn't actually an admin problem.
>
> Karen