What's in the database probably isn't legal UTF-8.  A sequence of bytes
in some other encoding can happen to be structurally valid UTF-8 and
merely produce the wrong characters when treated as UTF-8, but such a
sequence can also violate the UTF-8 structure outright.  PostgreSQL, even
when set for UTF-8, may still not care about the contents of character or
text columns, so long as they don't upset the quoting.

Look at the actual byte sequence in the particular entry in, for example,
hexadecimal.  A byte with the 0x80 bit set, coming after an ASCII byte
(0x80 bit zero) or after a complete multi-byte sequence, is the beginning
of a multi-byte encoding of a Unicode code point.  In that first byte, the
number of contiguous one bits, counting from the 0x80 bit downward until
the first zero bit, is the number of bytes in the sequence, n.  There must
be at least 2 bytes, so the 0x40 bit must also be set.  Each of the
remaining n-1 bytes must have its 0x80 bit a one and its 0x40 bit a zero
(continuation bytes).  Continuation bytes contribute 6 bits each to the
construction of the code point; the first byte contributes 7-n bits.
Byte values 0xFE and 0xFF are never valid (and under the current
definition, RFC 3629, neither are 0xC0, 0xC1, or 0xF5 through 0xFD).
Bytes not part of a multi-byte sequence may not have a one in their
0x80 bit.
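
The rules above can be sketched in Python roughly as follows.  This is
only an illustration: the function names are made up for this message,
it is written for Python 3 (where indexing bytes gives integers, unlike
the 2.6 in your traceback), and it checks structure only.  It caps
sequences at 4 bytes as RFC 3629 does, and it does not catch overlong
encodings or surrogates.

```python
def sequence_length(lead):
    """Number of bytes announced by a lead byte, or 0 if it cannot start one."""
    if lead & 0x80 == 0:
        return 1  # plain ASCII: 0x80 bit is zero
    n = 0
    mask = 0x80
    # Count contiguous one bits from the 0x80 bit downward.
    while mask and lead & mask:
        n += 1
        mask >>= 1
    # n == 1 is a stray continuation byte; n > 4 exceeds what RFC 3629
    # allows (this also rejects 0xFE and 0xFF).
    return n if 2 <= n <= 4 else 0


def decode_code_point(seq):
    """Build the integer from one structurally valid sequence of bytes."""
    n = len(seq)
    if n == 1:
        return seq[0]
    value = seq[0] & (0x7F >> n)  # the first byte contributes 7-n bits
    for b in seq[1:]:
        value = (value << 6) | (b & 0x3F)  # 6 bits per continuation byte
    return value


def first_structural_error(data):
    """Return (offset, reason) for the first violation, or None if clean."""
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if n == 0:
            return i, "invalid lead byte 0x%02X" % data[i]
        for j in range(i + 1, i + n):
            if j >= len(data):
                return i, "truncated sequence"
            if data[j] & 0xC0 != 0x80:
                return i, "byte 0x%02X at offset %d is not a continuation byte" % (data[j], j)
        i += n
    return None
```

Feeding it the raw bytes of the offending column value (however you
manage to fetch them undecoded) should give the offset of the first
byte that breaks the structure, which you can compare against the
positions reported in the UnicodeDecodeError.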

Perhaps some other piece of software has dumped something into
PostgreSQL using, say, Latin-1 or Latin-8, etc.
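
For what it's worth, "http://" is 7 bytes, so position 7 in your
traceback is exactly where the ö sits.  In Latin-1, ö is the single byte
0xF6 (binary 11110110), which read as UTF-8 announces a 4-byte sequence,
i.e. positions 7-10.  That would be consistent with the error, though I
haven't seen your actual bytes.  A rough Python 3 sketch of trying
candidate encodings in turn; the function name, the candidate list and
the sample bytes are my own assumptions, not anything from Django or
from your database:

```python
def guess_decoding(raw, candidates=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (name, text) for the first that decodes."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None


# Assumed sample: the URL from the question, with the o-umlaut stored as Latin-1.
raw = b"http://\xf6strogenfrei.de/verhuetung.html"
name, text = guess_decoding(raw)
repaired = text.encode("utf-8")  # what a UTF-8 database would be expected to hold
```

Note that Latin-1 assigns a character to every byte value, so it never
fails to decode; a successful Latin-1 decode only tells you the text is
plausible, not that Latin-1 was really the source encoding.  Look at the
result before writing anything back.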

On Thu, Jul 1, 2010 at 10:55 AM, Yateen <yateenjo...@gmail.com> wrote:
> Hi,
>
> I am using a postgres database and DJango. I have a http url in my
> database which contains some special characters, but a table query
> returns the result successfully.
>
> select http_url from mytable limit 1;
>                http_url
> ----------------------------------------
>  http://östrogenfrei.de/verhuetung.html
>
>
> If I use Django model way to get the same data, I get following error
> -
>>>> from util import *
>>>> cursor = connection.cursor()
>>>> query="select http_url from cfedr_raw_data_20100526_24860_1277981101 where http_url like '%rogenfrei%'"
>>>> cursor.execute(query)
>>>> data = []
>>>> for item in cursor.fetchall():
> ...     print item
> ...     data.append(item)
> ...
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/firstpool/yjoshi/permanent/starbi/python2.6.1/lib/python2.6/
> encodings/utf_8.py", line 16, in decode
>    return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 7-10:
> invalid data
>
> Can anyone please throw some light on this? why is this occurring?
> what is the solution.
>
> Thanks in advance,
>
> Yateen..
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Django users" group.
> To post to this group, send email to django-us...@googlegroups.com.
> To unsubscribe from this group, send email to 
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/django-users?hl=en.
>
>
