On Fri, Oct 9, 2009 at 12:40 PM, Brian Morton <rokclim...@gmail.com> wrote:

>
> This is a very strange problem, so I thought I would post here and see
> if anyone else had seen this problem.
>
> I introspected a MySQL database with Python2.6 using Django SVN HEAD
> and it produced my models as expected.  However, all CharFields have
> the max_length set to 3x the actual varchar field length in the db.
> For example, all char(1) or varchar(1) fields were represented with a
> max_length of 3.  Has anyone ever seen this issue before?
>

Searching the tracker reveals:

http://code.djangoproject.com/ticket/5725

I am not sure the situation is quite as complicated as that ticket makes it
out to be.  The discussion there assumes that the "right" answer to return
depends on the table charset (actually, it would have to be the column
charset, since the charset can be set per-column).

However, near as I can tell from a couple of brief experiments, that's not
the case.  For a latin1 encoded table with varchar(50) column the value
determined by inspectdb is 150.  Similarly, for a utf-8 encoded table with a
varchar(32) column, the value determined by inspectdb is 96.  In both cases
the value determined by inspectdb is 3x higher than the actual number of
characters that can be stored in the column, no matter the column's
charset.

Where is the 3x factor coming from?  In the ticket it is mentioned that it's
related to the connection charset being utf-8.  Switch the connection
charset to latin1, and the numbers get reported properly (at least for
latin1-encoded tables).

The number in question here is the internal_size element of the description
returned by the connection cursor.  The value is defined by the Python DB
API (http://www.python.org/dev/peps/pep-0249/) but I can find no good
description of what it is supposed to be, exactly.
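
For reference, PEP 249 lists seven items for each entry of
cursor.description, with internal_size in the fourth position.  A minimal
sketch of pulling that value out, using a hand-built entry rather than a live
MySQL connection (the ColumnDescription name, the column name and the
type_code string are illustrative assumptions, not MySQLdb output):

```python
from collections import namedtuple

# The seven items PEP 249 lists for each cursor.description entry.
ColumnDescription = namedtuple(
    "ColumnDescription",
    "name type_code display_size internal_size precision scale null_ok",
)

# Hand-built stand-in for a varchar(50) column seen over a utf-8 connection.
# Only internal_size=150 comes from the experiment described above; the
# other values are placeholders.
row = ColumnDescription("title", "VARCHAR", None, 150, None, None, True)

internal_size = row[3]  # same value as row.internal_size
print(internal_size)
```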

MySQLdb (or underlying code it is using) appears to be implementing this as
the maximum number of bytes that may be needed to hold a value returned from
this column on this connection.  That is, since on the DB side the length
specification (since MySQL 4.1) describes the number of characters that may
be stored in the column, and since a character may require as many as 3
bytes in utf-8 encoding (MySQL does not implement 4-byte utf-8 support),
some code somewhere is taking the max-length-in-characters value and
multiplying it by 3 to come up with a maximum number of bytes that may be
required to store a value from this column in the connection's charset.
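
The arithmetic I am guessing at can be sketched as below.  This is only a
model of the observed behaviour, not code taken from MySQLdb; the function
name and the charset table are my own assumptions.

```python
# Assumed maximum bytes per character for each connection charset.
# utf8 is 3 here because MySQL's utf8 stops at 3-byte sequences.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}


def reported_internal_size(char_length, connection_charset):
    """Model of the byte count that seems to come back as internal_size.

    Note that only the connection charset matters in this model; the
    column's own charset does not appear at all.
    """
    return char_length * MAX_BYTES_PER_CHAR[connection_charset]


print(reported_internal_size(50, "utf8"))    # varchar(50), utf-8 connection
print(reported_internal_size(32, "utf8"))    # varchar(32), utf-8 connection
print(reported_internal_size(50, "latin1"))  # varchar(50), latin1 connection
```

This reproduces the 150 and 96 seen in the experiments above, and the
unmultiplied value when the connection charset is latin1.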

Since Django is always going to set the connection charset to utf-8, and
since inspectdb should be reporting character lengths, not byte lengths, it
might be sufficient to take the internal_size value and divide by 3 to get
character length values.  That might work so long as the underlying value
returned by MySQLdb doesn't change, yet this page:

http://benanne.net/code/?p=352

states the value returned is wrong and there's a fix (without giving any
details on how it is wrong, what the fix is, or when it might appear in a
release of MySQLdb).  I don't have any more time to investigate this, but if
different versions of MySQLdb are going to report different values here,
then fixing this in Django will be a bit more complicated than
unconditionally dividing by 3, though still not quite as bad as the ticket
suggests, I think.
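
A minimal sketch of the unconditional divide-by-3 idea, assuming the
connection charset is always utf-8 and that internal_size is always the
character length times 3.  The function name is hypothetical and this is not
what Django's introspection code currently does:

```python
UTF8_MAX_BYTES_PER_CHAR = 3  # assumption: MySQL utf8, at most 3 bytes/char


def char_length_from_internal_size(internal_size):
    """Turn the byte-based internal_size back into a character length."""
    return internal_size // UTF8_MAX_BYTES_PER_CHAR


print(char_length_from_internal_size(150))  # varchar(50) example
print(char_length_from_internal_size(96))   # varchar(32) example
print(char_length_from_internal_size(3))    # char(1) from the original report
```

The weakness is the one described above: if some MySQLdb version already
reports character lengths, this would silently shrink a correct value (a
real varchar(3) would come out as 1), so some check on the driver's
behaviour would be needed first.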

Karen
