Hi Bruno, first of all, thanks for your answer.
On 6 sep, 01:13, bruno desthuilliers <[EMAIL PROTECTED]>
wrote:
> On 5 sep, 23:52, Sylvain <[EMAIL PROTECTED]> wrote:
>
> > Hi everybody,
>
> > Today I was hunting down some serious performance problems in my app
> > and I found something pretty "confusing". I realized that, when
> > creating generic relations with the ContentTypes framework
> > (http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#id1), Django
> > does create a foreign key relationship on the content_type field,
> > which is normal, but, the field object_id not being related to any
> > other table, Django doesn't create any foreign key relationship for
> > this field.
>
> As the name implies, this is a "generic" relationship. What object_id
> refers to depends on the content_type_id, so I fail to see how you
> could express a foreign key constraint here.
Yes, I didn't mean that Django was wrong on this point, sorry for the
confusion. As you said, it's perfectly normal that Django doesn't
create any foreign key on this field since it's not a foreign key; I
was just introducing the problem.
>
> > This means that there won't be any index on that field
>
> From a relational schema design POV, a foreign key - whether the
> constraint is declared or not - doesn't necessarily imply an index.
You're right, but MySQL, for example, automatically creates indexes on
foreign key fields for performance reasons.
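To make the distinction concrete, here is a minimal sketch using Python's
standard sqlite3 module (so neither MySQL nor Django; the table and index
names are made up for the example). In SQLite, declaring a foreign key does
not create an index on the referencing column, so the index only appears
once it is added explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_type (id INTEGER PRIMARY KEY, name TEXT);
    -- hypothetical table standing in for a model with a generic relation
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER REFERENCES content_type(id),
        object_id INTEGER
    );
""")

def index_names(table):
    # PRAGMA index_list returns one row per index; column 1 is the index name
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

indexes_before = index_names("comment")
conn.execute("CREATE INDEX comment_content_type_idx ON comment (content_type_id)")
indexes_after = index_names("comment")
print(indexes_before, indexes_after)
```

Other DBMSs behave differently here, which is exactly why I mentioned MySQL.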
>
> > and when
> > retrieving objects which are related to a generic item, the DBMS will
> > have to examine every row in the table, which is time consuming when
> > the table is getting big.
>
> Not necessarily, depending on the distribution of values in the
> indexed field(s). It's a known (and demonstrable) fact that
> insufficiently discriminating indexes (indexes on field(s) that have
> too few distinct values) can lead to lower performances than a plain
> sequential scan. So in fact, if you have a table with a generic
> relationship (like for example the comments_freecomment table) that
> happens (for a given project) to be practically used with only one
> single content_type_id, it might be better to actually drop the index
> on the content_type_id field.
>
> Not to say that you're totally wrong about the lack of index on the
> object_id field being a possible cause of performance problems - just
> that adding an index on a field doesn't necessarily improve
> performances.
I know about value distribution in indexed fields, something like
"if you have less than 30% of "difference" between your records, you
shouldn't put an index on it", and I know that putting indexes on every
single field doesn't improve performance. I'm just saying that maybe
there's an index missing on one specific field (object_id). You say
that there is a chance of high duplication on the object_id field, but
this applies to any other field. Take the content_type field, for
example: it is far more likely to have high duplication than object_id.
In my case I only have 2 different content types, so when Django
executes the create foreign key statement, MySQL automatically adds an
index on the field, with each value matching about 50% of the rows on
average, which shouldn't lead to an index being created. But maybe
that's MySQL's fault here; Django only creates the foreign key relation.
I still think that the object_id field is unlikely to have high
duplication, like any other relation.
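To put rough numbers on this, here is a small sketch in plain Python. The
data is entirely made up (2 content types with 500 objects each), and
"selectivity" here simply means the fraction of distinct values in a
column, which is only one way of measuring how discriminating an index
would be:

```python
def selectivity(values):
    """Fraction of distinct values: 1.0 means all unique, near 0 means heavy duplication."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0

# Hypothetical table of 1000 rows: 2 content types, 500 objects each.
rows = [(ct, obj) for ct in (1, 2) for obj in range(500)]

ct_sel = selectivity(ct for ct, _ in rows)     # 2 distinct values out of 1000 rows
obj_sel = selectivity(obj for _, obj in rows)  # 500 distinct values out of 1000 rows
pair_sel = selectivity(rows)                   # every (content_type, object_id) pair is unique
print(ct_sel, obj_sel, pair_sel)
```

With this made-up distribution, content_type alone discriminates very
poorly, object_id does much better, and the pair of both columns is the
most discriminating of the three.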
>
> > To get rid of this problem, I'm forced to
> > use "db_index=True" on every object_id field, and I think Django
> > should do it automatically.
> > What do you think about that ?
>
> First thing that comes to (my) mind is that since content_type is not
> part of the core, I fail to see how Django's ORM could special case
> these fields, so I guess this leaves us having to handle the case by
> ourselves.
I've never looked at how Django works internally, but I think that maybe
the following line of code could tell Django which are the contenttype
fields:
    content_object = generic.GenericForeignKey('content_type', 'object_id')
So Django could either put a composite index on these two fields, or
two separate indexes. Or maybe no index at all on the content_type
column, I don't know.
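As a rough illustration of what a composite index changes, here is a
sketch with Python's standard sqlite3 module. This is not Django code and
not MySQL; the table, column and index names are invented for the example.
It asks SQLite for its query plan for a typical generic-relation lookup,
before and after creating an index on both columns together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment ("
    "id INTEGER PRIMARY KEY, content_type_id INTEGER, object_id INTEGER, body TEXT)"
)

# The kind of lookup a generic relation needs: filter on both columns at once
QUERY = "SELECT body FROM comment WHERE content_type_id = ? AND object_id = ?"

def query_plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(QUERY, (1, 42))
conn.execute("CREATE INDEX comment_ct_obj_idx ON comment (content_type_id, object_id)")
plan_after = query_plan(QUERY, (1, 42))
print(plan_before)
print(plan_after)
```

The exact plan text depends on the SQLite version, and MySQL may well
decide differently, so the real test is still to run EXPLAIN on the actual
database with realistic data.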
>
> Another point is that the index should in most cases be set on both
> the content_type_id *and* the object_id fields *together* - since in
> the worst case (the one mentioned above where the generic relationship
> happens to be not that generic in practice) it shouldn't lower the
> discrimination factor, and on other cases it should raise it.
>
> My 2 cents...