Unfortunately the problem is a bit deeper in the ORM. In general, when
filtering against a multivalued relation (reverse foreign key or m2m
relation) Django uses joins instead of subqueries. This can result in
duplicating rows if multiple related rows match the filter. The fix
recommended in the docs is to apply .distinct() to the query. Using
distinct can result in slow query execution speed, and many DBAs see
this kind of usage of distinct as a surefire sign that the query is
written poorly (and I agree here).

We really should be using IN/EXISTS queries for filters spanning
multivalued relations, but for historical reasons we don't. We do that
for exclude and negated filters already.

There is also a technical problem which needs to be solved before
changing how __isnull=False works. If we want to make __isnull=False to
do EXISTS/IN query, then the additional icontains filter in the query
qs.filter(somefield__isnull=False, somefield__othercol__icontains='a')
must be pushed to the same IN/EXISTS clause. This isn't easy to do
correctly (and we don't do it correctly for exclude queries currently). 

In addition, there are cases where subqueries aren't wanted, older
versions of MySQL for example choke on subqueries.

So, there are a couple of complications for this idea:
  - Support for pushing multiple filter conditions to the same subquery
is needed even if we consider support for only isnull=False filter.
  - We should consider using IN/EXISTS + subquery for multivalued
relations in filters. This is challenging both technically and from
backwards compatibility perspective.

 - Anssi

On Fri, 2014-08-08 at 16:48 +1000, Curtis Maloney wrote:
> Can you create a PR, with docs and tests?
> 
> 
> If not, we'll need to find someone brave enough to delve into the ORM
> and add this [quite valid, IMHO] optimisation.
> 
> 
> --
> C
> 
> 
> 
> 
> On 8 August 2014 11:16, David Butler <[email protected]> wrote:
>         
>         
>         On Thursday, August 7, 2014 6:48:22 PM UTC-5, Tim Graham
>         wrote:
>                 Does .filter(somefield__isnull=False) not work for
>                 what you're trying to do?
>                 
>         
>         
>         If I do .filter(somefield__isnull=False) it tries to do a LEFT
>         OUTER JOIN on the table for somefield  instead of a WHERE
>         EXISTS (...) and if that table is very large then it is pretty
>         slow
>         
>         
>         Perhaps the answer is just to modify what __isnull does
>         instead of making a new lookup
>         
>         
>                 
>                 On Thursday, August 7, 2014 7:43:07 PM UTC-4, David
>                 Butler wrote:
>                         It would be nice if there was some database
>                         level optimization for getting objects that
>                         have related objects that exist.
>                         
>                         
>                         This is a slow filter:
>                         
>                         
>                         qs = [obj for obj in qs if
>                         qs.somefield_set.exists()]
>                         
>                         
>                         Could be faster with something like this:
>                         
>                         
>                         qs = qs.filter(somefield__exists=True)
>                         
>                         
>                         Here is some (rough, probably grossly over
>                         simplified but working) code:
>                         
>                         
>                         Code is also available here if it gets
>                         garbled: http://dpaste.com/0825PNP
>                         
>                         
>                         
>                         
>                         query_template = (
>                                 '{not_part} EXISTS (SELECT 1 FROM
>                         "{table1}" WHERE '
>                                 '"{table1}"."{table1_column}" =
>                         "{table2}"."{table2_column}")'
>                         )
>                         def
>                         filter_by_reverse_feriegn_key_existance(qs,
>                         **kw):
>                         
>                         
>                                 for arg, value in kw.items():
>                         
>                         
>                                         assert
>                         arg.endswith('__exists')
>                         
>                         
>                                         if value is True:
>                                                 not_part = ''
>                                         elif value is False:
>                                                 not_part = 'NOT'
>                         
>                         
>                                         Model = qs.model
>                                         for field in
>                         arg.rstrip('__exists').split('__'):
>                                                 field =
>                         Model._meta.get_field_by_name(field)[0]
>                                                 
>                                                 qs =
>                         qs.extra(where=[query_template.format(
>                         
>                         table1=field.model._meta.db_table,
>                         
>                         table1_column=field.field.db_column,
>                         
>                         table2=Model._meta.db_table,
>                         
>                         table2_column=Model._meta.pk.db_column,
>                                                         not_part =
>                         not_part
>                                                 )])
>                         
>                         
>                                                 Model = field.model
>                                 return qs
>                         
>                         
>                         
>                         
>                         This works on postgres, and is ~100x faster
>                         
>                         
>                         I would be interested in any comments
>         -- 
>         You received this message because you are subscribed to the
>         Google Groups "Django developers" group.
>         To unsubscribe from this group and stop receiving emails from
>         it, send an email to django-developers
>         [email protected].
>         To post to this group, send email to
>         [email protected].
>         Visit this group at
>         http://groups.google.com/group/django-developers.
>         
>         To view this discussion on the web visit
>         
> https://groups.google.com/d/msgid/django-developers/d4f2d1bc-326d-43d0-a53c-d2b1c25465cd%40googlegroups.com.
>         
>         For more options, visit https://groups.google.com/d/optout.
>         
> 
> 


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1407482729.11410.105.camel%40TTY32.
For more options, visit https://groups.google.com/d/optout.

Reply via email to