Hello, I'm having a hard time explaining the exact issue but I hope it's 
clear enough.


Following this issue 
(https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ)
 
from django users and a related ticket 
(https://code.djangoproject.com/ticket/27719) that seems to be left out or 
forgotten already.

There has to be a way to alias or annotate a value given an expression or 
SQL Function that doesn't necessarily aggregates data but rather work on a 
single value.

Right now as shown on the django-users post, using annotate for this 
purpose will cause unexpected grouping and sub querying that could result 
in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues 
or values_list, will work as expected, simply annotating a value and 
returning it as an additional column, but if an aggregate is added 
afterwards (such as count), the final query ends up being a redundant query 
where the annotated value is added to a group by clause (group by id + 
column), to a column as part of the select (function called twice) and then 
wrapped into a select * (subquery), which makes the extra column as part of 
the select and group by useless, unless the query had any kind of 
left/inner join in which case the group by might make sense (although not 
sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the 
annotated value doesn't show on the group by and select sections, although 
the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly 
work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to 
implement an alias call that works exactly like annotate but doesn't 
trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate 
smarter, so all the unnecessary grouping and sub querying doesn't happen 
unless needed, for example, if the queryset didn't call values/values_list 
or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6
).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it 
since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction(a ,b) >= 0.6
    GROUP BY id, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even 
needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, 
if there's a join somewhere, or a values/values_list was used previously so 
id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the 
final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that 
using a custom lookup is not possible since the value is required for the 
order_by to work.


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/4e4dbcd9-9c49-468b-b633-ca27daf3fe69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to