Re: Strange query when using annotate and count

Cristiano Coelho Fri, 24 Nov 2017 05:14:32 -0800

Hello Simon,

That private API is good to know, but now that I think of it would still 
not work for me, since my queryset is passed to a paginator and that's the 
class that does both the count and actual queryset execution, so need a 
queryset that can have both the annotation but also clears it if count is 
called so it doesn't create the redundant sub queries.


I'm wondering what's better, should I try to resolve this at the manager 
level overriding count? I feel like this might cause issues if the 
annotation is actually legit (ie with an aggregate) and it needs the 
subquery after all.
The other option is to subclass the paginator class with a special one that 
does this annotation clearing before running count.

Even with these cases, if the annotated value is used later with a filter 
query I can't really simply removed, but the sub queries and extra function 
calls really doesn't make sense to be there when doing a count, so it seems 
that all the options are quite bad and hackish.

Any other options?


El viernes, 24 de noviembre de 2017, 1:12:07 (UTC-3), Simon Charette 
escribió:
>
> Hello Cristiano,
>
> > Isn't there a way (even if it's hackish) to simply clear the annotations 
> (and order by)? I know querysets are smart enough to not include the order 
> by clause if there's a count.
>
> The ordering is cleared on `count()` because it shouldn't interfere with 
> the result in any way. You can clear annotations using 
> `queryset.query.annotations.clear()` but be aware that it is a private API 
> that could change under your feet. Make sure to make a copy of the queryset 
> (e.g. copy = queryset.all()) before performing this change as it will alter 
> the queryset in place.
>
> Best,
> Simon
>  
> Le jeudi 23 novembre 2017 22:41:41 UTC-5, Cristiano Coelho a écrit :
>>
>> Hello Simon, thanks for the response.
>>
>> The above code is just an example, the reason behind the annotate because 
>> there's some complicated code that builds a queryset and annotates it so it 
>> can easily be reused. It works fine 99% of the time except when there's a 
>> count involved and it ends up being redundant. The solution would be to not 
>> annotate anything and replace the code in multiple places to add the 
>> annotate call (or similar using a custom queryset or manager I guess), but 
>> that's quite painful and will end up with a lot of duplicated code since 
>> there's also an order_by that follows the annotate that needs to be moved 
>> over as well
>>
>> Isn't there a way (even if it's hackish) to simply clear the annotations 
>> (and order by)? I know querysets are smart enough to not include the order 
>> by clause if there's a count.
>>
>> You could also suggest using two separate calls or a flag to pass down to 
>> the internal code so it doesn't include the additional stuff, but that 
>> wouldn't work since paginators accept only one query set for example and 
>> internall uses it for both count and results.
>>
>>
>>
>> El viernes, 24 de noviembre de 2017, 0:05:29 (UTC-3), Simon Charette 
>> escribió:
>>>
>>> Hello Cristiano,
>>>
>>> I understand your frustration but please avoid using the developer 
>>> mailing list
>>> as a second tier support channel. I suggest you try the IRC #django 
>>> channel if
>>> you need to want to get faster support.
>>>
>>> What's happening here is that annotate() really means "select this 
>>> field" while
>>> in your other case you use a lookup (summary__icontains) which are only 
>>> going to
>>> be added to the WHERE clause of your query.
>>>
>>> I'm not sure why you are annotating your queryset without referring to 
>>> it in
>>> a filter clause later on but the ORM cannot simply ignore it when you are
>>> performing your `count()` because some annotations could interfere with 
>>> grouping
>>> somehow.
>>>
>>> There is an open ticket[0] to add support for an `alias()` method that 
>>> would
>>> allow the ORM to clear/ignore the specified expressions if it's not 
>>> referenced
>>> in the query.
>>>
>>> In the mean time I think the best approach would be to avoid annotating 
>>> the
>>> queryset if your don't need to reference the score.
>>>
>>> Cheers,
>>> Simon
>>>
>>> [0] https://code.djangoproject.com/ticket/27719
>>>
>>> Le mardi 21 novembre 2017 08:46:21 UTC-5, Cristiano Coelho a écrit :
>>>>
>>>> Hmm, should I try with the dev mailing list? Guess it's something no 
>>>> one faced before?
>>>>
>>>> El martes, 14 de noviembre de 2017, 22:54:23 (UTC-3), Cristiano Coelho 
>>>> escribió:
>>>>>
>>>>> I'm getting some very odd query when combining annotate with count. 
>>>>> See the following:
>>>>>
>>>>> >>> q = 
>>>>>> Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary'))
>>>>>> >>> q.count()
>>>>>> 3094
>>>>>> >>> print connection.queries[-1]
>>>>>> 'SELECT COUNT(*) 
>>>>>> FROM (
>>>>>>     SELECT "vulnerabilities_vulnerability"."id" AS Col1, 
>>>>>> custom_word_similarity(\'test\', 
>>>>>> "vulnerabilities_vulnerability"."summary") 
>>>>>> AS "score" 
>>>>>>     FROM "vulnerabilities_vulnerability" 
>>>>>>     GROUP BY "vulnerabilities_vulnerability"."id", 
>>>>>> custom_word_similarity(\'test\', 
>>>>>> "vulnerabilities_vulnerability"."summary")
>>>>>> ) subquery
>>>>>> >>> q2 = Vulnerability.objects.filter(summary__icontains='test')
>>>>>> >>> q2.count()
>>>>>> 33
>>>>>> >>> print connection.queries[-1]
>>>>>> 'SELECT COUNT(*) AS "__count" 
>>>>>> FROM "vulnerabilities_vulnerability" 
>>>>>> WHERE UPPER("vulnerabilities_vulnerability"."summary"::text) LIKE 
>>>>>> UPPER(\'%test%\')
>>>>>
>>>>>
>>>>>
>>>>> Custom function code, is this what's causing the odd count behavior? 
>>>>> Did I miss anything?
>>>>>
>>>>> class WordTrigramCustomSimilarity(Func):
>>>>>>     function = 'custom_word_similarity' 
>>>>>>     def __init__(self, string, expression, **extra):
>>>>>>         if not hasattr(string, 'resolve_expression'):
>>>>>>             string = Value(string)
>>>>>>         super(WordTrigramCustomSimilarity, self).__init__(string, 
>>>>>> expression, output_field=FloatField(), **extra)
>>>>>
>>>>>
>>>>> I would expect for the query to be a simple count, rather than a 
>>>>> nested query with a useless group by (correct me if I'm wrong).
>>>>> The issue gets even worse if the function is expensive, since it gets 
>>>>> called when it's not needed at all, more than once.
>>>>> Also the issue behaves pretty much the same if the queryset includes 
>>>>> filtering and ordering but I didn't include it here for simplicity.
>>>>>
>>>>> Using Django 1.11.7 + postgres (psycopg) backend.
>>>>>
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/e3da8d8e-a613-48d2-b84f-9a56e3fe390f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Strange query when using annotate and count

Reply via email to