There is no `group_by` clause in the ORM, because this will lead to 
ambiguous queries. Ambiguous query is a query where you have a non-distinct 
column in the select, that is missing from the group_by clause. Its 
actually a MySQL fault for allowing it, in other RDBMS you will get an 
ambiguous query error.  BTW MySQL is just now fixing this starting from 
version 5.7.5 
<https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by>.
 
So the truth is that we don't need a group_by, the ORM knows how/when to 
include it.

1. Should this not be book_set__pubdate? Author has no book field, but wins 
> a book_set field authors field of Book.
>

Its just a little naming inconsistency from the ORM. When we 
filter/annotate we have to use 'book' only, but when we prefetch_related, 
we must use the `book_set` convention. Take a look at Lookups that span 
relationships. 
<https://docs.djangoproject.com/ja/1.11/topics/db/queries/#lookups-that-span-relationships>

2. How can this subquery combine with the constraint that 
> book_set__pubdate__lte=SOMEDATE? That is, I'd be interested in finding the 
> latest before a given date and double whammy. Is it as simple as adding it 
> as per:
>

I forgot to include that part, you need to do it inside the Subquery, 
because you want to annotate the Max(pubdate) only for result before that 
date.
The final version should look like this:

authors = Author.objects.filter(
    book__pubdate=Subquery(
        (Author.objects
            .filter(
                country=OuterRef('country'),
                book__pubdate__lte=date.today(),
            )
            .values('country')
            .annotate(max_date=Max('book__pubdate'))
            .values('max_date')[:1]
        ), output_field=models.DateField()
    )
)

I added output_field=models.DateField() because it raises `FieldError: 
Expression contains mixed types. You must set output_field` after I added 
the `book__pubdate__lte` filter, I'm not sure about this error, looks like 
a bug to me (the orm probably thinks that max_date could be None for some 
rows, but this must not be the case since we filter on pub_date).


On Sunday, May 14, 2017 at 4:16:19 PM UTC+3, Bernd Wechner wrote:
>
> Why thank you very kindly Todor for these insights! And some interesting 
> reading!
>
> It surprises me, I admit (with no intention of being judgmental) that 
> there isn't a simpler way, perhaps like:
>
>     LatestAuthors = 
> Author.objects.filter(book_set__pubdate__lte=SOMEDATE).group_by('country').annotate(Max('book_set__pubdate'))
>
> Where group_by() might be similar to values but only impact the "group by" 
> clause not the select clause in the generated SQL. Though there may of 
> course be issues I'm overlooking that make such a simple presentation 
> impossible to offer (part of me is compelled to suspect so given it doesn't 
> exist and my naivite no lower than average ;-).  The SQL generated by this 
> subquery is certainly more complicated than I imagined necessary (actually 
> produces a SQL subquery).
>
> In your SubQuery example however you have a constraint "book__pubdate=
> Subquery(" which raises two questions for me:
>
>    1. Should this not be book_set__pubdate? Author has no book field, but 
>    wins a book_set field authors field of Book.
>    2. How can this subquery combine with the constraint that 
>    book_set__pubdate__lte=SOMEDATE? That is, I'd be interested in finding the 
>    latest before a given date and double whammy. Is it as simple as adding it 
>    as per:
>    
>    authors = Author.objects.filter(
>        book_set__pubdate__lte=SOMEDATE,
>        book_set__pubdate=Subquery(
>            (Author.objects
>                .filter(country=OuterRef('country'))
>                .values('country')
>                .annotate(max_date=Max('book_set__pubdate'))
>                .values('max_date')[:1]
>            )
>        )
>    )
>    
> But yes I concur this strikes me as a fairly ordinary sort of SQL query in 
> my experience and hence would be nice if it were well documented how to 
> generate it in the ORM and I'm pleased you gave it a name 
> (greatest-per-group). 
>
> Am experimenting with Subquery and inspecting the SQL generated in the 
> interim.
>
> Thanks again for the very very supportive and helpful insights. 
>
> Kind regards,
>
> Bernd.
>
> On Sunday, 14 May 2017 10:08:25 UTC+10, Todor Velichkov wrote:
>>
>> This is a typical *greatest-per-group* problem, there is even a tag 
>> <https://stackoverflow.com/questions/tagged/greatest-n-per-group> for 
>> that in SO:
>> And if you search for questions in this area related to Django 
>> <https://stackoverflow.com/questions/tagged/greatest-n-per-group+django> 
>> you will probably get: here 
>> <https://stackoverflow.com/questions/2111590/how-can-a-do-a-greatest-n-per-group-query-in-django>
>>  
>> or here 
>> <https://stackoverflow.com/questions/2074514/django-query-that-get-most-recent-objects-from-different-categories>
>> .
>> My approach to this problem, is to use two queries via prefetch_related.
>> In your case it would be something like this:
>>
>> countries = Country.objects.prefetch_related(
>>     Prefetch('author_set',
>>         queryset=(Author.objects
>>                     .filter(book_set__pubdate__lte=SOMEDATE)
>>                     .annotate(last_publish=Max('book_set__pubdate'))
>>                     .order_by('-last_publish')
>>         ),
>>         to_attr='authors'
>>     )
>> )
>>
>> #usage
>> for country in countries:
>>     #get the first author in authors or None if empty
>>     top_author = next(iter(country.authors), None)
>>
>>
>> the problem with this approach is that the memory footprint could get 
>> very big if you have lots of objects per group. In my cases I was always 
>> able to filter them down to several objects only, so it was not an issue. 
>>
>> However, thanks to the new staff that come with Django 1.11, now its 
>> actually possible to build a real greatest-per-group query with the ORM.
>> This can be done using the new Subquery 
>> <https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions>
>>  
>> and OuterRef 
>> <https://docs.djangoproject.com/en/1.11/ref/models/expressions/#django.db.models.OuterRef>
>>  
>> classess. In your case it must be something like this:
>>
>> authors = Author.objects.filter(
>>     book__pubdate=Subquery(
>>         (Author.objects
>>             .filter(country=OuterRef('country'))
>>             .values('country')
>>             .annotate(max_date=Max('book_set__pubdate'))
>>             .values('max_date')[:1]
>>         )
>>     )
>> )
>>
>> But be careful with this approach, because if you have more than one 
>> maximum from a group, you get all in of them in return.
>>
>> PS. I've just recently just found out the 2nd approach, its not very 
>> clear from the docs that it possible, maybe we can/need document it? This 
>> is a common problem and may be useful to people.
>>
>>
>>
>> On Saturday, May 13, 2017 at 6:35:32 PM UTC+3, Bernd Wechner wrote:
>>>
>>> I'm bamboozled by Django aggregation:
>>>
>>> https://docs.djangoproject.com/en/1.11/topics/db/aggregation/
>>>
>>> I'd like to add a little to the example they present to clarify my 
>>> problem (keeping it in familiar territory with regards to the 
>>> documentation):
>>>
>>> from django.db import models
>>> class Author(models.Model):
>>>     name = models.CharField(max_length=100)
>>>     age = models.IntegerField()    country = models.ForeignKey('Country') 
>>> class Country(models.Model):
>>>     name = models.CharField(max_length=100)
>>> class Publisher(models.Model):
>>>     name = models.CharField(max_length=300)
>>>     num_awards = models.IntegerField()
>>> class Book(models.Model):
>>>     name = models.CharField(max_length=300)
>>>     pages = models.IntegerField()
>>>     price = models.DecimalField(max_digits=10, decimal_places=2)
>>>     rating = models.FloatField()
>>>     authors = models.ManyToManyField(Author)
>>>     publisher = models.ForeignKey(Publisher)
>>>     pubdate = models.DateField()
>>> class Store(models.Model):
>>>     name = models.CharField(max_length=300)
>>>     books = models.ManyToManyField(Book)
>>>     registered_users = models.PositiveIntegerField()
>>>
>>>
>>> I'd like to list the name and age of the (one) author in each country 
>>> that published the last book before a given date. 
>>>
>>> Ostensibly the only option I have is:
>>>
>>>     LatestAuthors = 
>>> Author.objects.filter(book_set__pubdate__lte=SOMEDATE).values('country').annotate(Max('book_set__pubdate'))
>>>
>>> That seems to work generating a query that authors are selected Grouped 
>>> By country and selecting the Maximum book pubdate ... but alas the queryset 
>>> is not a set of author objects it is a set of dictionaries that only 
>>> contain "country", I have no access to the authors name or age!
>>>
>>> It seems the values() method is overloaded, and used to both select the 
>>> values that are in the SELECT and the values that are in the GROUP BY 
>>> clause of the generated query.
>>>
>>> I would like select all the fields of Author but group by only Country.
>>>
>>> So that I could for example list the name and age of the most recently 
>>> published author in each country.
>>>
>>> I've read the docs at length and the evidence seems to suggest such an 
>>> overloading alas, and that I may need to write SQL!
>>>
>>> But I would like to think my ignorance is great enough that a better 
>>> solution exists, to select * on Author while group by Author.country.
>>>
>>> If I leave the values clause out, I get multiple Authors listed per 
>>> country (it is not grouped by country), but I get a set of authors back. If 
>>> I add a second values() clause:
>>>
>>>     LatestAuthors = 
>>> Author.objects.filter(book_set__pubdate__lte=SOMEDATE).values('country').annotate(Max('book_set__pubdate')).values()
>>>
>>> Things get bizarre ... as in I see all the Author fields in the returned 
>>> dictionaries but the group by reverts to id in Author not country. Grrr.
>>>
>>> Regards,
>>>
>>> Bernd.
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/c7ee7f3f-b20b-46ad-82b2-00d9b9b8b332%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to