I have a table that looks like this

date                car_crashes         city
01.01               1                   Washington
01.02               4                   Washington
01.03               0                   Washington
01.04               2                   Washington
01.05               0                   Washington
01.06               3                   Washington
01.07               4                   Washington
01.08               1                   Washington
01.01               0                   Detroit
01.02               2                   Detroit
01.03               4                   Detroit
01.04               2                   Detroit
01.05               0                   Detroit
01.06               3                   Detroit
01.07               1                   Detroit


I want to know how many car crashes for each day happened in the entire 
nation, and I can do that with this:


Model.values("date") \
.annotate(car_crashes=Sum('car_crashes')) \
.values("date", "car_crashes")



Now, let's suppose I have an array like this:

weights = [
    {
        "city": "Washington",
        "weight": 1,
    },
    {
        "city": "Detroit",
        "weight": 2,
    }
]



This means that Detroit's car crashes should be multiplied by 2 before 
being aggregated with Washington's.

It can be done like this:

from django.db.models import IntegerField


when_list = [When(city=w['city'], then=w['weight']) for w in weights]
case_params = {'default': 1, 'output_field': IntegerField()}


Model.objects.values('date') \
    .annotate(
        weighted_car_crashes=Sum(
            F('car_crashes') * Case(*when_list, **case_params)
    ))

However, this generates very slow SQL code, especially as more properties 
and a larger array are introduced.

Another solution which is way faster but still sub-optimal is using pandas 
to :

aggregated = false
for weight in weights:

 ag = Model.values("date") \
 .annotate(car_crashes=Sum('car_crashes')) \
 .values("date", "car_crashes")


 if aggregated is False:
 aggregated = ag
 else:
 aggregated = aggregated.union(ag)


aggregated = pd.DataFrame(aggregated)
if len(weights) > 1:
 aggregated = aggregated.groupby("date", as_index=False).sum(level=[1])


This is faster, but still not as fast as what happens if, before calling 
pandas, I take the aggregated.query string and
wrap it with a few lines of SQL.


SELECT "date", sum("car_crashes") FROM (


// String from Python
str(aggregated.query)


)


This works perfectly when pasted into my database SQL. I could do this in 
Python/Django using .raw() but the documentation says to ask here before 
using .raw() as mostly anything could be acomplished with the ORM.

Yet, I don't see how. Once I call .union on 2 querysets, I cannot aggregate 
further.

aggregated.union(ag).annotate(cc=Sum('car_crashes'))

gives

Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate




Is this possible to do with the Django ORM or should I use .raw()?

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/3e3e80a1-29e4-47bd-8688-f826063d2b02%40googlegroups.com.

Reply via email to