I have a table that looks like this date car_crashes city 01.01 1 Washington 01.02 4 Washington 01.03 0 Washington 01.04 2 Washington 01.05 0 Washington 01.06 3 Washington 01.07 4 Washington 01.08 1 Washington 01.01 0 Detroit 01.02 2 Detroit 01.03 4 Detroit 01.04 2 Detroit 01.05 0 Detroit 01.06 3 Detroit 01.07 1 Detroit
I want to know how many car crashes for each day happened in the entire nation, and I can do that with this: Model.values("date") \ .annotate(car_crashes=Sum('car_crashes')) \ .values("date", "car_crashes") Now, let's suppose I have an array like this: weights = [ { "city": "Washington", "weight": 1, }, { "city": "Detroit", "weight": 2, } ] This means that Detroit's car crashes should be multiplied by 2 before being aggregated with Washington's. It can be done like this: from django.db.models import IntegerField when_list = [When(city=w['city'], then=w['weight']) for w in weights] case_params = {'default': 1, 'output_field': IntegerField()} Model.objects.values('date') \ .annotate( weighted_car_crashes=Sum( F('car_crashes') * Case(*when_list, **case_params) )) However, this generates very slow SQL code, especially as more properties and a larger array are introduced. Another solution which is way faster but still sub-optimal is using pandas to : aggregated = false for weight in weights: ag = Model.values("date") \ .annotate(car_crashes=Sum('car_crashes')) \ .values("date", "car_crashes") if aggregated is False: aggregated = ag else: aggregated = aggregated.union(ag) aggregated = pd.DataFrame(aggregated) if len(weights) > 1: aggregated = aggregated.groupby("date", as_index=False).sum(level=[1]) This is faster, but still not as fast as what happens if, before calling pandas, I take the aggregated.query string and wrap it with a few lines of SQL. SELECT "date", sum("car_crashes") FROM ( // String from Python str(aggregated.query) ) This works perfectly when pasted into my database SQL. I could do this in Python/Django using .raw() but the documentation says to ask here before using .raw() as mostly anything could be acomplished with the ORM. Yet, I don't see how. Once I call .union on 2 querysets, I cannot aggregate further. aggregated.union(ag).annotate(cc=Sum('car_crashes')) gives Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate Is this possible to do with the Django ORM or should I use .raw()? -- You received this message because you are subscribed to the Google Groups "Django users" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/3e3e80a1-29e4-47bd-8688-f826063d2b02%40googlegroups.com.