Hi Michael, Great to see new expressions being used already! Speaking specifically about the Case structure, I think it looks nice. I was planning on writing a few functions (https://code.djangoproject.com/ticket/23753) which included Case/When of some description.
My idea of a Case API was: Model.objects.annotate(field=If( Q(), TrueExpression(), FalseExpression(), output_field=Field )) Where TrueExpression and FalseExpression could be any expression, like Value(1) or another If(). I think I prefer your list of 2-tuples with a default though, as it looks cleaner, and should be easier for people to write. You could maybe avoid asking the user to set an output_field by doing it yourself in the `resolve_expression`. There's also probably an issue where we should validate that all the parts of the case expression will resolve to the same type. For an update, it's easier - since you know the type that should be returned. Using it in an annotate clause gives no such information. I'm not sure that we can distinguish between an update or annotate call to be able to do the extra work on behalf of the user though. Case expressions, in my opinion, are wanted in core. It solves the conditional aggregates problem, conditional select, and now you've shown conditional updates. You also get around introducing another queryset method (bulk_update) which I think would be appreciated. Nice work! On Monday, 17 November 2014 23:54:26 UTC+11, [email protected] wrote: > > I've been working on a bulk_update method for a project to allow saving > many instances of a Model in a single call/query, an analog to the existing > bulk_create method. > There seems to be some interest in this as evidenced by a library > django-bulk-update <https://github.com/aykut/django-bulk-update> (which > isn't working for me in tests with python3 and SQLite) and a recent ticket > <https://code.djangoproject.com/ticket/23646>. > > The ticket was closed as wontfix with an invitation for discussion, > although the author doesn't seem to have pursued it. However, Shai Berger > did comment positively on the ticket and proposed a different API, similar > to the one in django-bulk-update and my own prototype: > > Book.objects.update_many(books, 'price') > > My own attempt was on Django 1.7 using a subclass of ExpressionNode > passed as a value into UpdateQuery.add_update_values which allows Django > to generate most of the UPDATE SQL, while the expression subclass was > only responsible for generating an SQL CASE expression. This quickly > generalized into a Case expression class which could be passed as a value > into QuerySet.update. > > But now that Query Expressions have been merged, I've done a quick port > and the Case expression now works with updates, annotations and > aggregates, and is used to power my bulk_update method. > > A simple performance test shows that this works quite well. I tested the > following methods of updating many database rows, using an in memory SQLite > database : > > # loop_save > for o in objects: > o.save(update_fields=['field']) > > # bulk_update > MyModel.objects.bulk_update(objects, update_fields=['field']) > > # many_updates > q = Q(condition__lt=test_value) > MyModel.objects.filter(q).update(field=value1) > MyModel.objects.filter(~q).update(field=value2) > > # case_update > q = Q(condition__lt=test_value) > MyModel.objects.update(field=Case([(q, value1)], default=value2, > > output_field=MyModel._meta.get_field('field'))) > > With 10000 objects in the database, and updating all of them I got these > run times using clock(): > > loop_save: 3.022845626653664 > bulk_update: 0.1785595393377175 > many_updates: 0.009768530320993563 > case_update: 0.009343744353718986 > > We can see that bulk_update outperforms looping by an order of magnitude, > despite working by generating a CASE expression with a WHEN clause > for every object conditioned on its pk. > Using Case in an update has no significant performance improvement, over > running multiple queries. It may arguably allow for more readable code when > using many conditions since the semantics of case are that conditions are > evaluated in order, like a Python if ... elif ... else. Compare this: > > MyModel.objects.filter(Q(condition__lt=test_value1)).update(field=value1) > MyModel.objects.filter(Q(condition__gte=test_value1, > condition__lt=test_value2)).update(field=value2) > MyModel.objects.filter(Q(condition__gte=test_value2, > condition__lt=test_value3)).update(field=value3) > MyModel.objects.filter(Q(condition__gte=test_value3, > condition__lt=test_value4)).update(field=value4) > MyModel.objects.filter(Q(condition__gte=test_value4)).update(field=value5) > > as opposed to: > > q = Q(condition__lt=test_value) > MyModel.objects.update(field=Case([(Q(condition__lt=test_value1), value1), > (Q(condition__lt=test_value2), value2), > (Q(condition__lt=test_value3), value3), > (Q(condition__lt=test_value4), > value4)], default=value5, > > output_field=MyModel._meta.get_field('field'))) > > With only 10 objects in the database, and updating all of them the times > are: > > loop_save: 0.004131221428270882 > bulk_update: 0.0006249582329893796 > many_updates: 0.0006562061446388481 > case_update: 0.00036259952923938313 > > So the bulk_update performance gains are still there, and update using a > Case expression if faster because the SQL generation time dominates in > this example, and only one query needs to be generated and executed. > > Case expressions can also be used in other ways (although these examples > use a subclass SimpleCase, which is described in a little more detail > below, for simpler syntax): > > # annotation > MyModel.objects.annotate(status_text=SimpleCase('status', [('S', > 'Started'), ('R', 'Running'), ('F', 'Finished')], > default='Unknown', > output_field=CharField())) > > # aggregation > MyModel.objects.aggregate(started=Sum(SimpleCase('status', [('S', 1)], > default=0, output_field=IntegerField())), > running=Sum(SimpleCase('status', [('R', 1)], > default=0, output_field=IntegerField())), > finished=Sum(SimpleCase('status', [('F', 1)], > default=0, output_field=IntegerField()))) > > In these examples SimpleCase is a subclass of Case that generates the SQL > simple CASE expression which is an equality test against a field. > > So is this something that would be a worthwhile addition to core? > It could work now as a separate module, but the update example requires > setting output_field. My original Django 1.7 code abused the > prepare_database_save method that SQLUpdateCompiler called on an > ExpressionNode to get the field that the expression would be assigned to > automatically. After the merge of Query Expression SQLUpdateCompiler > calls either resolve_expression or prepare_database_save, so I can no > longer use this. A change in SQLUpdateCompiler would be required with a > new API for settings output_field automatically on ExpressionNodes used > as update values. > > If this were considered for core, I would appreciate input on the API. I > currently have: > > Case(list_of_case_tuples, default, output_field) - A general CASE > expression called a searched case in the SQL spec. list_of_case_tuples is > an iterable of tuples of the form (Q_object, value), and default is the > value for the ELSE clause. This generates SQL like: > > CASE WHEN n > 0 THEN 'positive' WHEN n < 0 THEN 'negative' ELSE 'zero' END > > SimpleCase(fieldname, list_of_simple_case_tuples, default, output_field) > - A simple case expression in the SQL spec. fieldname is a field > identifier like in an F expression, list_of_simple_case_tuples is an > iterable of tuples of the form (condition_value, result_value). This > generates SQL like: > > CASE n WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END > > Model.objects.bulk_update(list_of_instances, update_fields) - An analog > to the bulk_create method that save changes in many model instances. list > of instances is an iterable of model instances, update_fields is the same > as the argument to Model.save of the same name. > > Of course I will put up a branch on github if there is interest in this > proposal. Thanks for your time and sorry for the wall of text. > > - Michael > > -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/d03f6bca-50c4-4262-9e74-415ea5f06ae7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
