Hi Anshuman,

On Sat, Oct 18, 2014 at 2:05 PM, Anshuman Aggarwal <
[email protected]> wrote:

> Please see this enhancement request:
> https://code.djangoproject.com/ticket/23646
>
> Unlike what Russ has suggested, I'm pretty sure that a single UPDATE query
> with a large number (thousands or millions) of updates will be significantly
> faster than doing multiple SQL UPDATE queries. If more people on the list
> feel this is not going to be the case, I will happily run a test against
> PostgreSQL and confirm the results either way.
>

Data trumps everything. I'll immediately stand down from my objections in
the ticket if you can demonstrate there's a significant performance benefit.

One caveat to keep in mind in your tests: make sure you take transactions
into account. Outside an explicit transaction, PostgreSQL runs each
statement as its own transaction, so multiple single statements mean
multiple individual commits. In order to get a fair performance
comparison, you should be comparing a "single statement" update against
multiple updates *in the same transaction*. Of course, having performance
data for all three approaches (single statement, multiple statements inside
a transaction, multiple statements outside a transaction) can't hurt. Data
for other databases would also be nice.
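A minimal harness for timing the three approaches might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in so it runs anywhere, which is an assumption: SQLite timings say nothing definitive about PostgreSQL. The `request_status` table, its columns, and all function names are hypothetical. Adapting it to PostgreSQL would mean swapping in a driver such as psycopg2 (which uses `%s` placeholders instead of `?`) and controlling autocommit on that connection. The database is file-backed on purpose, because commit cost is what separates the two multi-statement variants.

```python
import os
import sqlite3
import tempfile
import time
from datetime import datetime, timedelta

# Hypothetical table: request_status(id INTEGER PRIMARY KEY, updated_at TEXT)
UPDATE_ONE = "UPDATE request_status SET updated_at = ? WHERE id = ?"


def make_db(path, n_rows):
    # isolation_level=None: the sqlite3 module opens no implicit transactions,
    # so every statement autocommits unless we issue BEGIN ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE request_status (id INTEGER PRIMARY KEY, updated_at TEXT)"
    )
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO request_status (id) VALUES (?)",
                     [(i,) for i in range(n_rows)])
    conn.execute("COMMIT")
    return conn


def single_statement(conn, updates):
    # One UPDATE whose CASE expression gives each row its own value.
    # Databases cap bound parameters per statement (older SQLite: 999),
    # so large batches would need to be split into chunks.
    if not updates:
        return
    whens = " ".join("WHEN ? THEN ?" for _ in updates)
    ids = ", ".join("?" for _ in updates)
    params = [v for pair in updates for v in pair] + [pk for pk, _ in updates]
    conn.execute(
        f"UPDATE request_status SET updated_at = CASE id {whens} END "
        f"WHERE id IN ({ids})",
        params,
    )


def many_in_transaction(conn, updates):
    # One UPDATE per row, all committed together.
    conn.execute("BEGIN")
    try:
        conn.executemany(UPDATE_ONE, [(value, pk) for pk, value in updates])
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def many_autocommit(conn, updates):
    # One UPDATE per row, each committed on its own.
    for pk, value in updates:
        conn.execute(UPDATE_ONE, (value, pk))


APPROACHES = {
    "single statement": single_statement,
    "many statements, one transaction": many_in_transaction,
    "many statements, autocommit": many_autocommit,
}


def benchmark(n_rows=300):
    """Return elapsed seconds per approach, each on a fresh file-backed DB."""
    start_time = datetime(2014, 10, 18, 14, 5)
    updates = [(i, (start_time + timedelta(seconds=i)).isoformat())
               for i in range(n_rows)]
    results = {}
    for name, fn in APPROACHES.items():
        with tempfile.TemporaryDirectory() as tmp:
            conn = make_db(os.path.join(tmp, "bench.db"), n_rows)
            try:
                t0 = time.perf_counter()
                fn(conn, updates)
                results[name] = time.perf_counter() - t0
            finally:
                conn.close()
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f}s")
```

Running `benchmark()` at several batch sizes, and repeating each run a few times, gives a rough picture; a single run on one machine is not a meaningful result on its own.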


> Assuming, however, that the performance benefit is significant, should we
> look at contributing a patch? If so, what should the API look like?
>
> Use case:
> For each row in a table, we send a request to a server. We get individual
> updates from the server informing us of the status of the request. Each
> update corresponds to a row in a table. We want to store the datetime of
> the update but do not wish to hit the database every time (we have seen a
> performance impact since the table is huge). We use memcache to batch the
> updates and do a single Django ORM .update() call, which works well but
> updates all rows to a common datetime. Ideally, we wish to update each row
> with its own datetime of receipt.
>
> Also, if any Django/Postgres experts can advise on a way to do a large
> number of updates concurrently on a table, or on a better design for this,
> I would like to hear suggestions, but I can move that over to the Django
> Users mailing list.
>
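For the use case above, per-row timestamps can be kept while still batching: buffer (row id, receipt time) pairs, collapse duplicates so each row keeps only its newest time, and flush them as many parameterised UPDATEs inside one transaction, which is the comparison suggested in the reply above. The sketch below is an assumption-laden illustration, not a proposed Django API: an in-process dict stands in for memcache (so it only batches within one process), `sqlite3` stands in for PostgreSQL, and `StatusBuffer` and the `request_status` table are hypothetical.

```python
import sqlite3
from datetime import datetime


class StatusBuffer:
    """Hypothetical buffer: collects per-row receipt times, writes them in one transaction."""

    def __init__(self):
        self._pending = {}  # row id -> newest receipt datetime seen so far

    def record(self, row_id, received_at):
        # Keep only the newest timestamp per row; older ones would be overwritten anyway.
        current = self._pending.get(row_id)
        if current is None or received_at > current:
            self._pending[row_id] = received_at

    def flush(self, conn):
        """Write all pending timestamps; return the number of rows sent."""
        if not self._pending:
            return 0
        params = [(ts.isoformat(), row_id) for row_id, ts in self._pending.items()]
        # With sqlite3's default isolation level, a transaction is opened
        # implicitly before the first UPDATE; `with conn:` commits it on
        # success or rolls it back on error, so all rows commit together.
        with conn:
            conn.executemany(
                "UPDATE request_status SET updated_at = ? WHERE id = ?", params
            )
        self._pending.clear()
        return len(params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE request_status (id INTEGER PRIMARY KEY, updated_at TEXT)")
    conn.executemany("INSERT INTO request_status (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    buf = StatusBuffer()
    buf.record(1, datetime(2014, 10, 18, 14, 5, 0))
    buf.record(2, datetime(2014, 10, 18, 14, 5, 7))
    buf.record(1, datetime(2014, 10, 18, 14, 5, 9))  # newer time for row 1 wins
    buf.flush(conn)
    print(conn.execute("SELECT id, updated_at FROM request_status").fetchall())
```

Whether this beats a single CASE-style UPDATE is exactly what the benchmark would need to show.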

Logging to the database is almost always the Wrong Thing To Do, unless
you're seeing very low traffic, or you've got a separate database for
logging. Depending on what you're doing with the logged data, there are
much better options for logging, including:

 * persistent in-memory stores (like Redis)
 * analytics data stores (like Elasticsearch)
 * syslog with good analysis tools

The right approach for you will depend a little on what you're hoping to
do with the data you're gathering.
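For the syslog or Elasticsearch routes, one low-effort starting point is to emit each status update as a one-line JSON record through Python's standard `logging` module and let a syslog daemon or log shipper move it from there. This is a sketch under assumptions: the logger name, field names, `JsonLineFormatter`, and the helper functions are hypothetical, and the handler writes to a plain stream only to keep the example self-contained (the stdlib's `logging.handlers.SysLogHandler` would be the option for sending to syslog directly).

```python
import io
import json
import logging
from datetime import datetime


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object per line (easy to ship and index)."""

    def format(self, record):
        payload = {
            "logger": record.name,
            "message": record.getMessage(),
            # Keys passed via `extra=` become attributes on the LogRecord.
            "row_id": getattr(record, "row_id", None),
            "status": getattr(record, "status", None),
            "received_at": getattr(record, "received_at", None),
        }
        return json.dumps(payload, sort_keys=True)


def make_status_logger(stream):
    logger = logging.getLogger("request_status")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep status events out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers[:] = [handler]  # replace, so repeated calls don't duplicate output
    return logger


def log_status(logger, row_id, status, received_at):
    logger.info("status update", extra={
        "row_id": row_id,
        "status": status,
        "received_at": received_at.isoformat(),
    })


if __name__ == "__main__":
    out = io.StringIO()
    log = make_status_logger(out)
    log_status(log, 42, "delivered", datetime(2014, 10, 18, 14, 5))
    print(out.getvalue(), end="")
```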

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAJxq84_NMYNi4BtETo0HXHy_O80Mwf-jqWk3wx%3DF1oULSVU_DQ%40mail.gmail.com.