Hmm, that's odd; the grouping (map/reduce/filter/lambda) is extremely quick for me, even on a heavy data set.
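For illustration, a single pass over the pending changes is enough to group them by (field name, value), which is one reason the grouping step can stay cheap. This is a minimal sketch, not code from this thread: it assumes, hypothetically, that each pending change is recorded as a `(row_id, field_name, new_value)` tuple, and `group_changes` / `summarize` are illustrative names, not part of Django or DSE.

```python
from collections import defaultdict


def group_changes(changes):
    """Group pending row changes by (field name, new value) in one pass.

    `changes` is assumed (hypothetically) to be an iterable of
    (row_id, field_name, new_value) tuples collected while scanning rows.
    """
    groups = defaultdict(list)
    for row_id, field, value in changes:
        # Each distinct (field, value) pair can later become one bulk UPDATE.
        groups[(field, value)].append(row_id)
    return dict(groups)


def summarize(groups):
    """Return sorted (field, value, number_of_ids) tuples for logging."""
    return sorted((field, value, len(ids)) for (field, value), ids in groups.items())


changes = [
    (101, "is_checked", 1),
    (102, "is_checked", 1),
    (102, "has_link", 1),
    (103, "is_spam", 1),
    (104, "is_spam", 0),
]
groups = group_changes(changes)
print(summarize(groups))
# [('has_link', 1, 1), ('is_checked', 1, 2), ('is_spam', 0, 1), ('is_spam', 1, 1)]
```

Grouping on the pair rather than on the field alone means a field set to different values (here `is_spam` set to 0 and to 1) ends up in separate groups, so each group can still be written with a single `update()`.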
My guess is that grouping would need to be done on a combination of field name + value, and would need to let the user specify the batch size to use (to prevent a MemoryError, or to find some way of reducing the batch size when a MemoryError is encountered). If you end up introducing it into 3.0, I'll definitely be interested in taking a look at the code :)

Cal

On Wed, Jun 22, 2011 at 3:17 PM, Thomas Weholt <thomas.weh...@gmail.com> wrote:
> On Wed, Jun 22, 2011 at 3:52 PM, Cal Leeming [Simplicity Media Ltd]
> <cal.leem...@simplicitymedialtd.co.uk> wrote:
> > Sorry, let me explain a little better.
> >
> > (51.98s) Found 49659 objs (match: 16563) (db writes: 51180) (range:
> > 72500921 ~ 72550921), (avg 16.9 mins/million) - [('is_checked',
> > 49659), ('is_image_blocked', 0), ('has_link', 1517), ('is_spam', 4)]
> >
> > map(lambda x: (x[0], len(x[1])), _obj_incs.iteritems()) = [('is_checked',
> > 49659), ('is_image_blocked', 0), ('has_link', 1517), ('is_spam', 4)]
> >
> > In the above example, it has found 49659 rows which need 'is_checked'
> > changing to the value '1' (same principle applied to the other 3), giving a
> > total of 51,180 database writes, split into 4 queries.
> >
> > Those 4 fields have the IDs assigned to them:
> >
> >     if _f == 'block_images':
> >         _obj_incs.get('is_image_blocked').append(_hit_id)
> >         if _parent_id:
> >             _obj_incs.get('is_image_blocked').append(_parent_id)
> >
> > Then I loop through those fields, and do an update() using the necessary
> > IDs:
> >
> >     # now apply the obj changes in bulk (massive speed improvements)
> >     for _key, _value in _obj_incs.iteritems():
> >         # update the child object
> >         Post.objects.filter(
> >             id__in = _value
> >         ).update(
> >             **{
> >                 _key : 1
> >             }
> >         )
> >
> > So in simple terms, we're not doing 51 thousand update queries; instead
> > we're grouping them into bulk queries based on the row to be updated.
> > It doesn't yet do grouping based on key AND value, simply because we didn't
> > need it at the time, but if we release the code for public use,
> > we'd definitely add this in.
> >
> > Hope this makes sense; let me know if I didn't explain it very well lol.
> >
> > Cal
>
> Actually, I started working on something similar, but tried to find
> sets of fields instead of just updating one field per update. I
> didn't finish it because the actual grouping of the fields seemed to
> take a lot of time/CPU/memory. Perhaps if I focused on updating one
> field at a time it would be simpler. Might look at it again for DSE
> 3.0 ;-)
>
> Thomas
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-users@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
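The suggestion of a user-specified batch size that shrinks when a MemoryError is hit could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from either poster: `apply_in_batches` and the `apply_update` callback are hypothetical names, and halving the batch size is just one assumed back-off strategy. In a Django project the callback might be something like `lambda field, value, ids: Post.objects.filter(id__in=ids).update(**{field: value})`; the sketch itself uses a stand-in so it runs without Django.

```python
def apply_in_batches(groups, apply_update, batch_size=1000, min_batch_size=1):
    """Apply one bulk update per (field, value) group, in batches of ids.

    `groups` maps (field, value) -> list of row ids (hypothetical shape).
    `apply_update(field, value, ids)` performs the actual UPDATE; in Django it
    might wrap Post.objects.filter(id__in=ids).update(**{field: value}).
    On MemoryError the batch size is halved and the same ids are retried;
    once it would drop below `min_batch_size`, the error is re-raised.
    Returns the number of successful update calls.
    """
    if min_batch_size < 1 or batch_size < min_batch_size:
        raise ValueError("need batch_size >= min_batch_size >= 1")
    calls = 0
    size = batch_size  # kept across groups: memory pressure likely persists
    for (field, value), ids in groups.items():
        start = 0
        while start < len(ids):
            batch = ids[start:start + size]
            try:
                apply_update(field, value, batch)
            except MemoryError:
                if size // 2 < min_batch_size:
                    raise
                size //= 2  # back off and retry the same ids
                continue
            calls += 1
            start += len(batch)
    return calls


# Usage with a stand-in callback that fails on batches larger than 2 ids.
log = []


def fake_update(field, value, ids):
    if len(ids) > 2:
        raise MemoryError("batch too large")
    log.append((field, value, ids))


groups = {("is_checked", 1): [1, 2, 3, 4, 5], ("is_spam", 1): [9]}
print(apply_in_batches(groups, fake_update, batch_size=4))  # 4
print(log)
```

Keeping the reduced batch size for later groups is a design choice; resetting it to `batch_size` at the start of each group would work too, at the cost of possibly hitting the same failure again.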