Re: Make sure QuerySet.get() does not fetch more rows than it absolutely needs

Anssi Kääriäinen Mon, 03 Jun 2013 23:48:39 -0700

On 4 June, 00:22, Shai Berger <[email protected]> wrote:
> On Monday 03 June 2013, Patryk Zawadzki wrote:
>
> > Here's the ticket:
>
> >https://code.djangoproject.com/ticket/6785
>
> > tl;dr: Calling .get() on a badly filtered queryset can result in lots
> > of database rows being fetched and wrapped in Python objects for no
> > gain.
>
> tl;dr: There's a general, valuable optimization to be made here, but it should
> be implemented at a lower level.
>
> I raised a related issue about a year and a half ago[0], thinking
> (mistakenly) that it was mostly an Oracle issue. Ian Kelly had pointed me in
> the right direction, but then life happened... The problem is that for all
> single-row queries except aggregates, Django uses the same strategy as used in
> get (and which you kept using in your patch): limit the query (usually)
> properly, then fetch until no more records are retrieved (fetching is done in
> chunks using fetchmany(), and not using fetchall()).
>
> For single-row queries, this means two fetches; an unnecessary network
> round trip for each such query, unless the backend or underlying driver takes
> care to prevent it. On Oracle, at least, nobody does.
>
> What we should do instead, in my opinion, is stop fetching as soon as a chunk
> arrives that isn't full. This alone should reduce the number of network
> accesses significantly on most Django projects.
>
> While we're doing that, we could also define a new mode of sql execution (see
> SQLCompiler.results_iter(), in django.db.models.sql.compiler): Next to the
> current SINGLE (which uses fetchone()) and MULTI (the chunked loop described
> above) we could have SINGLE_VERIFY, which always fetches a single chunk of
> size 2. Make get() use that, and #6785 is fixed, while improving performance
> rather than hurting it on correct usage, like the proposed patch[1] does.
>
> Assuming nobody raises objections, I intend to propose a patch along these
> lines sometime in the coming weeks (it's one of a few things on my to-do
> list), but I wouldn't mind at all if someone beat me to it.
>
> Hope this helps,
>         Shai.
>
> [0]https://groups.google.com/d/msg/django-developers/SQ0Pltt_f0M/3ccQn0a...
> [1]https://github.com/django/django/pull/1139
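
For illustration, the two fetch strategies Shai describes can be sketched against a plain DB-API cursor. This is a minimal sketch, not Django's SQLCompiler code: sqlite3 is used only so it runs standalone, and the names CHUNK_SIZE, CountingCursor, fetch_multi and fetch_single_verify are hypothetical. fetch_multi stops as soon as a chunk comes back short, and fetch_single_verify mirrors the proposed SINGLE_VERIFY mode by fetching one chunk of size 2.

```python
import sqlite3

CHUNK_SIZE = 100  # hypothetical chunk size passed to fetchmany()


class CountingCursor:
    """Wrap a DB-API cursor and count fetchmany() calls (each may be a round trip)."""

    def __init__(self, cursor):
        self.cursor = cursor
        self.fetches = 0

    def fetchmany(self, size):
        self.fetches += 1
        return self.cursor.fetchmany(size)


def fetch_multi(cursor, chunk_size=CHUNK_SIZE):
    """Yield chunks of rows, stopping as soon as a chunk is not full.

    A short chunk means the result set is exhausted, so the extra
    fetchmany() call that would only return [] is skipped.
    """
    while True:
        rows = cursor.fetchmany(chunk_size)
        if rows:
            yield rows
        if len(rows) < chunk_size:
            break


def fetch_single_verify(cursor):
    """Fetch one chunk of size 2: enough to tell 0, 1, or "more than one" row."""
    return cursor.fetchmany(2)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])

cur = CountingCursor(conn.execute("SELECT id FROM t"))
print([len(c) for c in fetch_multi(cur)], cur.fetches)  # [100, 100, 50] 3

rows = fetch_single_verify(conn.execute("SELECT id FROM t WHERE id >= 10 LIMIT 2"))
print(len(rows))  # 2, so a get() built on this would report multiple objects
```

With 250 rows the loop needs three fetches instead of four, and a query matching a single row needs just one fetch rather than two.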


Note that the way queryset iteration happens has changed[1]. Except
for the .iterator() call, the rows in the queryset are converted to
objects immediately. This isn't a big change for core backends other
than Oracle (they always fetched all rows in one go), but for Oracle
users this could change the number of rows fetched from the DB.

So, with this in mind, you could change the compiler's iterator to
use fetchall() except when .iterator() is used.
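
A rough sketch of that split, again on a plain sqlite3 cursor rather than Django's compiler; results(), _stream_rows() and the lazy flag (standing in for .iterator()) are hypothetical names. The eager path does a single fetchall(), while the lazy path keeps streaming with fetchmany() so memory stays bounded.

```python
import sqlite3


def _stream_rows(cursor, chunk_size):
    # iter(callable, sentinel) keeps calling fetchmany() until it returns [].
    for chunk in iter(lambda: cursor.fetchmany(chunk_size), []):
        yield from chunk


def results(cursor, lazy=False, chunk_size=100):
    """Return the rows of an executed query.

    lazy=False: every row becomes an object right away anyway, so fetch
    them all with one fetchall() call.
    lazy=True: stands in for .iterator(); stream rows chunk by chunk.
    """
    if lazy:
        return _stream_rows(cursor, chunk_size)
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

print(results(conn.execute("SELECT id FROM t ORDER BY id")))
# [(0,), (1,), (2,), (3,), (4,)]
print(list(results(conn.execute("SELECT id FROM t ORDER BY id"), lazy=True, chunk_size=2)))
# [(0,), (1,), (2,), (3,), (4,)]
```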

As for .get(), I don't find the number of duplicates in the error
message that useful. If you got more than one object, you failed; how
badly you failed isn't interesting. I can't think of any case where
the number of objects you got actually matters. If somebody feels
strongly about this, then we could limit the number of fetched objects
to some smallish amount (21, for example). If you actually get 21
objects back, the error message says "it returned more than 20".
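
A minimal sketch of the capped lookup, assuming a hypothetical get_one() helper and constant MAX_GET_RESULTS over sqlite3 (DoesNotExist and MultipleObjectsReturned here are local stand-ins, not Django's classes). The LIMIT keeps the database from returning more than 21 rows, so a badly filtered query costs at most 21 rows instead of the whole table.

```python
import sqlite3

MAX_GET_RESULTS = 21  # hypothetical cap: one more than the largest count reported exactly


class DoesNotExist(Exception):
    """Local stand-in for a 'no rows matched' error."""


class MultipleObjectsReturned(Exception):
    """Local stand-in for a 'more than one row matched' error."""


def get_one(conn, sql, params=()):
    """Return the single row matched by sql, fetching at most MAX_GET_RESULTS rows."""
    # Wrap the query so the database itself stops after MAX_GET_RESULTS rows.
    cursor = conn.execute(f"SELECT * FROM ({sql}) LIMIT {MAX_GET_RESULTS}", params)
    rows = cursor.fetchall()
    if not rows:
        raise DoesNotExist("get() matched no rows")
    if len(rows) == 1:
        return rows[0]
    if len(rows) < MAX_GET_RESULTS:
        raise MultipleObjectsReturned(f"get() returned {len(rows)} rows")
    raise MultipleObjectsReturned(f"get() returned more than {MAX_GET_RESULTS - 1} rows")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

print(get_one(conn, "SELECT id FROM t WHERE id = ?", (7,)))  # (7,)
try:
    get_one(conn, "SELECT id FROM t WHERE id >= ?", (0,))
except MultipleObjectsReturned as exc:
    print(exc)  # get() returned more than 20 rows
```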

 - Anssi

[1]https://code.djangoproject.com/ticket/18702

