I agree with Marc here that the proposed optimizations are 'magical'. I think when it comes to optimizations like these you simply cannot know in advance whether doing extra queries is going to a be an optimization or a pessimization. If I can come up with a single example where it would significantly decrease performance (either memory usage or speed) compared to the default (and I'm sure I can), then I would be strongly opposed to it ever being default behaviour.

Concerning implementing it as an additional QuerySet method like `auto_prefetch()` - I'm not sure what I think, I feel like it could get icky (i.e. increase our technical debt), due to the way it couples things together. I can't imagine ever wanting to use it, though, I would always prefer the manual option.

Luke



On 15/08/17 21:02, Marc Tamlyn wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley <[email protected] <mailto:[email protected]>> wrote:

    I'd like to discuss automatic prefetching in querysets.
    Specifically automatically doing prefetch_related where needed
    without the user having to request it.

    For context consider these three snippets using the Question &
    Choice models from the tutorial
    <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
when
    there are 100 questions each with 5 choices for a total of 500
    choices.

    Default
    |
    forchoice inChoice.objects.all():
    print(choice.question.question_text,':',choice.choice_text)
    |
    501 db queries, fetches 500 choice rows and 500 question rows from
    the DB

    Prefetch_related
    |
    forchoice inChoice.objects.prefetch_related('question'):
    print(choice.question.question_text,':',choice.choice_text)
    |
    2 db queries, fetches 500 choice rows and 100 question rows from
    the DB

    Select_related
    |
    forchoice inChoice.objects.select_related('question'):
    print(choice.question.question_text,':',choice.choice_text)
    |
    1 db query, fetches 500 choice rows and 500 question rows from the DB

    I've included select_related for completeness, I'm not going to
    propose changing anything about it's use. There are places where
    it is the best choice and in those places it will still be up to
    the user to request it. I will note that anywhere select_related
    is optimal prefetch_related is still better than the default and
    leave it at that.

    The 'Default' example above is a classic example of the N+1 query
    problem, a problem that is widespread in Django apps.
    This pattern of queries is what new users produce because they
    don't know enough about the database and / or ORM to do otherwise.
    Experieced users will also often produce this because it's not
    always obvious what fields will and won't be used and subsequently
    what should be prefetched.
    Additionally that list will change over time. A small change to a
    template to display an extra field can result in a denial of
    service on your DB due to a missing prefetch.
    Identifying missing prefetches is fiddly, time consuming and error
    prone. Tools like django-perf-rec
    <https://github.com/YPlan/django-perf-rec> (which I was involved
    in creating) and nplusone <https://github.com/jmcarp/nplusone>
    exist in part to flag missing prefetches introduced by changed code.
    Finally libraries like Django Rest Framework and the Admin will
    also produce queries like this because it's very difficult for
    them to know what needs prefetching without being explicitly told
    by an experienced user.

    As hinted at the top I'd like to propose changing Django so the
    default code behaves like the prefetch_related code.
    Longer term I think this should be the default behaviour but
    obviously it needs to be proved first so for now I'd suggest a new
    queryset function that enables this behaviour.

    I have a proof of concept of this mechanism that I've used
    successfully in production. I'm not posting it yet because I'd
    like to focus on desired behavior rather than implementation
    details. But in summary, what it does is when accessing a missing
    field on a model, rather than fetching it just for that instance,
    it runs a prefetch_related query to fetch it for all peer
    instances that were fetched in the same queryset. So in the
    example above it prefetches all Questions in one query.

    This might seem like a risky thing to do but I'd argue that it
    really isn't.
    The only time this isn't superior to the default case is when you
    are post filtering the queryset results in Python.
    Even in that case it's only inferior if you started with a large
    number of results, filtered basically all of them and the code is
    structured so that the filtered ones aren't garbage collected.
    To cover this rare case the automatic prefetching can easily be
    disabled on a per queryset or per object basis. Leaving us with a
    rare downside that can easily be manually resolved in exchange for
    a significant general improvement.

    In practice this thing is almost magical to work with. Unless you
    already have extensive and tightly maintained prefetches
    everywhere you get an immediate boost to virtually everything that
    touches the database, often knocking orders of magnitude off page
    load times.

    If an agreement can be reached on pursuing this then I'm happy to
    put in the work to productize the proof of concept.

-- You received this message because you are subscribed to the Google
    Groups "Django developers (Contributions to Django itself)" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:[email protected]>.
    To post to this group, send email to
    [email protected]
    <mailto:[email protected]>.
    Visit this group at
    https://groups.google.com/group/django-developers
    <https://groups.google.com/group/django-developers>.
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com
    
<https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com?utm_medium=email&utm_source=footer>.
    For more options, visit https://groups.google.com/d/optout
    <https://groups.google.com/d/optout>.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>. To post to this group, send email to [email protected] <mailto:[email protected]>.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com <https://groups.google.com/d/msgid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/a5780df6-ce60-05ae-88e3-997e6bc88f5c%40cantab.net.
For more options, visit https://groups.google.com/d/optout.

Reply via email to