I have a performance problem with Django + Django REST Framework. The 
problem is that it takes Django very, very long time to handle SQL query 
response + DRF serialization take quite some time also. It happens when 
there are ManyToMany or OneToMany relations and nested objects. Sounds like 
"N + 1" problem, but that's not the case. I have 10k to 50k items with 
related items and I try to fetch and serialize them for REST API. Request 
takes from 20 to 60 seconds and currently I have no idea what is causing 
the slowness.

Note: I asked the same question in Stack Overflow, but haven't found the 
answer from there - 
https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl


* N+1 Problem * 

There are not too many queries and queries themselves are fast (queries at 
the end of this post). I'm using `prefetch_related` to limit number of 
queries and what I'm seeing from DB queries everything is looking Okay..ish 
(?). I get one query for each `prefetch_related` property + the original 
query for serialized objects. There are lots and lots of IDs included in 
`prefetch_related` queries, but I guess that is inevitable - as many IDs as 
there are original items. 

To test SQL queries + data transfer, I run same queries with psql from one 
of my EC2 instances to RDS DB with same DB data and wrote the data to file. 
Data transfer plus file write on top of that and it is totally between 100 
to 500 ms for bigger SQL queries for different data sets. File write is 
extra, but I wanted to ensure that I get all the data I expect.

I tested timing from EC2 instance with command from like:

    time psql -f fat-query.psql --host=rds_host --port=5432 --username=user 
--dbname=dbname > output.txt


* Profiling * 

When profiling as shown here 
https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/ 
I get results that DB lookups take most of the time while serializing is 
not too fast either. As an example I have about 12k "Items" in the my local 
PostgreSQL database for one "Project". All "Items" have 1-5 "Events" and 
majority of "Items" have also 1-2 "Photos". Fetching and serializing that 
data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for 
deployment and the timing is about the same there. On larger "Item" sets 
serialization time is increasing more than DB lookup time, but DB lookups 
always take most of the time. With 40k items you'll start to reach 1 min 
execution time and different timeouts from Nginx and other parts of the 
stack.

Example with 12k items (models, serializers and queries below)

    Database lookup               | 14.0292s
    Serialization                 | 6.0076s
    Django request/response       | 0.3488s
    API view                      | 0.2170s
    Response rendering            | 1.1092s

If I leave Photos and Events out the result is

    Database lookup               | 1.2447s
    Serialization                 | 3.9668s
    Django request/response       | 0.2435s
    API view                      | 0.1320s
    Response rendering            | 0.8495s


* What might cause the slowness? *

So, the related fields are taking most of the time (many=True). The 
profiling I used for testing is making `list` out of queryset before 
serializing. Therefore lazy queries are executed before serialization. If I 
don't do that, it doesn't change the overall results, but DB lookups are 
evaluated when serializing with about the same amount of time.

Now the problem for me is that all queries that are done are fast if 
executed manually. So, I believe SQL queries are fast, but DB lookups from 
Django's point of view are very slow. What am I missing here? Or how should 
I continue investigations? It feels like now it requires serious effort 
from Django to convert SQL query results to Django model instances. That 
would imply that there's something wrong with my models, right?


At the end, I could turn to caching, but I would assume that handling < 
100k objects should not be an issue for Django if done correctly.

----------

Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3

Simplified versions of models, views and serializers:

    class List(models.Model):
        ... CharFields, DateTimeFields, ForeignKeys etc. ...
    
    class Item(models.Model):
        list = models.ForeignKey(List, on_delete=models.CASCADE, 
db_index=True, null=True, related_name='items')
        deleted_at = models.DateTimeField(db_index=True, blank=True, 
null=True, default=None)
        created_by = models.ForeignKey(User, blank=False)
        project = models.ForeignKey('projects.Project', 
on_delete=models.CASCADE)
        ... other CharFields, DateTimeFields, ForeignKeys etc. ...
    
    class Event(models.Model):
        item = models.ForeignKey(Item, on_delete=models.CASCADE, 
db_index=True, null=True, related_name='events')
        created_by = models.ForeignKey(User, blank=False)
        deleted_at = models.DateTimeField(db_index=True, blank=True, 
null=True, default=None)
        ... other CharFields, DateTimeFields, ForeignKeys etc. ...
    
    class Photo(models.Model):
        item = models.ForeignKey(Item, on_delete=models.CASCADE, 
db_index=True, null=True, related_name='photos')
        created_by = models.ForeignKey(User, blank=False)
        deleted_at = models.DateTimeField(db_index=True, blank=True, 
null=True, default=None)
        ... other CharFields, DateTimeFields, ForeignKeys etc. ...
    
    
    class PhotoSerializer(serializers.ModelSerializer):
        ... other CharFields, DateTimeFields, ForeignKeys etc. ...
    
    class EventSerializer(serializers.ModelSerializer):
        createdBy = PrimaryKeyRelatedStringField(source='created_by', 
read_only=True)
        createdByFullName = 
serializers.CharField(source='created_by.get_full_name', read_only=True)
        ... other CharFields, DateTimeFields, ForeignKeys etc. ...
    
    class ItemSerializer(serializers.ModelSerializer):
        listName = serializers.CharField(source='list.name', read_only=True)
        createdBy = PrimaryKeyRelatedStringField(source='created_by', 
read_only=True)
        createdByFullName = 
serializers.CharField(source='created_by.get_full_name', read_only=True)
        photos = PhotoSerializer(many=True, read_only=True)
        events = EventSerializer(many=True, required=False, 
allow_null=True, queryset=Event.objects.all())
        ... other fields ...
    
    
    class ItemListAPIView(ListAPIView):
        model = Item
        serializer_class = ItemSerializer
    
        def get_queryset(self):
            return 
Item.objects.all().filter(project_id=...).filter(deleted_at__isnull=True).prefetch_related(
                'created_by', # ID of user who created item
                'photos', # Photo properties
                'event__created_by', # Full name of the person who created 
the event
                'list', # Name of the related list
            )

Example queries from tests with 14s DB lookup result:

    django.db.backends: (0.196) SELECT "todo_item"."version", ... 
everything ... FROM "todo_item" WHERE ("todo_item"."project_id" = 1 AND 
"todo_item"."deleted_at" IS NULL) ORDER BY "todo_item"."created_at" DESC;
    django.db.backends: (0.001) SELECT "auth_user"."id", ... everything ... 
FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ... some IDs ...);
    django.db.backends: (0.148) SELECT "photos_photo"."version", ... 
everything ... FROM "photos_photo" WHERE ("photos_photo"."deleted_at" IS 
NULL AND "photos_photo"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY 
"photos_photo"."created_at" DESC;
    django.db.backends: (0.078) SELECT "events_event"."created_at", ... 
everything ... FROM "events_event" WHERE ("events_event"."deleted_at" IS 
NULL AND "events_event"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY 
"events_event"."created_at" DESC, "events_event"."created_at" DESC; 
    django.db.backends: (0.157) SELECT "todo_list"."created_at", ... FROM 
"todo_list" WHERE "todo_list"."id" IN (1, 2, ... lots of IDs... N)

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/989dd9f6-508a-4dd7-a085-c30557204f41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to