There is some overhead by using ORM: this is inevitable.
On the other hand, you didn't posted your ORM calls, and the queries are
incomplete.
If you think your queries are optimized (as I understand they are from
your description), an workaround is to avoid using the ORM for those
specific queries. Or review your Models, they might not be optimized for
what you want. Can you try to reduce the number of queries to a smaller
number by creating more complex queries?
Another possibility is getting a new Model recovering information from a
view/stored procedure on the database... you see, you want to get 10k to
50k items... that's a lot (to serialize too).
Em 11/06/2017 07:06, Miika Huusko escreveu:
I have a performance problem with Django + Django REST Framework. The
problem is that it takes Django very, very long time to handle SQL
query response + DRF serialization take quite some time also. It
happens when there are ManyToMany or OneToMany relations and nested
objects. Sounds like "N + 1" problem, but that's not the case. I have
10k to 50k items with related items and I try to fetch and serialize
them for REST API. Request takes from 20 to 60 seconds and currently I
have no idea what is causing the slowness.
Note: I asked the same question in Stack Overflow, but haven't found
the answer from there -
https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl
* N+1 Problem *
There are not too many queries and queries themselves are fast
(queries at the end of this post). I'm using `prefetch_related` to
limit number of queries and what I'm seeing from DB queries everything
is looking Okay..ish (?). I get one query for each `prefetch_related`
property + the original query for serialized objects. There are lots
and lots of IDs included in `prefetch_related` queries, but I guess
that is inevitable - as many IDs as there are original items.
To test SQL queries + data transfer, I run same queries with psql from
one of my EC2 instances to RDS DB with same DB data and wrote the data
to file. Data transfer plus file write on top of that and it is
totally between 100 to 500 ms for bigger SQL queries for different
data sets. File write is extra, but I wanted to ensure that I get all
the data I expect.
I tested timing from EC2 instance with command from like:
time psql -f fat-query.psql --host=rds_host --port=5432
--username=user --dbname=dbname > output.txt
* Profiling *
When profiling as shown here
https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/
I get results that DB lookups take most of the time while serializing
is not too fast either. As an example I have about 12k "Items" in the
my local PostgreSQL database for one "Project". All "Items" have 1-5
"Events" and majority of "Items" have also 1-2 "Photos". Fetching and
serializing that data takes around 22 seconds on my laptop. I'm using
AWS EC2 + RDS for deployment and the timing is about the same there.
On larger "Item" sets serialization time is increasing more than DB
lookup time, but DB lookups always take most of the time. With 40k
items you'll start to reach 1 min execution time and different
timeouts from Nginx and other parts of the stack.
Example with 12k items (models, serializers and queries below)
Database lookup | 14.0292s
Serialization | 6.0076s
Django request/response | 0.3488s
API view | 0.2170s
Response rendering | 1.1092s
If I leave Photos and Events out the result is
Database lookup | 1.2447s
Serialization | 3.9668s
Django request/response | 0.2435s
API view | 0.1320s
Response rendering | 0.8495s
* What might cause the slowness? *
So, the related fields are taking most of the time (many=True). The
profiling I used for testing is making `list` out of queryset before
serializing. Therefore lazy queries are executed before serialization.
If I don't do that, it doesn't change the overall results, but DB
lookups are evaluated when serializing with about the same amount of time.
Now the problem for me is that all queries that are done are fast if
executed manually. So, I believe SQL queries are fast, but DB lookups
from Django's point of view are very slow. What am I missing here? Or
how should I continue investigations? It feels like now it requires
serious effort from Django to convert SQL query results to Django
model instances. That would imply that there's something wrong with my
models, right?
At the end, I could turn to caching, but I would assume that handling
< 100k objects should not be an issue for Django if done correctly.
----------
Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3
Simplified versions of models, views and serializers:
class List(models.Model):
... CharFields, DateTimeFields, ForeignKeys etc. ...
class Item(models.Model):
list = models.ForeignKey(List, on_delete=models.CASCADE,
db_index=True, null=True, related_name='items')
deleted_at = models.DateTimeField(db_index=True, blank=True,
null=True, default=None)
created_by = models.ForeignKey(User, blank=False)
project = models.ForeignKey('projects.Project',
on_delete=models.CASCADE)
... other CharFields, DateTimeFields, ForeignKeys etc. ...
class Event(models.Model):
item = models.ForeignKey(Item, on_delete=models.CASCADE,
db_index=True, null=True, related_name='events')
created_by = models.ForeignKey(User, blank=False)
deleted_at = models.DateTimeField(db_index=True, blank=True,
null=True, default=None)
... other CharFields, DateTimeFields, ForeignKeys etc. ...
class Photo(models.Model):
item = models.ForeignKey(Item, on_delete=models.CASCADE,
db_index=True, null=True, related_name='photos')
created_by = models.ForeignKey(User, blank=False)
deleted_at = models.DateTimeField(db_index=True, blank=True,
null=True, default=None)
... other CharFields, DateTimeFields, ForeignKeys etc. ...
class PhotoSerializer(serializers.ModelSerializer):
... other CharFields, DateTimeFields, ForeignKeys etc. ...
class EventSerializer(serializers.ModelSerializer):
createdBy = PrimaryKeyRelatedStringField(source='created_by',
read_only=True)
createdByFullName =
serializers.CharField(source='created_by.get_full_name', read_only=True)
... other CharFields, DateTimeFields, ForeignKeys etc. ...
class ItemSerializer(serializers.ModelSerializer):
listName = serializers.CharField(source='list.name',
read_only=True)
createdBy = PrimaryKeyRelatedStringField(source='created_by',
read_only=True)
createdByFullName =
serializers.CharField(source='created_by.get_full_name', read_only=True)
photos = PhotoSerializer(many=True, read_only=True)
events = EventSerializer(many=True, required=False,
allow_null=True, queryset=Event.objects.all())
... other fields ...
class ItemListAPIView(ListAPIView):
model = Item
serializer_class = ItemSerializer
def get_queryset(self):
return
Item.objects.all().filter(project_id=...).filter(deleted_at__isnull=True).prefetch_related(
'created_by',# ID of user who created item
'photos',# Photo properties
'event__created_by',# Full name of the person who
created the event
'list',# Name of the related list
)
Example queries from tests with 14s DB lookup result:
django.db.backends: (0.196) SELECT "todo_item"."version", ...
everything ... FROM "todo_item" WHERE ("todo_item"."project_id" = 1
AND "todo_item"."deleted_at" IS NULL) ORDER BY
"todo_item"."created_at" DESC;
django.db.backends: (0.001) SELECT "auth_user"."id", ...
everything ... FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ...
some IDs ...);
django.db.backends: (0.148) SELECT "photos_photo"."version", ...
everything ... FROM "photos_photo" WHERE ("photos_photo"."deleted_at"
IS NULL AND "photos_photo"."item_id" IN (1, 2, ... lots of IDs... N))
ORDER BY "photos_photo"."created_at" DESC;
django.db.backends: (0.078) SELECT "events_event"."created_at",
... everything ... FROM "events_event" WHERE
("events_event"."deleted_at" IS NULL AND "events_event"."item_id" IN
(1, 2, ... lots of IDs... N)) ORDER BY "events_event"."created_at"
DESC, "events_event"."created_at" DESC;
django.db.backends: (0.157) SELECT "todo_list"."created_at", ...
FROM "todo_list" WHERE "todo_list"."id" IN (1, 2, ... lots of IDs... N)
--
You received this message because you are subscribed to the Google
Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to django-users+unsubscr...@googlegroups.com
<mailto:django-users+unsubscr...@googlegroups.com>.
To post to this group, send email to django-users@googlegroups.com
<mailto:django-users@googlegroups.com>.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/989dd9f6-508a-4dd7-a085-c30557204f41%40googlegroups.com
<https://groups.google.com/d/msgid/django-users/989dd9f6-508a-4dd7-a085-c30557204f41%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "Django
users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/6d19bbf1-57aa-bde9-f4f6-3836f941df47%40gmail.com.
For more options, visit https://groups.google.com/d/optout.