Thanks for response! Yep, timing with "time" is not the best way to go to compare SQL query times. The reason I added that "time" test is that in Stack Overflow I was asked to confirm that data transfer is not factor here. I timed SQL queries with "EXPLAIN ANALYZE" and "\timing" first. That don't take data transfer into account, so, "time" as quick test that data transfer is not a problem.
About timing: as an example I can reduce the problem to serializing only Items and related Photos. It results only two queries. For example for a dataset of about 40k items results: django.db.backends: (1.252) SELECT "todo_item"."version", *... all item properties ...* FROM "todo_item" WHERE ("todo_item"."project_id" = '...' AND "todo_item"."deleted_at" IS NULL); django.db.backends: (0.883) SELECT "photos_photo"."version", *... all item properties ...* FROM "photos_photo" WHERE "photos_photo"."note_id" IN (349527, 349528, 349529, *... and rest of the 40k IDs ...* ); Quite simple quries. Timing shown in django.db.backends logs is what those queries take when executed manually with psql (1252 ms + 883 ms). That results simple profiling info: Database lookup | 20.4447s Serialization | 3.3821s Django request/response | 0.3419s API view | 0.1988s Response rendering | 0.4591s That's only from a single query and query times vary of course. Still, the difference between how long it takes query data from DB and how long Django process it is just that huge. The part I don't understand is that it takes about 20 seconds to run list(self.get_queryset()) while those two queries take about 2 seconds in SQL. There is some serious effort and time put there by Django. Those two queries are only queries that are run during list(self.get_queryset()) according to django.db.backend logs. "list" is there to force query execution to separate DB lookup time and serialization time. Adding a new Model recovering information from a view/stored procedure on the database is a good idea. Of course, I would like to first understand what might be wrong in current models to not make same mistakes again. Is there something one should consider when making related items like Photos and Events related to Items? That use case is quite simple and still result Django to use 18 seconds to process SQL query response. There is lot of data of course, but I have thought that returning 50k objects should not be a problem for Python / Django even though ORM always adds some overhead. On Sunday, June 11, 2017 at 4:57:22 PM UTC+2, Miika Huusko wrote: > > I have a performance problem with Django + Django REST Framework. The > problem is that it takes Django very, very long time to handle SQL query > response + DRF serialization take quite some time also. It happens when > there are ManyToMany or OneToMany relations and nested objects. Sounds like > "N + 1" problem, but that's not the case. I have 10k to 50k items with > related items and I try to fetch and serialize them for REST API. Request > takes from 20 to 60 seconds and currently I have no idea what is causing > the slowness. > > Note: I asked the same question in Stack Overflow, but haven't found the > answer from there - > https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl > > > * N+1 Problem * > > There are not too many queries and queries themselves are fast (queries at > the end of this post). I'm using `prefetch_related` to limit number of > queries and what I'm seeing from DB queries everything is looking Okay..ish > (?). I get one query for each `prefetch_related` property + the original > query for serialized objects. There are lots and lots of IDs included in > `prefetch_related` queries, but I guess that is inevitable - as many IDs as > there are original items. > > To test SQL queries + data transfer, I run same queries with psql from one > of my EC2 instances to RDS DB with same DB data and wrote the data to file. > Data transfer plus file write on top of that and it is totally between 100 > to 500 ms for bigger SQL queries for different data sets. File write is > extra, but I wanted to ensure that I get all the data I expect. > > I tested timing from EC2 instance with command from like: > > time psql -f fat-query.psql --host=rds_host --port=5432 > --username=user --dbname=dbname > output.txt > > > * Profiling * > > When profiling as shown here > https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/ > I get results that DB lookups take most of the time while serializing is > not too fast either. As an example I have about 12k "Items" in the my local > PostgreSQL database for one "Project". All "Items" have 1-5 "Events" and > majority of "Items" have also 1-2 "Photos". Fetching and serializing that > data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for > deployment and the timing is about the same there. On larger "Item" sets > serialization time is increasing more than DB lookup time, but DB lookups > always take most of the time. With 40k items you'll start to reach 1 min > execution time and different timeouts from Nginx and other parts of the > stack. > > Example with 12k items (models, serializers and queries below) > > Database lookup | 14.0292s > Serialization | 6.0076s > Django request/response | 0.3488s > API view | 0.2170s > Response rendering | 1.1092s > > If I leave Photos and Events out the result is > > Database lookup | 1.2447s > Serialization | 3.9668s > Django request/response | 0.2435s > API view | 0.1320s > Response rendering | 0.8495s > > > * What might cause the slowness? * > > So, the related fields are taking most of the time (many=True). The > profiling I used for testing is making `list` out of queryset before > serializing. Therefore lazy queries are executed before serialization. If I > don't do that, it doesn't change the overall results, but DB lookups are > evaluated when serializing with about the same amount of time. > > Now the problem for me is that all queries that are done are fast if > executed manually. So, I believe SQL queries are fast, but DB lookups from > Django's point of view are very slow. What am I missing here? Or how should > I continue investigations? It feels like now it requires serious effort > from Django to convert SQL query results to Django model instances. That > would imply that there's something wrong with my models, right? > > > At the end, I could turn to caching, but I would assume that handling < > 100k objects should not be an issue for Django if done correctly. > > ---------- > > Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3 > > Simplified versions of models, views and serializers: > > class List(models.Model): > ... CharFields, DateTimeFields, ForeignKeys etc. ... > > class Item(models.Model): > list = models.ForeignKey(List, on_delete=models.CASCADE, > db_index=True, null=True, related_name='items') > deleted_at = models.DateTimeField(db_index=True, blank=True, > null=True, default=None) > created_by = models.ForeignKey(User, blank=False) > project = models.ForeignKey('projects.Project', > on_delete=models.CASCADE) > ... other CharFields, DateTimeFields, ForeignKeys etc. ... > > class Event(models.Model): > item = models.ForeignKey(Item, on_delete=models.CASCADE, > db_index=True, null=True, related_name='events') > created_by = models.ForeignKey(User, blank=False) > deleted_at = models.DateTimeField(db_index=True, blank=True, > null=True, default=None) > ... other CharFields, DateTimeFields, ForeignKeys etc. ... > > class Photo(models.Model): > item = models.ForeignKey(Item, on_delete=models.CASCADE, > db_index=True, null=True, related_name='photos') > created_by = models.ForeignKey(User, blank=False) > deleted_at = models.DateTimeField(db_index=True, blank=True, > null=True, default=None) > ... other CharFields, DateTimeFields, ForeignKeys etc. ... > > > class PhotoSerializer(serializers.ModelSerializer): > ... other CharFields, DateTimeFields, ForeignKeys etc. ... > > class EventSerializer(serializers.ModelSerializer): > createdBy = PrimaryKeyRelatedStringField(source='created_by', > read_only=True) > createdByFullName = > serializers.CharField(source='created_by.get_full_name', read_only=True) > ... other CharFields, DateTimeFields, ForeignKeys etc. ... > > class ItemSerializer(serializers.ModelSerializer): > listName = serializers.CharField(source='list.name', > read_only=True) > createdBy = PrimaryKeyRelatedStringField(source='created_by', > read_only=True) > createdByFullName = > serializers.CharField(source='created_by.get_full_name', read_only=True) > photos = PhotoSerializer(many=True, read_only=True) > events = EventSerializer(many=True, required=False, > allow_null=True, queryset=Event.objects.all()) > ... other fields ... > > > class ItemListAPIView(ListAPIView): > model = Item > serializer_class = ItemSerializer > > def get_queryset(self): > return > Item.objects.all().filter(project_id=...).filter(deleted_at__isnull=True).prefetch_related( > 'created_by', # ID of user who created item > 'photos', # Photo properties > 'event__created_by', # Full name of the person who > created the event > 'list', # Name of the related list > ) > > Example queries from tests with 14s DB lookup result: > > django.db.backends: (0.196) SELECT "todo_item"."version", ... > everything ... FROM "todo_item" WHERE ("todo_item"."project_id" = 1 AND > "todo_item"."deleted_at" IS NULL) ORDER BY "todo_item"."created_at" DESC; > django.db.backends: (0.001) SELECT "auth_user"."id", ... everything > ... FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ... some IDs ...); > django.db.backends: (0.148) SELECT "photos_photo"."version", ... > everything ... FROM "photos_photo" WHERE ("photos_photo"."deleted_at" IS > NULL AND "photos_photo"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY > "photos_photo"."created_at" DESC; > django.db.backends: (0.078) SELECT "events_event"."created_at", ... > everything ... FROM "events_event" WHERE ("events_event"."deleted_at" IS > NULL AND "events_event"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY > "events_event"."created_at" DESC, "events_event"."created_at" DESC; > django.db.backends: (0.157) SELECT "todo_list"."created_at", ... FROM > "todo_list" WHERE "todo_list"."id" IN (1, 2, ... lots of IDs... N) > > -- You received this message because you are subscribed to the Google Groups "Django users" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com. To post to this group, send email to django-users@googlegroups.com. Visit this group at https://groups.google.com/group/django-users. To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/077f7f7c-7bab-4266-ad1d-60d2ad8167b0%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.