Django Real time Notifications
I want to know if Django supports any real-time notifications/communications in web applications.

For instance, if User A makes an update to its model instance, User B should get an instant notification in real time.

Does Django provide a mechanism for that?

-- You received this message because you are subscribed to the Google Groups "Django users" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com. To post to this group, send email to django-users@googlegroups.com. Visit this group at https://groups.google.com/group/django-users. To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/f67beae9-a58b-40ec-8ecf-9181f617b089%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Django Real time Notifications
Hi,

You should check out django-channels (https://channels.readthedocs.io/en/stable/) for that. It is a websocket-based solution for doing real-time notifications (as long as the users are logged in to the website).

Regards,

Andréas
Re: Django Real time Notifications
On Sunday 11 June 2017 05:00:19 yingi keme wrote:
> Does django provide a mechanism for that?

Yes and no.

Django allows you to define signals, one of them being a post_save signal on a Model. It has no mechanism to notify "User B", as it does not know how to notify User B.

But it has views and URLs, and browsers have Ajax. So you can poll (not real time).

Django Channels (a third-party package created by one of Django's core developers) supports websockets. Using websockets, one can notify in (near) real time.

So in order to do what you want, you need to glue signals to channels, or teach Django Channels how to detect a change in your model.

-- 
Melvyn Sopacua
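The "glue signals to channels" step can be illustrated with a pure-Python sketch. Everything here (post_save_receivers, notify_group, a plain list standing in for a Channels group) is a made-up stand-in for the real Django signal and Channels APIs, just to show the shape of the wiring:

```python
# Stand-ins: "post_save_receivers" plays the role of a Django signal's
# receiver list, and "group" plays the role of a Channels group of
# connected websocket consumers. None of this is the real API.
post_save_receivers = []
group = []  # pretend each entry is one connected user's open socket


def post_save(instance):
    # Django fires post_save after Model.save() writes to the DB.
    for receiver in post_save_receivers:
        receiver(instance)


def save(instance):
    # A real Model.save() would hit the database first, then signal.
    post_save(instance)


def notify_group(instance):
    # In Channels this would be a group_send() that fans out to every
    # consumer subscribed to the group.
    for inbox in group:
        inbox.append(f"updated: {instance['name']}")


# Wire the signal to the push channel -- this is the "glue".
post_save_receivers.append(notify_group)

user_b_inbox = []
group.append(user_b_inbox)  # User B "connects"

save({"name": "Item 1"})  # User A saves; User B is notified
print(user_b_inbox)
```

The real implementation would register a receiver with `django.db.models.signals.post_save` and forward the event through a Channels group, but the control flow is the same.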
Re: Django Real time Notifications
Thanks Andreas. I will check it out.

Yingi Kem
Re: Django Real time Notifications
Thanks Melvyn. I think you nailed it for me. Thanks again.

Yingi Kem
Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
I have a performance problem with Django + Django REST Framework. The problem is that it takes Django a very, very long time to handle the SQL query response, and DRF serialization takes quite some time as well. It happens when there are ManyToMany or OneToMany relations and nested objects. It sounds like the "N + 1" problem, but that's not the case. I have 10k to 50k items with related items, and I try to fetch and serialize them for a REST API. A request takes from 20 to 60 seconds, and currently I have no idea what is causing the slowness.

Note: I asked the same question on Stack Overflow, but haven't found the answer there - https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl

* N+1 Problem *

There are not too many queries, and the queries themselves are fast (queries at the end of this post). I'm using `prefetch_related` to limit the number of queries, and from what I'm seeing in the DB queries everything is looking okay..ish (?). I get one query for each `prefetch_related` property plus the original query for the serialized objects. There are lots and lots of IDs included in the `prefetch_related` queries, but I guess that is inevitable - as many IDs as there are original items.

To test SQL queries + data transfer, I ran the same queries with psql from one of my EC2 instances against the RDS DB with the same data, and wrote the results to a file. With data transfer plus file write on top, it totals between 100 and 500 ms for the bigger SQL queries across different data sets. The file write is extra, but I wanted to ensure that I get all the data I expect.

I tested timing from the EC2 instance with a command like:

    time psql -f fat-query.psql --host=rds_host --port=5432 --username=user --dbname=dbname > output.txt

* Profiling *

When profiling as shown here https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/ I get results showing that DB lookups take most of the time, while serializing is not too fast either.
As an example, I have about 12k "Items" in my local PostgreSQL database for one "Project". All "Items" have 1-5 "Events", and the majority of "Items" also have 1-2 "Photos". Fetching and serializing that data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for deployment, and the timing is about the same there. On larger "Item" sets, serialization time increases more than DB lookup time, but DB lookups always take most of the time. With 40k items you'll start to reach 1 min execution time and hit various timeouts from Nginx and other parts of the stack.

Example with 12k items (models, serializers and queries below):

    Database lookup         | 14.0292s
    Serialization           |  6.0076s
    Django request/response |  0.3488s
    API view                |  0.2170s
    Response rendering      |  1.1092s

If I leave Photos and Events out, the result is:

    Database lookup         | 1.2447s
    Serialization           | 3.9668s
    Django request/response | 0.2435s
    API view                | 0.1320s
    Response rendering      | 0.8495s

* What might cause the slowness? *

So, the related fields are taking most of the time (many=True). The profiling I used for testing makes a `list` out of the queryset before serializing, so the lazy queries are executed before serialization. If I don't do that, the overall results don't change, but the DB lookups are evaluated during serialization, taking about the same amount of time.

Now, the problem for me is that all the queries that are run are fast if executed manually. So I believe the SQL queries are fast, but DB lookups from Django's point of view are very slow. What am I missing here? Or how should I continue investigating? It feels like it requires serious effort from Django to convert the SQL query results into Django model instances. That would imply that there's something wrong with my models, right? In the end I could turn to caching, but I would assume that handling < 100k objects should not be an issue for Django if done correctly.
--

Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3

Simplified versions of models, views and serializers:

    class List(models.Model):
        ...  # CharFields, DateTimeFields, ForeignKeys etc.

    class Item(models.Model):
        list = models.ForeignKey(List, on_delete=models.CASCADE,
                                 db_index=True, null=True,
                                 related_name='items')
        deleted_at = models.DateTimeField(db_index=True, blank=True,
                                          null=True, default=None)
        created_by = models.ForeignKey(User, blank=False)
        project = models.ForeignKey('projects.Project',
                                    on_delete=models.CASCADE)
        ...  # other CharFields, DateTimeFields, ForeignKeys etc.

    class Event(models.Model):
        item = models.ForeignKey(Item, on_delete=models.CASCADE,
                                 db_index=True, null=True,
                                 related_name='events')
        created_by = models.ForeignKey(User, blank=False)
        deleted_at = models.D
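For reference, the two-query pattern that `prefetch_related` produces for models like these — one query for the parents, one IN (...) query for the children, with the join done in Python — can be sketched with stdlib sqlite3. The mini-schema is hypothetical, not the poster's real tables, and this is a simplification of what the ORM actually does:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, item_id INTEGER, url TEXT);
    INSERT INTO item  VALUES (1, 'first'), (2, 'second');
    INSERT INTO photo VALUES (10, 1, 'a.jpg'), (11, 1, 'b.jpg'),
                             (12, 2, 'c.jpg');
""")

# Query 1: the parent rows (the "original query").
items = conn.execute("SELECT id, title FROM item ORDER BY id").fetchall()

# Query 2: one IN (...) query carrying as many IDs as there are parents --
# this mirrors the huge ID lists seen in prefetch_related's SQL.
ids = [i[0] for i in items]
placeholders = ",".join("?" * len(ids))
photos = conn.execute(
    f"SELECT id, item_id, url FROM photo "
    f"WHERE item_id IN ({placeholders}) ORDER BY id", ids
).fetchall()

# The "join" happens in Python: group children by parent id, then attach.
by_item = defaultdict(list)
for _pid, item_id, url in photos:
    by_item[item_id].append(url)
result = [{"id": i, "title": t, "photos": by_item[i]} for i, t in items]
print(result)
```

With 40k parents and their children, this Python-side grouping and attachment step is pure interpreter work, which is one place the reported time can go.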
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
There is some overhead to using the ORM: this is inevitable. On the other hand, you didn't post your ORM calls, and the queries are incomplete. If you think your queries are optimized (as I understand they are from your description), a workaround is to avoid using the ORM for those specific queries. Or review your models; they might not be optimized for what you want. Can you try to reduce the number of queries by creating more complex queries? Another possibility is creating a new Model that recovers information from a view/stored procedure on the database... you see, you want to get 10k to 50k items... that's a lot (to serialize, too).
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
Besides, "time" from a shell is not the appropriate way to measure DB response. Try to use whatever is available for your database to measure that. Also, see http://blogs.perl.org/users/steffen_mueller/2010/09/your-benchmarks-suck.html. Don't be fooled by the article being on a Perl-related blog; the concepts in the post are applicable everywhere.
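The benchmarking point can be made concrete with the stdlib `timeit` module: repeat the measurement and take the minimum, which filters out runs disturbed by other load. The `run_query` body below is a placeholder for whatever is actually being measured (e.g. the ORM call), not anything from the thread:

```python
import timeit


def run_query():
    # Placeholder workload standing in for the code under test;
    # swap in the real query execution when measuring for real.
    return sum(i * i for i in range(10_000))


# repeat() runs the statement several batches; the *minimum* batch is the
# least noisy estimate, since slower batches were disturbed by something.
timings = timeit.repeat(run_query, number=10, repeat=5)
best = min(timings) / 10  # seconds per call, from the cleanest batch
print(f"best per-call time: {best:.6f}s")
```

A single `time` invocation from a shell, by contrast, includes process startup, connection setup, and file I/O, and gives only one sample.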
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
Thanks for the response!

Yep, timing with "time" is not the best way to compare SQL query times. The reason I added that "time" test is that on Stack Overflow I was asked to confirm that data transfer is not a factor here. I timed the SQL queries with "EXPLAIN ANALYZE" and "\timing" first. Those don't take data transfer into account, so "time" served as a quick check that data transfer is not the problem.

About timing: as an example, I can reduce the problem to serializing only Items and related Photos. That results in only two queries. For example, a dataset of about 40k items produces:

    django.db.backends: (1.252) SELECT "todo_item"."version", ... all item properties ... FROM "todo_item" WHERE ("todo_item"."project_id" = '...' AND "todo_item"."deleted_at" IS NULL);

    django.db.backends: (0.883) SELECT "photos_photo"."version", ... all item properties ... FROM "photos_photo" WHERE "photos_photo"."note_id" IN (349527, 349528, 349529, ... and the rest of the 40k IDs ... );

Quite simple queries. The timing shown in the django.db.backends logs matches what those queries take when executed manually with psql (1252 ms + 883 ms). That yields this simple profiling info:

    Database lookup         | 20.4447s
    Serialization           |  3.3821s
    Django request/response |  0.3419s
    API view                |  0.1988s
    Response rendering      |  0.4591s

That's only from a single request, and query times vary of course. Still, the difference between how long it takes to query the data from the DB and how long Django takes to process it is just huge. The part I don't understand is that it takes about 20 seconds to run list(self.get_queryset()) while those two queries take about 2 seconds in SQL. There is some serious effort and time being spent there by Django. According to the django.db.backends logs, those two queries are the only queries run during list(self.get_queryset()). The "list" is there to force query execution, to separate DB lookup time from serialization time.

Adding a new Model recovering information from a view/stored procedure on the database is a good idea.
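One way to probe the hypothesis that object construction, not the SQL, dominates the 20 seconds is to time the raw row fetch separately from building Python objects out of the rows. A sketch using stdlib sqlite3 as a stand-in for the real Postgres tables; the `Item` class here is a hypothetical stand-in for a Django model, doing only a fraction of the per-row work the ORM does:

```python
import sqlite3
import time


class Item:
    # Minimal stand-in for a model instance: per-row attribute setup is
    # the kind of work the ORM repeats for every returned row.
    def __init__(self, id, version, title):
        self.id, self.version, self.title = id, version, title


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todo_item (id INTEGER, version INTEGER, title TEXT)")
conn.executemany("INSERT INTO todo_item VALUES (?, ?, ?)",
                 [(i, 1, f"item {i}") for i in range(40_000)])

t0 = time.perf_counter()
rows = conn.execute("SELECT id, version, title FROM todo_item").fetchall()
t1 = time.perf_counter()
items = [Item(*row) for row in rows]  # the part Django counts as "DB lookup"
t2 = time.perf_counter()

print(f"raw fetch: {t1 - t0:.4f}s, instantiation: {t2 - t1:.4f}s")
```

Even this trivial class makes instantiation cost visible at 40k rows; full model instances with many fields, field descriptors, and prefetch bookkeeping multiply that cost.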
Of course, I would first like to understand what might be wrong in the current models, so as not to make the same mistakes again. Is there something one should consider when making related items like Photos and Events related to Items? That use case is quite simple, and it still causes Django to spend 18 seconds processing the SQL query response. There is a lot of data, of course, but I had thought that returning 50k objects should not be a problem for Python / Django, even though the ORM always adds some overhead.
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
This may be a very basic thing to ask, but there is no harm in double-checking: are you using select_related instead of all() for those two specific Models? Check out https://django-debug-toolbar.readthedocs.io/en/stable/index.html if you aren't using it yet; it will help you see that very easily (I'm assuming you have a view that lists all those instances for those Models).
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
On Sunday 11 June 2017 09:15:09 Miika Huusko wrote:
> About timing: as an example I can reduce the problem to serializing
> only Items and related Photos. It results only two queries.

Why did you opt to use prefetch_related and not select_related?

-- 
Melvyn Sopacua
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
@Alceu Rodrigues de Freitas Junior, there are no questions too basic to ask :) I'm sure I'm missing something, so please keep asking. But yes, I have considered that.

@Melvyn Sopacua, I opted for prefetch_related because I'm fetching "Items" that are used as ForeignKey by "Photos" and "Events". select_related is meant for cases where you have a OneToOne or ForeignKey relation (a single item) that you need to fetch. If I were fetching "Photos", I would use select_related to JOIN "Items" at the SQL level. Since I'm fetching "Items", the relations to "Photos" and "Events" are "reverse ForeignKey" many-to-one sets, and I need to use prefetch_related. Of course that means the JOIN is done in Python (which is probably what takes the time now), but I cannot see how I could do it with select_related. Please let me know if there's a way to use select_related with related sets.

I'm kind of hoping to find a problem in my implementation, or in my thinking, that somehow makes the Python-level JOIN behave badly. It's a big JOIN, but still, Python is fast. There must be some kind of extra looping or cloning going on when performing the Python-level JOIN for prefetched Photos or other related sets.

On Sunday, June 11, 2017 at 4:57:22 PM UTC+2, Miika Huusko wrote:
>
> I have a performance problem with Django + Django REST Framework. The problem is that it takes Django a very, very long time to handle the SQL query response, and DRF serialization takes quite some time as well. It happens when there are ManyToMany or OneToMany relations and nested objects. It sounds like an "N+1" problem, but that's not the case. I have 10k to 50k items with related items that I try to fetch and serialize for a REST API. A request takes from 20 to 60 seconds, and currently I have no idea what is causing the slowness.
>
> Note: I asked the same question on Stack Overflow, but haven't found the answer there - https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl
>
> * N+1 Problem *
>
> There are not too many queries, and the queries themselves are fast (queries at the end of this post). I'm using `prefetch_related` to limit the number of queries, and from the DB query logs everything looks okay..ish (?). I get one query for each `prefetch_related` property, plus the original query for the serialized objects. There are lots and lots of IDs included in the `prefetch_related` queries, but I guess that is inevitable - as many IDs as there are original items.
>
> To test SQL queries + data transfer, I ran the same queries with psql from one of my EC2 instances against an RDS DB with the same data and wrote the output to a file. Data transfer plus file write on top of the queries comes to between 100 and 500 ms in total for the bigger SQL queries across different data sets. The file write is extra, but I wanted to ensure that I get all the data I expect.
>
> I timed from the EC2 instance with a command like:
>
> time psql -f fat-query.psql --host=rds_host --port=5432 --username=user --dbname=dbname > output.txt
>
> * Profiling *
>
> When profiling as shown here https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/ I get results showing that DB lookups take most of the time, while serializing is not too fast either. As an example, I have about 12k "Items" in my local PostgreSQL database for one "Project". All "Items" have 1-5 "Events", and the majority of "Items" also have 1-2 "Photos". Fetching and serializing that data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for deployment, and the timing is about the same there. On larger "Item" sets, serialization time increases more than DB lookup time, but DB lookups always take most of the time. With 40k items you start to reach 1 min execution times and various timeouts from Nginx and other parts of the stack.
>
> Example with 12k items (models, serializers and queries below):
>
> Database lookup | 14.0292s
> Serialization | 6.0076s
> Django request/response | 0.3488s
> API view | 0.2170s
> Response rendering | 1.1092s
>
> If I leave Photos and Events out, the result is:
>
> Database lookup | 1.2447s
> Serialization | 3.9668s
> Django request/response | 0.2435s
> API view | 0.1320s
> Response rendering | 0.8495s
>
> * What might cause the slowness? *
>
> So, the related fields are taking most of the time (many=True). The profiling I used for testing makes a `list` out of the queryset before serializing, so the lazy queries are executed before serialization. If I don't do that, the overall results don't change, but the DB lookups are evaluated during serialization, taking about the same amount of time.
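The "Python-level JOIN" discussed above — what prefetch_related effectively does after its IN (...) query — can be sketched in plain Python. Field names (item_id) and the row data are illustrative, not the poster's actual schema:

```python
from collections import defaultdict

# Rough sketch of the Python-level JOIN prefetch_related performs:
# one query returns the items, a second returns all related photos
# (WHERE item_id IN (...)), and the two result sets are matched in Python.
items = [{"id": 1}, {"id": 2}]  # stand-in for the items query result
photos = [                      # stand-in for the photos query result
    {"id": 10, "item_id": 1},
    {"id": 11, "item_id": 1},
    {"id": 12, "item_id": 2},
]

# Group photos by their parent item id (one pass over the photo rows) ...
photos_by_item = defaultdict(list)
for photo in photos:
    photos_by_item[photo["item_id"]].append(photo)

# ... then attach each group to its item (one pass over the item rows).
for item in items:
    item["photos"] = photos_by_item[item["id"]]
```

Grouping this way is a single pass over each result set, so the matching itself is cheap; in the ORM, constructing full model instances for every row is usually the expensive part.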
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
On Sunday 11 June 2017 13:41:55 Miika Huusko wrote:

> @Melvyn Sopacua, I opted for prefetch_related because I'm fetching "Items" that are used as ForeignKey by "Photos" and "Events". select_related is meant for cases where you have a OneToOne or ForeignKey relation (a single item) that you need to fetch. If I were fetching "Photos", I would use select_related to JOIN "Items" at the SQL level. Since I'm fetching "Items", the relations to "Photos" and "Events" are "reverse ForeignKey" many-to-one sets, and I need to use prefetch_related.

You are right, in part. But created_by is a foreign key on Items, not the other way around. So this:

> django.db.backends: (0.001) SELECT "auth_user"."id", ... everything ... FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ... some IDs ...);

can be avoided. Last but not least, this is why APIs use pagination. If for whatever reason you must provide such a complex model (instead of having the consumer make more API calls for photos they don't have yet), you paginate; but it's not very flexible and it scales badly. The proper design pattern here would be to hash your photos, use that as the identifier in the API, and put the burden on the consumer to only request photos they don't yet have the ID for. This is, for example, how Zendesk handles email attachments.

--
Melvyn Sopacua
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
@Melvyn Sopacua, yes, it's very true that the auth_user SELECT can be avoided. Thanks for pointing that out! I don't know why I had put it like that. Unfortunately, it's the related sets and big JOINs that are the problem and are generating the "slowness".

The reason I'm providing all "Items" and other related stuff in one request is that I want to make the customer's (that's also me and my good friends) life easier. I want to keep all the difficult and heavy work on the server side. The customer needs all the data because of offline requirements, and would therefore need to loop through pagination or perform the same big JOIN if "Photos" and other related objects were requested separately. Doing that on mobile devices is going to take longer than on an Ubuntu server, even if the server takes longer than I would expect with Python + Django. That said, it's good to point out opinions on design patterns; I haven't listed all the background information behind the decisions.

I'm not quite sure what you mean by "hash photos and use that as identifier". What I could do - and what I believe you mean(?) - is to provide a list of "Photo" IDs (or hashes) for each "Item" and ask the customer to fetch "Photo" details separately with another request (or requests). Or - easier for the API - leave "Photo" IDs out of the "Item" information completely and ask the customer to make the JOIN with separately requested "Photos". The JOIN could be done with hashes or IDs. Either way, it would be the customer making the JOIN. I might go there, but I still don't believe that Django cannot handle my meager data faster than 20 seconds :) If that were the case, it really would be sad.

On Sunday, June 11, 2017 at 4:57:22 PM UTC+2, Miika Huusko wrote:
> [original question quoted earlier in the thread; trimmed]
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
I'm guessing here that REST would not be appropriate for handling "large" amounts of data, which is probably why Melvyn suggested using pagination.

If your customer needs the data for offline use, I bet it would be better to generate a file in the appropriate format, compress it, and redirect the request to download the file. For that, you could skip the ORM entirely, or build an appropriate Model for it. I would stick with the former (export the data straight from the DB with a stored procedure), since I don't see you reusing this new Model elsewhere in your Django app.

On 11/06/2017 18:41, Miika Huusko wrote:
> [quoted message trimmed]
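The export-file idea above could look roughly like this. A hypothetical sketch: it assumes the rows have already been fetched (by raw SQL or a stored procedure), uses gzip to stand in for "compress it", and the stand-in data replaces the real DB rows.

```python
import gzip
import json
import tempfile

# Hypothetical sketch of the suggestion above: write the full dataset to a
# compressed file once, then redirect the client to download that file,
# instead of serializing everything through the ORM + DRF on every request.
rows = [{"id": i, "title": f"Item {i}"} for i in range(100)]  # stand-in for DB rows

export_path = tempfile.mktemp(suffix=".json.gz")
with gzip.open(export_path, "wt", encoding="utf-8") as f:
    json.dump(rows, f)

# Reading it back, as the offline client would after downloading:
with gzip.open(export_path, "rt", encoding="utf-8") as f:
    reloaded = json.load(f)
```

In a real view the response would be a redirect to the generated file's URL (or a streamed FileResponse), so the web worker never holds the whole serialized payload in memory per request.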
Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem
On Sunday 11 June 2017 14:41:22 Miika Huusko wrote:

> The reason I'm providing all "Items" and other related stuff in one request is that I want to make the customer's (that's also me and my good friends) life easier. I want to keep all the difficult and heavy work on the server side. The customer needs all the data because of offline requirements, and would therefore need to loop through pagination or perform the same big JOIN if "Photos" and other related objects were requested separately. Doing that on mobile devices is going to take longer than on an Ubuntu server, even if it takes longer than I would expect with Python + Django. That said, it's good to point out opinions on design patterns; I haven't listed all the background information behind the decisions.

Think about how Google Maps does this with offline datasets for areas you've marked. It caches them for x days, then prompts you, when you have a free moment, to sync up. It's not a burden in practice and it makes life a lot easier. Sometimes we set requirements too harshly for ourselves.

> I'm not quite sure what you mean by "hash photos and use that as identifier".

Sorry, I wasn't complete, as it's been a while. I've built an application that fetches Zendesk data into Django for trend analysis. To get an attachment for a given message, the flow was:

- get the message id: /messages/id
- get the attachment ids: /messages/id/attachments (this is your reverse relation)
- for each one that you don't have yet, fetch it: /files/id

The hashing came into play because corporate idiots - I mean people - attach their logo to each and every email they send. By hashing the attachments, you eliminate duplicates and scale down your storage requirements and (query) data size.

> What I could do - and what I believe you mean(?) - is to provide a list of "Photo" IDs (or hashes) for each "Item" and ask the customer to fetch "Photo" details separately with another request (or requests). Or - easier for the API - leave "Photo" IDs out of the "Item" information completely and ask the customer to make the JOIN with separately requested "Photos". The JOIN could be done with hashes or IDs. Either way, it would be the customer making the JOIN.

With the mobile and offline requirements, it makes a lot more sense now why you have set up the API the way you did. Two more thoughts for you:

- Use a cacheable dataset and incremental updates to be done "when wifi or better is available". Complex, but worth it in the long run.
- Ditch DRF serialization and get the queryset as values(), dumping to JSON directly, thus eliminating both serializer objects and model objects.

It's easy to test the performance gain of the second, especially if you can stream the JSON (as opposed to in-memory-then-flush).

--
Melvyn Sopacua
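The second suggestion — skip DRF serializers and model instances by asking the ORM for plain dicts via .values() and dumping those straight to JSON — in sketch form. The queryset line is commented out because it needs a real Django project; the list of dicts below stands in for its result, and the field names are illustrative.

```python
import json

# rows = list(Item.objects.values("id", "title"))  # plain dicts, no model instances
rows = [  # stand-in for the .values() result above
    {"id": 1, "title": "First"},
    {"id": 2, "title": "Second"},
]

# json.dumps handles the dicts directly - no serializer objects, no model
# instances, which is where the bulk of the per-row overhead lives.
payload = json.dumps(rows)
```

Note that .values() returns only the named columns, so related sets would still need their own query and a Python-side merge, or their own separate endpoint.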
Is this a bug on annotate+filter+annotate over M2M field.
I believe issue #28297 is related to the problem described above. https://code.djangoproject.com/ticket/28297