Django Real time Notifications

2017-06-11 Thread yingi keme
I want to know if Django supports any real-time notifications/communications in
web applications.

For instance, if User A makes an update to its model instance, User B should
get an instant notification in real time.

Does Django provide a mechanism for that?

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/f67beae9-a58b-40ec-8ecf-9181f617b089%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Django Real time Notifications

2017-06-11 Thread Andréas Kühne
Hi,

You should check out django-channels (
https://channels.readthedocs.io/en/stable/) for that. It is a WebSocket-based
solution for doing real-time notifications (as long as the users are
logged in to the website).

Regards,

Andréas




Re: Django Real time Notifications

2017-06-11 Thread Melvyn Sopacua
On Sunday 11 June 2017 05:00:19 yingi keme wrote:
> I want to know if django supports any real time
> notifications/communications in web applications.
> 
> For instance, if User A makes an update to its model instance, User B
> should get an instant notification in real time.
> 
> Does django provide a mechanism for that?

Yes and no.
Django allows you to define signal handlers; one of the built-in signals is
post_save on a Model. But Django itself has no mechanism to notify "User B",
as it does not know how to reach User B.

It does have views and URLs, and browsers have Ajax, so you can poll (not
real time).

Django Channels (a third-party package created by one of Django's core
developers) supports WebSockets. Using WebSockets, one can notify in (near)
real time.

So in order to do what you want, you need to glue signals to channels, or
teach Django Channels how to detect a change in your model.
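The "glue" is essentially a post_save receiver that pushes a message to
whatever transport notifies User B. Here is a framework-free sketch of that
signal-to-subscribers flow (the hub, group name, and handler are illustrative
stand-ins, not Django or Channels APIs):

```python
# Minimal stand-in for "glue a post_save signal to a notification channel".
# In a real project the handler would be registered on Django's post_save
# signal and would call Channels' group_send; both are simulated here so
# the flow is visible end to end.

class NotificationHub:
    """Fan-out hub: connected users receive messages pushed to a group."""
    def __init__(self):
        self.groups = {}          # group name -> list of subscriber inboxes

    def subscribe(self, group, inbox):
        self.groups.setdefault(group, []).append(inbox)

    def group_send(self, group, message):
        for inbox in self.groups.get(group, []):
            inbox.append(message)

hub = NotificationHub()

def on_post_save(instance):
    """What a post_save receiver would do: broadcast the change."""
    hub.group_send("watchers", {"event": "updated", "id": instance["id"]})

# User B "connects" (in Channels, a WebSocket consumer joining a group).
user_b_inbox = []
hub.subscribe("watchers", user_b_inbox)

# User A saves a model instance; the signal handler fires.
on_post_save({"id": 42})
print(user_b_inbox)   # [{'event': 'updated', 'id': 42}]
```

In the real thing, the consumer holds an open WebSocket to User B's browser,
so the message arrives without polling.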
-- 
Melvyn Sopacua

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/1912475.GSAzsYGyPs%40devstation.
For more options, visit https://groups.google.com/d/optout.


Re: Django Real time Notifications

2017-06-11 Thread yingi keme
Thanks Andreas. I will check it out

Yingi Kem




Re: Django Real time Notifications

2017-06-11 Thread yingi keme
Thanks, Melvyn. I think you nailed it for me.

Thanks again

Yingi Kem




Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Miika Huusko
I have a performance problem with Django + Django REST Framework. The
problem is that Django takes a very, very long time to handle the SQL query
response, and DRF serialization takes quite some time as well. It happens when
there are ManyToMany or OneToMany relations and nested objects. It sounds like
an "N + 1" problem, but that's not the case. I have 10k to 50k items with
related items, and I try to fetch and serialize them for a REST API. A request
takes 20 to 60 seconds, and currently I have no idea what is causing the
slowness.

Note: I asked the same question on Stack Overflow, but haven't found an
answer there -
https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl


* N+1 Problem * 

There are not too many queries, and the queries themselves are fast (queries
at the end of this post). I'm using `prefetch_related` to limit the number of
queries, and from the DB queries everything looks okay-ish (?). I get one
query for each `prefetch_related` property plus the original query for the
serialized objects. There are lots and lots of IDs included in the
`prefetch_related` queries, but I guess that is inevitable - as many IDs as
there are original items.

To test SQL queries + data transfer, I ran the same queries with psql from one
of my EC2 instances against the RDS DB (with the same data) and wrote the
output to a file. Data transfer plus the file write comes to between 100 and
500 ms in total for the bigger SQL queries across different data sets. The
file write is extra, but I wanted to ensure that I get all the data I expect.

I tested timing from an EC2 instance with a command like:

time psql -f fat-query.psql --host=rds_host --port=5432 --username=user 
--dbname=dbname > output.txt


* Profiling * 

When profiling as shown here
https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/
I get results showing that DB lookups take most of the time, while serializing
is not too fast either. As an example, I have about 12k "Items" in my local
PostgreSQL database for one "Project". All "Items" have 1-5 "Events", and the
majority of "Items" also have 1-2 "Photos". Fetching and serializing that
data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for
deployment, and the timing is about the same there. On larger "Item" sets
serialization time increases more than DB lookup time, but DB lookups
always take most of the time. With 40k items you start to reach 1 min
execution time and various timeouts from Nginx and other parts of the
stack.

Example with 12k items (models, serializers and queries below):

Database lookup   | 14.0292s
Serialization | 6.0076s
Django request/response   | 0.3488s
API view  | 0.2170s
Response rendering| 1.1092s

If I leave Photos and Events out, the result is:

Database lookup   | 1.2447s
Serialization | 3.9668s
Django request/response   | 0.2435s
API view  | 0.1320s
Response rendering| 0.8495s


* What might cause the slowness? *

So, the related fields are taking most of the time (many=True). The 
profiling I used for testing is making `list` out of queryset before 
serializing. Therefore lazy queries are executed before serialization. If I 
don't do that, it doesn't change the overall results, but DB lookups are 
evaluated when serializing with about the same amount of time.
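That evaluate-then-serialize split can be sketched without Django; the fake
lazy queryset, per-row cost, and serializer below are purely illustrative:

```python
import time

def fetch_lazy():
    """Stand-in for a lazy Django queryset: work happens on iteration."""
    for i in range(3):
        time.sleep(0.01)          # simulate per-row DB/instantiation cost
        yield {"id": i}

def serialize(rows):
    """Stand-in for DRF serialization of already-fetched rows."""
    return [{"id": r["id"], "kind": "item"} for r in rows]

t0 = time.perf_counter()
rows = list(fetch_lazy())         # force evaluation: the "DB lookup" bucket
t1 = time.perf_counter()
data = serialize(rows)            # the "serialization" bucket
t2 = time.perf_counter()

print(f"lookup {t1 - t0:.3f}s, serialize {t2 - t1:.3f}s")
```

Without the `list()`, the generator would only run inside `serialize`, and
the two buckets would be indistinguishable.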

Now, the problem for me is that all the queries that run are fast when
executed manually. So, I believe the SQL queries are fast, but the DB lookups
from Django's point of view are very slow. What am I missing here? Or how
should I continue investigating? It feels like serious effort is required
from Django to convert the SQL query results into Django model instances.
That would imply that there's something wrong with my models, right?


In the end, I could turn to caching, but I would assume that handling <
100k objects should not be an issue for Django if done correctly.

--

Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3

Simplified versions of models, views and serializers:

class List(models.Model):
    ... CharFields, DateTimeFields, ForeignKeys etc. ...

class Item(models.Model):
    list = models.ForeignKey(List, on_delete=models.CASCADE,
                             db_index=True, null=True, related_name='items')
    deleted_at = models.DateTimeField(db_index=True, blank=True,
                                      null=True, default=None)
    created_by = models.ForeignKey(User, blank=False)
    project = models.ForeignKey('projects.Project',
                                on_delete=models.CASCADE)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class Event(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE,
                             db_index=True, null=True, related_name='events')
    created_by = models.ForeignKey(User, blank=False)
    deleted_at = models.D

Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Alceu Rodrigues de Freitas Junior

There is some overhead by using ORM: this is inevitable.

On the other hand, you didn't posted your ORM calls, and the queries are 
incomplete.


If you think your queries are optimized (as I understand they are from 
your description), an workaround is to avoid using the ORM for those 
specific queries. Or review your Models, they might not be optimized for 
what you want. Can you try to reduce the number of queries to a smaller 
number by creating more complex queries?


Another possibility is getting a new Model recovering information from a 
view/stored procedure on the database... you see, you want to get 10k to 
50k items... that's a lot (to serialize too).
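One way to avoid the ORM for just those queries is to fetch plain dicts (in
Django terms, `QuerySet.values()`) so no model instances are built at all. A
framework-free sketch of the difference; the `Item` class and field names
here are illustrative:

```python
class Item:
    """Stand-in for a Django model: one attribute per column."""
    def __init__(self, id, name, version):
        self.id, self.name, self.version = id, name, version

rows = [(i, f"item-{i}", 1) for i in range(5)]   # what the DB cursor returns

# ORM-style path: build a full object per row, then serialize it.
objects = [Item(*row) for row in rows]
via_objects = [{"id": o.id, "name": o.name} for o in objects]

# values()-style path: go straight from rows to dicts, no instantiation.
via_dicts = [{"id": r[0], "name": r[1]} for r in rows]

assert via_objects == via_dicts   # same payload, less per-row work
```

With tens of thousands of rows, skipping the per-row object construction
can be a significant saving, at the cost of losing model methods and
DRF ModelSerializer conveniences for that endpoint.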




Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Alceu Rodrigues de Freitas Junior
Besides, "time" from a shell is not the appropriate way to measure DB
response times. Try to use whatever is available in your database to
measure that.

Also, see
http://blogs.perl.org/users/steffen_mueller/2010/09/your-benchmarks-suck.html.
Don't be fooled by the article being on a Perl-related blog; the
concepts in the post are applicable everywhere.




Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Miika Huusko
Thanks for the response!

Yep, timing with "time" is not the best way to compare SQL query times. The
reason I added the "time" test is that on Stack Overflow I was asked to
confirm that data transfer is not a factor here. I timed the SQL queries with
"EXPLAIN ANALYZE" and "\timing" first. Those don't take data transfer into
account, so "time" served as a quick check that data transfer is not the
problem.

About timing: as an example, I can reduce the problem to serializing only
Items and related Photos. That results in only two queries. For example, a
dataset of about 40k items gives:

django.db.backends: (1.252) SELECT "todo_item"."version", *... all item 
properties ...* FROM "todo_item" WHERE ("todo_item"."project_id" = '...' 
AND "todo_item"."deleted_at" IS NULL);

django.db.backends: (0.883) SELECT "photos_photo"."version", *... all item 
properties ...* FROM "photos_photo" WHERE "photos_photo"."note_id" IN 
(349527, 349528, 349529, *... and rest of the 40k IDs ...* );


Quite simple queries. The timing shown in the django.db.backends logs matches
what those queries take when executed manually with psql (1252 ms + 883 ms).
That gives this simple profiling info:

Database lookup   | 20.4447s
Serialization | 3.3821s
Django request/response   | 0.3419s
API view  | 0.1988s
Response rendering| 0.4591s

That's only from a single request, and query times vary of course. Still, the
difference between how long it takes to query the data from the DB and how
long Django takes to process it is just that huge. The part I don't understand
is that it takes about 20 seconds to run list(self.get_queryset()) while those
two queries take about 2 seconds in SQL. There is some serious effort and time
spent there by Django. Those two queries are the only queries that run during
list(self.get_queryset()) according to the django.db.backends logs. The "list"
is there to force query execution, to separate DB lookup time from
serialization time.

Adding a new Model that reads from a view/stored procedure on the database is
a good idea. Of course, I would first like to understand what might be wrong
in the current models, so as not to make the same mistakes again. Is there
something one should consider when making related items like Photos and
Events related to Items? That use case is quite simple and still causes
Django to spend 18 seconds processing the SQL query response. There is a lot
of data of course, but I had thought that returning 50k objects should not be
a problem for Python / Django, even though the ORM always adds some overhead.
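Part of that gap is that, after the two queries return, `prefetch_related`
still has to build the model instances and then join the 40k photo rows back
to their parent items in Python. A stand-alone sketch of that in-Python join
(the row shapes and counts are illustrative):

```python
from collections import defaultdict

items = [{"id": i} for i in range(4)]
photos = [{"id": 100 + i, "note_id": i % 4} for i in range(8)]  # rows from the IN (...) query

# What prefetch_related does after the queries: group the child rows
# by their foreign key in Python, then attach each group to its parent.
by_item = defaultdict(list)
for p in photos:
    by_item[p["note_id"]].append(p)

for item in items:
    item["photos"] = by_item[item["id"]]

assert len(items[0]["photos"]) == 2   # each item got its photos attached
```

At 40k items this grouping itself is cheap; the expensive part in real Django
is constructing a full model instance per row before the join even starts.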



Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Alceu Rodrigues de Freitas Junior
This may be a very basic thing to ask, but there is no harm in
double-checking: are you using select_related instead of all() for those two
specific Models?

Check out
https://django-debug-toolbar.readthedocs.io/en/stable/index.html if you
aren't using it yet; it will help you see that very easily (I'm assuming
you have a view that lists all those instances of those Models).




--
You received this message because you are subscribed to the Google Groups "Django 
users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/b25172ca-da2b-e645-cf70-f06b11329bba%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Melvyn Sopacua
On Sunday 11 June 2017 09:15:09 Miika Huusko wrote:
> Thanks for response!
> 
> Yep, timing with "time" is not the best way to go to compare SQL query
> times. The reason I added that "time" test is that in Stack Overflow
> I was asked to confirm that data transfer is not factor here. I timed
> SQL queries with "EXPLAIN ANALYZE" and "\timing" first. That don't
> take data transfer into account, so, "time" as quick test that data
> transfer is not a problem.
> 
> About timing: as an example I can reduce the problem to serializing
> only Items and related Photos. It results only two queries. For
> example for a dataset of about 40k items results:
> 
> django.db.backends: (1.252) SELECT "todo_item"."version", *... all
> item properties ...* FROM "todo_item" WHERE ("todo_item"."project_id"
> = '...' AND "todo_item"."deleted_at" IS NULL);
> 
> django.db.backends: (0.883) SELECT "photos_photo"."version", *... all
> item properties ...* FROM "photos_photo" WHERE
> "photos_photo"."note_id" IN (349527, 349528, 349529, *... and rest of
> the 40k IDs ...* );

Why did you opt to use prefetch_related and not select_related?

-- 
Melvyn Sopacua



Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Miika Huusko
@Alceu Rodrigues de Freitas Junior, there are no questions too basic to ask 
:) I'm sure I'm missing something, so please keep asking. But yes, I have 
considered that.

@Melvyn Sopacua, I opted to use prefetch_related because I'm fetching "Items" 
that are used as a ForeignKey by "Photos" and "Events". select_related is 
meant for cases where you have a OneToOne or ForeignKey relation (a single 
item) that you need to fetch. If I were fetching "Photos" I would use 
select_related to JOIN "Items" at the SQL level. Now that I'm fetching 
"Items", the relations to "Photos" and "Events" are reverse-ForeignKey 
many-to-one sets, so I need to use prefetch_related. Of course that means 
the JOIN is done in Python (which is probably what takes the time now), but 
I cannot see how I could do it with select_related. Please let me know 
if there's a way to utilize select_related with related sets.

I'm kind of hoping to find a problem in my implementation or my thinking 
that somehow makes the Python-level JOIN behave badly. It's a big JOIN, but 
still, Python is fast. There must be some kind of extra looping or cloning 
going on when performing the Python-level JOIN for the prefetched Photos and 
other related sets.


On Sunday, June 11, 2017 at 4:57:22 PM UTC+2, Miika Huusko wrote:
>
> I have a performance problem with Django + Django REST Framework. The 
> problem is that it takes Django very, very long time to handle SQL query 
> response + DRF serialization take quite some time also. It happens when 
> there are ManyToMany or OneToMany relations and nested objects. Sounds like 
> "N + 1" problem, but that's not the case. I have 10k to 50k items with 
> related items and I try to fetch and serialize them for REST API. Request 
> takes from 20 to 60 seconds and currently I have no idea what is causing 
> the slowness.
>
> Note: I asked the same question in Stack Overflow, but haven't found the 
> answer from there - 
> https://stackoverflow.com/questions/44461638/django-django-rest-framework-postgresql-queries-and-serialization-is-very-sl
>
>
> * N+1 Problem * 
>
> There are not too many queries and queries themselves are fast (queries at 
> the end of this post). I'm using `prefetch_related` to limit number of 
> queries and what I'm seeing from DB queries everything is looking Okay..ish 
> (?). I get one query for each `prefetch_related` property + the original 
> query for serialized objects. There are lots and lots of IDs included in 
> `prefetch_related` queries, but I guess that is inevitable - as many IDs as 
> there are original items. 
>
> To test SQL queries + data transfer, I run same queries with psql from one 
> of my EC2 instances to RDS DB with same DB data and wrote the data to file. 
> Data transfer plus file write on top of that and it is totally between 100 
> to 500 ms for bigger SQL queries for different data sets. File write is 
> extra, but I wanted to ensure that I get all the data I expect.
>
> I tested timing from EC2 instance with command from like:
>
> time psql -f fat-query.psql --host=rds_host --port=5432 
> --username=user --dbname=dbname > output.txt
>
>
> * Profiling * 
>
> When profiling as shown here 
> https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/ 
> I get results that DB lookups take most of the time while serializing is 
> not too fast either. As an example I have about 12k "Items" in the my local 
> PostgreSQL database for one "Project". All "Items" have 1-5 "Events" and 
> majority of "Items" have also 1-2 "Photos". Fetching and serializing that 
> data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for 
> deployment and the timing is about the same there. On larger "Item" sets 
> serialization time is increasing more than DB lookup time, but DB lookups 
> always take most of the time. With 40k items you'll start to reach 1 min 
> execution time and different timeouts from Nginx and other parts of the 
> stack.
>
> Example with 12k items (models, serializers and queries below)
>
> Database lookup   | 14.0292s
> Serialization | 6.0076s
> Django request/response   | 0.3488s
> API view  | 0.2170s
> Response rendering| 1.1092s
>
> If I leave Photos and Events out the result is
>
> Database lookup   | 1.2447s
> Serialization | 3.9668s
> Django request/response   | 0.2435s
> API view  | 0.1320s
> Response rendering| 0.8495s
>
>
> * What might cause the slowness? *
>
> So, the related fields are taking most of the time (many=True). The 
> profiling I used for testing is making `list` out of queryset before 
> serializing. Therefore lazy queries are executed before serialization. If I 
> don't do that, it doesn't change the overall results, but DB lookups are 
> evaluated when serializing with about the same amount of time.
>
> Now the problem for me is that all queries that a

Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Melvyn Sopacua
On Sunday 11 June 2017 13:41:55 Miika Huusko wrote:

> @Melvyn Sopacue, I opt to use prefetch_related because I'm fetching
> "Items" that are used as ForeignKey for "Photos" and "Events".
> Select_related is meant for cases where you have OneToOne of
> ForeignKey relation (single item) that you need to fetch. If I would
> be fetching "Photos" I would use selected_related to JOIN "Items" in
> SQL level. Now that I have fetching "Items" relations to "Photos" and
> "Events" are "reverse ForeignKey" Many-to-One sets and I need to use
> prefetch_related.

You are right, in part.
But created_by is a foreign key on Items, not the other way around.

So this:

> > django.db.backends: (0.001) SELECT "auth_user"."id", ... everything ...
> > FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ... some IDs ...);

can be avoided.

Last but not least, this is why APIs use pagination. If for whatever reason 
you must provide such a complex model (instead of having the consumer make 
more API calls for photos they don't have yet), you paginate, but it's not 
very flexible and scales badly.
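For reference, global DRF pagination is a one-fragment settings change. The page size here is an arbitrary example value, not a recommendation:

```python
# settings.py fragment -- PAGE_SIZE is an arbitrary example value.
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS":
        "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 500,
}
```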

The proper design pattern here would be to hash your photos, use that as the 
identifier in the API, and put the burden on the consumer to only request 
photos they don't have the ID for yet.

This is, for example, how Zendesk handles email attachments.
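The hashing idea is a few lines of standard-library Python (the fingerprints and blobs here are made up for illustration):

```python
import hashlib

def photo_fingerprint(data: bytes) -> str:
    # Content hash: identical bytes (e.g. the same logo attached to every
    # email) always map to the same identifier.
    return hashlib.sha256(data).hexdigest()

store = {}  # fingerprint -> bytes: deduplicated storage

for blob in [b"logo.png bytes", b"photo-1 bytes", b"logo.png bytes"]:
    store.setdefault(photo_fingerprint(blob), blob)

# Three uploads, but only two distinct blobs actually stored.
```

A consumer that already holds a fingerprint never needs to download that blob again.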
-- 
Melvyn Sopacua



Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Miika Huusko
@Melvyn Sopacua, yes, that is very true that the auth_user SELECT can be 
avoided. Thanks for pointing that out! I don't know why I put it like 
that. Unfortunately, it's the related sets and big JOINs that are the 
problem and are generating the "slowness". 

The reason I'm providing all "Items" and other related stuff with one 
request is that I want to make the customer's (that's also me and my good 
friends) life easier. I want to keep all the difficult and heavy work on the 
server side. The customer needs all the data because of offline requirements 
and would therefore need to loop through pagination or perform the same big 
JOIN if "Photos" and other related objects were requested separately. Doing 
that on mobile devices is going to take longer than on an Ubuntu server, 
even if it takes longer than I would expect with Python + Django. That said, 
it's good to point out opinions on design patterns. I haven't listed all the 
background information behind the decisions.

I'm not quite sure what you mean by "hash photos and use that as 
identifier". What I could do - and I believe what you mean(?) - is to 
provide a list of "Photo" IDs (or hashes) for each "Item" and ask the 
customer to fetch "Photo" details separately with another request(s). Or - 
easier for the API - leave "Photo" IDs out of the "Item" information 
completely and ask the customer to make the JOIN with separately requested 
"Photos". The JOIN could be done with hashes or IDs. Either way it would be 
the customer who makes the JOIN.

I might go there, but I still don't believe that Django could not handle my 
meager data faster than 20 seconds :) If that were the case it really would 
be sad.



Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Alceu Rodrigues de Freitas Junior
I'm guessing here that REST is not the appropriate tool for handling 
"large" amounts of data, and that's probably why Melvyn suggested using 
pagination.


If your customer needs such data off-line, I bet it would be better to 
generate a file in the appropriate format, compress it, and redirect the 
client to download the file.


For that, you could skip the ORM entirely, or build an appropriate Model 
for it. I would stick with the former (and export the data straight from 
the DB with a stored procedure), since I don't see you reusing this new 
Model elsewhere in your Django app.
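A sketch of "skip the ORM and dump straight from the DB". The poster's stack is PostgreSQL (psycopg2 / a stored procedure would do this there); sqlite3 only stands in so the idea is self-contained, and the table and column names are made up:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, version INTEGER)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, 3), (2, 1)])

cur = conn.execute("SELECT id, version FROM item ORDER BY id")
cols = [d[0] for d in cur.description]

# One pass over the raw cursor: no model instances, no serializers.
rows = [dict(zip(cols, r)) for r in cur]
export = json.dumps(rows)  # in practice: write to a file, compress, serve
```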





Re: Django + Django REST Framework SQL query response handling and serialization is very, very slow - yet not a “N+1” problem

2017-06-11 Thread Melvyn Sopacua
On Sunday 11 June 2017 14:41:22 Miika Huusko wrote:

> The reason I'm providing all "Items" and other relates stuff with one
> request is that I want to make customer (that's also me and my good
> frieds) life easier. I want to keep all the difficult and heavy work
> on server side. Customer needs all data because of offline
> requirements and therefore would need to loop through pagination or
> perform the same big JOIN if "Photos" and other related objects would
> be requested separately. Doing that in mobile devices is going to
> take longer than on Ubuntu server even if it takes longer that I
> would expect with Python + Django. That said, it's good to point out
> opinions on design patterns. I haven't listed all background
> information for decisions.

Think about how Google Maps does this with offline datasets for areas 
you've marked: it caches them for x days, then prompts you to sync when you 
have a free moment. It's not a burden in practice and makes life a lot 
easier. Sometimes we set requirements too harshly for ourselves.

> I'm not quite sure what you mean by "hash photos and use that as
> identifier".

Sorry, I wasn't complete, as it's been a while. I've built an application 
that fetches Zendesk data into Django for trend analysis. Getting an 
attachment for a given message went like this:

- get the message id: /messages/id
- get the attachment ids: /messages/id/attachments (this is your reverse 
relation)
- for each one you don't have yet, fetch it: /files/id

The hashing came into play because corporate idiots - I mean people - 
attach their logo to each and every email they send. So by hashing them, 
you eliminate duplicates and scale down both your storage requirements and 
your (query) data size.

> What I could do - and believe what you mean(?) - is to
> provide list of "Photo" IDs (or hashes) for "Item" and ask customer
> to fetch "Photo" details separately with another request(s). Or -
> easier for API - leave "Photo" IDs out of the "Item" information
> completely and ask customer to make the JOIN with separately
> requested "Photos". JOIN could be done with hashes or IDs. Either way
> it would be customer who would make the JOIN.

It makes a lot more sense now, with the mobile and offline requirements, 
why you have set up the API the way you did. Two more thoughts for you:

- Use a cacheable dataset with incremental updates done "when wifi or 
better is available". Complex, but worth it in the long run.
- Ditch DRF serialization and fetch the queryset with values(), dumping to 
JSON directly, thus eliminating both serializer objects and model objects.
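The incremental-update idea boils down to a "changed since last sync" filter. A plain-Python sketch with made-up records (in Django this would be a queryset filter on an updated_at field):

```python
from datetime import datetime, timedelta

# Toy dataset: each record carries an updated_at stamp.
now = datetime(2017, 6, 11, 12, 0)
records = [
    {"id": 1, "updated_at": now - timedelta(days=5)},
    {"id": 2, "updated_at": now - timedelta(hours=1)},
]

def changed_since(records, last_sync):
    # The client sends the timestamp of its last successful sync and only
    # receives records modified after it.
    return [r for r in records if r["updated_at"] > last_sync]

delta = changed_since(records, now - timedelta(days=1))
```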

It's easy to test the performance gain of the second, especially if you can 
stream the JSON (as opposed to in-memory-then-flush).
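The values()-plus-streaming idea can be sketched without Django at all; the generator stands in for queryset.values(...).iterator(), and in Django the emitted chunks would feed a StreamingHttpResponse:

```python
import json

def rows():
    # Stand-in for queryset.values(...).iterator(): plain dicts, no model
    # or serializer objects in between.
    for i in range(3):
        yield {"id": i, "version": 1}

def stream_json(row_iter):
    # Emit a JSON array chunk by chunk instead of building it in memory.
    yield "["
    for i, row in enumerate(row_iter):
        yield ("," if i else "") + json.dumps(row)
    yield "]"

body = "".join(stream_json(rows()))
```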
-- 
Melvyn Sopacua



Is this a bug on annotate+filter+annotate over M2M field.

2017-06-11 Thread Todor Velichkov
I believe issue #28297 is related to the problem described above.

https://code.djangoproject.com/ticket/28297
