My web app hits the database up to 3k times per page because it has to retrieve data from lots of related models. It has a fairly normalised database with a structure similar to:
- one project has many contracts
- one project has many tenders (a tender is put forward by a contractor)
- one contract has many jobs

(There are a few extra relations not shown for simplicity.) One of the pages, for example, lists out details for every contract a contractor has tendered for. I've naively implemented this by treating the Django models as I would OO models, resulting in about 3k database hits for that one page. This is because of code like:

    qs = Tender.objects.filter(filter_criteria)
    for tender in qs:
        print(tender.getProject())

where getProject() looks something like:

    def getProject(self):
        return self.project

I think the extra level of indirection here means that Django can't conclude that the 'project' objects for all tenders in the query set should be retrieved in one go, and instead it ends up hitting the database once per loop iteration. At the moment I do this in quite a few places, contributing to the 3k database hits.

One way of solving this is for the query to fetch all related data up front, something like:

    qs = Tender.objects.select_related(
        # names of foreign-key relationships here
    ).prefetch_related('project', 'project__job_set')

(the actual example has many more relations in the prefetch_related). This fixes the problem of hitting the database in loops such as the one above, and the complexity can be hidden in a data retrieval layer that the client calls into:

    # the actual query with select_related/prefetch_related is done in this function
    returnedTenders = getTendersForContractor(contractor)

However, there are a few drawbacks to this:

* Major one is that data is fetched that you might not require.
  EG: if there was another page that just listed the ids of the Tenders, then a lot of data would have been fetched that is not required.
  ** You could mitigate this by having the client somehow specify which data requires prefetching, but:
  *** this exposes implementation detail to the client
  *** the function calling into the data retrieval layer may not know what data requires prefetching; instead it is a function higher up in the call chain (the function that actually uses the data) that has this information
* Each prefetch_related hits the database once - a lot better than 3k hits, but not as minimal as a single SQL join across all of the tables.
* prefetch_related selects related objects via the criterion 'IN (list of ids)', which arguably can be slower.
* Seems brittle - if you change the data model, you have to change all of the queries in the data retrieval layer. Denormalising the data would reduce this, but we'd still have the problem of retrieving more data than the client may possibly want.

What are the group's suggestions/advice?

--
You received this message because you are subscribed to the Google Groups "Django users" group.