QuerySet refactoring

Anssi Kääriäinen Tue, 12 Jun 2012 23:50:47 -0700

I am asking for directions about what to do about
django.db.models.sql.query (actually sql.*). I would like to refactor
the code in small incremental steps. However, this will bring internal
API breakages, and will likely add some more bugs temporarily.


While the ORM mostly works, it IMHO needs some polish. The reasons why
I see ORM refactoring as needed:
  - The code is too complex and underdocumented currently. Some
examples are query.add_q() and compiler.fill_related_selections().
  - Because of the above, adding new features is hard. There are some
long-standing bugs which are hard to fix. There are 407 open ORM or
models.Model tickets, of which 247 are bugs.
  - I believe the ORM could be made faster. The ORM currently uses 4x
the time in Python compared to the time the DB needs to parse, plan
and execute a query like this: Model.objects.get(pk=1).

Why an incremental rewrite instead of a total rewrite? A total rewrite
would likely take so long that it would never actually get done. The
underlying structure of the ORM is good enough. There are things which
would likely be done differently in a total rewrite, but nothing is
bad enough to count as a blocker.

Why not use SQLAlchemy or some other ORM? I am no expert on
SQLAlchemy, but I believe it doesn't actually do the same thing as
Django's ORM. The complexity of Django's ORM comes from the need to
handle things like subqueries for negated multijoin lookups and
deciding when to use LEFT JOIN and when to use INNER JOIN. SQLAlchemy
doesn't do that as far as I know.
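
To make the two problems above concrete, here is a minimal sketch using
the standard-library sqlite3 module. The author/book schema and the SQL
are hand-written assumptions for illustration only; this is not the SQL
Django generates. It shows why an OR'ed condition across a relation
needs a LEFT JOIN, and why a negated lookup across a multi-valued
relation needs a subquery rather than a negated join condition.

```python
import sqlite3

# Hypothetical schema: an author has zero or more books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT
    );
    INSERT INTO author VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    -- alice wrote two books, bob wrote one, carol wrote none
    INSERT INTO book VALUES (1, 1, 'django'), (2, 1, 'python'), (3, 2, 'python');
""")


def names(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Case 1: "name is carol OR has a book titled django".
# An INNER JOIN drops carol before the WHERE clause is evaluated,
# because she has no book rows to join against.
inner_or = names("""
    SELECT DISTINCT a.name FROM author a
    INNER JOIN book b ON b.author_id = a.id
    WHERE a.name = 'carol' OR b.title = 'django'
""")
# Promoting the join to LEFT JOIN keeps authors without books.
left_or = names("""
    SELECT DISTINCT a.name FROM author a
    LEFT JOIN book b ON b.author_id = a.id
    WHERE a.name = 'carol' OR b.title = 'django'
""")

# Case 2: "authors who have NO book titled django".
# Negating the join condition answers a different question:
# "authors who have some book not titled django".
naive_not = names("""
    SELECT DISTINCT a.name FROM author a
    INNER JOIN book b ON b.author_id = a.id
    WHERE NOT (b.title = 'django')
""")
# A subquery expresses the intended meaning.
subquery_not = names("""
    SELECT a.name FROM author a
    WHERE a.id NOT IN (
        SELECT b.author_id FROM book b WHERE b.title = 'django'
    )
""")

print("inner join, OR:   ", inner_or)
print("left join, OR:    ", left_or)
print("negated join:     ", naive_not)
print("negated subquery: ", subquery_not)
```

In this toy data set the INNER JOIN variant loses carol, and the
negated join both wrongly includes alice and loses carol. Deciding
automatically which form a given filter needs is the kind of logic
that makes sql.query complex.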

I have no need to make the ORM generic enough for no-SQL databases. I
don't believe generating a generic query.py class for no-SQL databases
is the correct approach. 80% of the code in query.py deals with joins,
null handling, subqueries and things like that. None of those would be
common to no-SQL DBs. Instead, no-SQL databases need to deal with
structured records, which isn't a problem for the SQL side of the ORM.
The common things are lookup handling and the API. The API is already
separate from sql.*. The lookup handling should be, too.

So, what am I trying to do? Things like:

https://code.djangoproject.com/ticket/16759 - patch:
https://github.com/akaariai/django/compare/ticket_16759
  - This is a performance improvement for query.clone() which alone
should make the ORM around 20% faster. This is definitely DDN stuff, so
don't worry, I won't be committing this one currently.

https://code.djangoproject.com/ticket/17000 - patch:
https://github.com/django/django/pull/92,
https://github.com/akaariai/django/compare/refactor_utils_tree
  - Make utils.tree saner to use. Make query.add_q() and
where.as_sql() cleaner. Fixes a couple of bugs.

https://code.djangoproject.com/ticket/16715 - somewhat old patch in
ticket.
  - Unify and fix join promotion logic.

There are more tickets somewhat ready in Trac.

The above are small steps towards making the ORM easier to handle. The
long term goals at the moment are just cleanup. This cleanup should
make new features possible (conditional aggregates, custom lookups
etc), but those features are not the immediate goal.
