On Thu, Dec 27, 2012 at 11:29 AM, Donald Stufft wrote:

> On Wednesday, December 26, 2012 at 10:00 PM, Russell Keith-Magee wrote:
>
>> Why? Because we've gone to extraordinary lengths to make sure this sort of
>> thing is at least theoretically possible.
>>
>> Although we use the term "ORM", and there are currently only relational
>> implementations of Django's ORM, there's nothing relational about the
>> Django ORM API. We've very deliberately posed the API in terms of functions
>> you want to perform on objects:
>>
>>  * Get me the author named "Douglas Adams"
>>  * Get a list of books that are more than 3 years old
>>  * Update the login counter on this user by one.
>
> Because, except for very simple models, you will not be able to sanely take
> a model written for a relational database and switch it to a non-relational
> database. If you cannot provide the same sort of mostly transparent
> switching that the ORM provides for MySQL -> PostgreSQL -> Oracle, then
> there is little benefit in keeping it within the same system.
>
> All of your examples work on simple models, sure. What about:
>
>     * Anything requiring a join, explicitly with select_related() or
> implicitly with __ magic.
>     * select_for_update
>     * No standard way of handling "Related" fields (Do you Inline them?
> Mimic a ForeignKey?)
>     * The entire transaction system on systems without transactions
>
> There is also the problem of vastly different access patterns,
> assumptions, and performance characteristics.
>
>     * Getting a list of books that are more than 3 years old is a very
> simple operation in SQL with very predictable performance; getting a list
> of books older than 3 years if they are stored in Redis, less so.
>     * Systems that depended on unique=True to enforce a uniqueness
> constraint would no longer have it enforced.
>     * db_index=True becoming a no-op.
>     * A simple Join potentially goes from an inexpensive operation to one
> that requires traversing several million rows with horrible performance.
>
> The access patterns, assumptions of functionality, and assumptions of
> performance are so different even between the different NoSQL solutions,
> let alone between NoSQL solutions and relational databases, that either
> you're going to have second-class citizens (sure, you can use X system with
> Django models, but only as a completely segregated unit, and you can't touch
> [a list of features]), or you're going to need to limit the features down
> to a subset that all databases can support (we already have this problem
> with PostgreSQL vs MySQL vs SQLite; it would be tenfold if we included NoSQL
> databases). In order to actually use the power of your datastore, you need
> to use a class of "ORM" that is designed to work within its access
> patterns.
>
> Django as a whole should avoid giving people footguns, and attempting to
> shove non-relational databases into the ORM would provide a massive
> footgun. As soon as it happens, you'll have a whole host of people running
> apps and sites that depend on guarantees relational databases provide,
> only to have those guarantees yanked out from underneath them, and it
> will be Django's fault for providing that footgun.
>

That depends entirely on what you consider the goal of the ORM to be.

You have assumed that the goal would be "allow an arbitrary query to run on
any underlying data store, and run with equivalent efficiency". In this
model, you could take your fully operational Django PostgreSQL project, and
roll it out under MongoDB (or any other supported store), and it would
Magically Work™.

I completely agree that this is an unrealistic goal, and that pursuing it
would, as you rightly point out, constitute a high-calibre footgun.

However, there's another way of looking at it. You're focussing on the ORM
as a query generation engine. Of more interest is the ORM as a metadata
layer for models in a data store, with some basic reliable querying
features.
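As a rough illustration of the ORM-as-metadata idea, here is a minimal Python sketch. FieldSpec, ModelSpec and validate are hypothetical stand-ins for the model metadata Django keeps on each model's _meta; they are not Django's API. Form-style validation and an admin-style column list are derived purely from field descriptions, and no data store is consulted at any point.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical, storage-agnostic description of one model field.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: type
    required: bool = True
    max_length: Optional[int] = None


# Hypothetical description of a whole model: a name plus its fields.
@dataclass(frozen=True)
class ModelSpec:
    name: str
    fields: tuple[FieldSpec, ...]

    def column_names(self) -> list[str]:
        """What an admin-style change list might display."""
        return [f.name for f in self.fields]


def validate(spec: ModelSpec, data: dict[str, Any]) -> dict[str, str]:
    """Form-style validation driven only by metadata; no data store is touched."""
    errors: dict[str, str] = {}
    for f in spec.fields:
        value = data.get(f.name)
        if value is None or value == "":
            if f.required:
                errors[f.name] = "This field is required."
            continue
        if not isinstance(value, f.kind):
            errors[f.name] = f"Expected {f.kind.__name__}."
        elif f.max_length is not None and len(value) > f.max_length:
            errors[f.name] = f"At most {f.max_length} characters."
    return errors


Book = ModelSpec("Book", (
    FieldSpec("title", str, max_length=100),
    FieldSpec("year", int),
    FieldSpec("isbn", str, required=False, max_length=13),
))

print(Book.column_names())                                        # ['title', 'year', 'isbn']
print(validate(Book, {"title": "Mostly Harmless", "year": 1992}))  # {}
print(validate(Book, {"year": "1992"}))
# {'title': 'This field is required.', 'year': 'Expected int.'}
```

Nothing here cares whether Book rows end up in PostgreSQL or MongoDB; that independence is the property the metadata view relies on.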

Think of it this way -- the goal isn't to allow an arbitrary query to run
on any data store. The goal is to allow Django's admin to operate on a
model in any data store, or to allow a Django ModelForm to retrieve and/or
store an object in any datastore.

The queries required to support Django's admin and/or ModelForms are all
inherently simple CRUD operations -- operations that have simple (and for
the most part, efficient) analogs in every data store.
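The claim that admin and ModelForm traffic reduces to simple CRUD can be sketched in Python. CrudBackend, DictBackend, SqliteBackend and edit_title are all hypothetical names, not Django's database backend API. One backend keeps documents in a dict (standing in for a non-relational store), the other uses the standard library's sqlite3 module, and a generic "change form" save works unchanged against either.

```python
import sqlite3
from typing import Any, Optional, Protocol

Record = dict[str, Any]


class CrudBackend(Protocol):
    """Hypothetical minimal interface: roughly what an admin-style change
    list and change form need, and nothing more."""

    def create(self, record: Record) -> int: ...
    def get(self, pk: int) -> Optional[Record]: ...
    def update(self, pk: int, changes: Record) -> None: ...
    def delete(self, pk: int) -> None: ...
    def list_all(self) -> list[Record]: ...


class DictBackend:
    """Stand-in for a non-relational store: one document per primary key."""

    def __init__(self) -> None:
        self._docs: dict[int, Record] = {}
        self._next_pk = 1

    def create(self, record: Record) -> int:
        pk = self._next_pk
        self._next_pk += 1
        self._docs[pk] = {**record, "pk": pk}
        return pk

    def get(self, pk: int) -> Optional[Record]:
        doc = self._docs.get(pk)
        return None if doc is None else dict(doc)

    def update(self, pk: int, changes: Record) -> None:
        self._docs[pk].update(changes)

    def delete(self, pk: int) -> None:
        self._docs.pop(pk, None)

    def list_all(self) -> list[Record]:
        return [dict(doc) for _, doc in sorted(self._docs.items())]


class SqliteBackend:
    """The same operations on a relational store (fixed 'book' schema for brevity)."""

    COLUMNS = ("title", "year")

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE book (pk INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
        )

    def create(self, record: Record) -> int:
        cur = self._db.execute(
            "INSERT INTO book (title, year) VALUES (?, ?)",
            (record["title"], record["year"]),
        )
        return cur.lastrowid

    def get(self, pk: int) -> Optional[Record]:
        row = self._db.execute(
            "SELECT pk, title, year FROM book WHERE pk = ?", (pk,)
        ).fetchone()
        return None if row is None else dict(zip(("pk",) + self.COLUMNS, row))

    def update(self, pk: int, changes: Record) -> None:
        # Column names come from the fixed COLUMNS whitelist, never from input.
        for column in self.COLUMNS:
            if column in changes:
                self._db.execute(
                    f"UPDATE book SET {column} = ? WHERE pk = ?", (changes[column], pk)
                )

    def delete(self, pk: int) -> None:
        self._db.execute("DELETE FROM book WHERE pk = ?", (pk,))

    def list_all(self) -> list[Record]:
        rows = self._db.execute("SELECT pk, title, year FROM book ORDER BY pk")
        return [dict(zip(("pk",) + self.COLUMNS, row)) for row in rows]


def edit_title(store: CrudBackend, pk: int, new_title: str) -> Optional[Record]:
    """A generic 'change form' save: write one field, then reload the object."""
    store.update(pk, {"title": new_title})
    return store.get(pk)


for store in (DictBackend(), SqliteBackend()):
    pk = store.create({"title": "Mostly Harmless", "year": 1992})
    print(type(store).__name__, edit_title(store, pk, "So Long, and Thanks for All the Fish"))
```

Joins, aggregates and select_for_update are deliberately absent from the interface; those are exactly the operations that do not have cheap analogs everywhere.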

Any non-trivial query will *always* require an understanding of the
underlying data storage. The ORM is an abstraction, and while it can make
certain queries easier to write, you can't use it in a vacuum -- you have
to be aware of the SQL that is being generated. And sometimes, you need to
fall back to raw SQL to get the job done.

I don't see a non-relational backend to Django's ORM being any different.
We can make simple retrieval operations easy. But there's no way we can
automatically optimise queries for every possible data store -- at some
point, a brain will need to be engaged in the process, and purpose-built
optimisations will need to be developed. Similarly, just because there's an
efficient relational representation of a data concept (e.g., a foreign
key) doesn't mean there's an equally efficient non-relational
representation.
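To make the foreign-key point concrete, here is a small Python sketch in which plain dicts stand in for both kinds of store. No Django or MongoDB API is used; the data and function names are illustrative only. A referenced author costs one extra fetch per book when the store cannot join, while an embedded copy is free to read but has to be rewritten in every document when the author changes.

```python
from typing import Any

Record = dict[str, Any]

# Normalised shape: each book refers to its author by id, as a ForeignKey would.
authors: dict[int, Record] = {
    1: {"name": "Douglas Adams"},
    2: {"name": "Terry Pratchett"},
}
books: list[Record] = [
    {"title": "Mostly Harmless", "author_id": 1},
    {"title": "Mort", "author_id": 2},
    {"title": "Dirk Gently's Holistic Detective Agency", "author_id": 1},
]

# Denormalised shape: an independent copy of the author is embedded in each book.
book_docs: list[Record] = [
    {"title": b["title"], "author": dict(authors[b["author_id"]])} for b in books
]


def list_referenced(
    books: list[Record], authors: dict[int, Record]
) -> tuple[list[tuple[str, str]], int]:
    """Client-side 'join': with no server-side join, each reference costs a fetch."""
    rows, fetches = [], 0
    for book in books:
        author = authors[book["author_id"]]  # stands in for a separate round trip
        fetches += 1
        rows.append((book["title"], author["name"]))
    return rows, fetches


def list_embedded(docs: list[Record]) -> tuple[list[tuple[str, str]], int]:
    """Embedded copies can be read with no extra fetches..."""
    return [(d["title"], d["author"]["name"]) for d in docs], 0


def rename_embedded(docs: list[Record], old: str, new: str) -> int:
    """...but renaming an author means rewriting every embedded copy."""
    writes = 0
    for d in docs:
        if d["author"]["name"] == old:
            d["author"]["name"] = new
            writes += 1
    return writes


rows_ref, fetches_ref = list_referenced(books, authors)
rows_emb, fetches_emb = list_embedded(book_docs)
print(fetches_ref, fetches_emb)                                    # 3 0
print(rename_embedded(book_docs, "Douglas Adams", "D. N. Adams"))  # 2
```

In the normalised shape the same rename is a single write to authors[1]; neither layout is free, which is why a purpose-built representation has to be chosen per store.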

Interestingly, your arguments about the complications of switching from a
relational data store to a non-relational store apply equally to switching
between different relational stores. Just because you have a project
running under MySQL doesn't mean you can just change the database backend
and have it run under PostgreSQL. Django will make certain aspects of the
transition easy (i.e., the easy queries), but if you've done any sort of
query or index optimisation, or you're relying on transaction behaviour,
this switch will be equally problematic. However, the bit you *won't* have
to worry about is the basic, out of the box Admin interface, and the
behaviour of metadata inspecting Django features like the forms library.

So - what I'm talking about here isn't some magical abstraction layer that
makes the choice of data store irrelevant. It's a way to make simple things
simple, and complex things possible. It's about making it *possible* to
wrap a Django form around a MongoDB object. It's about making it possible
to display those objects in Django's admin. Yes, some features of the ORM
will be lost. Some will be inefficient. And there will almost certainly be
some non-relational operations that aren't available on relational stores,
and vice versa.

Finally, I'd also point out that a lot of this sort of analysis and
discussion has been covered in the past. The hubbub about non-relational
stores has been around for a while, and the relationship between NoSQL and
Django has been discussed a lot on mailing lists, and at conferences. If
you're interested in the topic, it's worth seeing what has been said in the
past.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.