[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Arnon Marcus Tue, 30 Apr 2013 10:10:27 -0700

On Tuesday, April 30, 2013 3:42:23 PM UTC+3, Anthony wrote:

> This is nice, but you're still talking about very general features. It 
> would be more helpful if you could explain why we need those features. 
> Perhaps you could show in SQLA how you would specify and interact with a 
> particular data model, and then explain why you think that is better/easier 
> than how you would do it using the DAL.



Well, I propose adding a set of tools for building business-model classes 
on-top of the DAL.
I basically mean implementing a 'Unit-Of-Work', and an 'identity-map' as 
explained in the link(s) I provided.
Now, what does this mean...

Well, here is what it does NOT mean - it should NOT lead the developer to 
implement an Active-Record model.
Class-attributes representing records, should NOT 'save()' whenever you set 
them, and should NOT *automatically* 'load()' whenever you access them.
That said, the developer should also NOT have to manage the 
'order-of-operations' by himself - this should be automated (read: 
'auto-inferred from the schema').

Instead, when an attribute is being set, it should mark itself as 'dirty', 
and be 'pushed' to the database later.
There should also be a way to tell the class-attributes, which other 
attributes of which other classes would be affected by it being modified 
(SQLA calls it the 'relationship-object').
Then, there should be a procedure that uses this information while keeping 
track of what's going on inside the transaction: whenever a record is 
changed, it 'discovers' which other records participating in the 
current transaction might be affected by the change to that attribute, 
and invalidates their caches.
The idea is that there is a mechanism that makes sure that if there are 
pending changes to other attributes that need to be pushed before the 
attribute being accessed is valid, then those push-operations are executed 
before the attribute refreshes itself. This is done by linking events 
between attributes, and propagating them on attribute-access (SQLA calls 
this a 'cascade').
** The process of marking relationships may be further automated even 
beyond what exists in SQLA, by analyzing the db-schema itself, and 
'inferring' relationships bi-directionally, and wiring all the events 
automatically at class-instantiation-time.*
If there are no such pending changes, then the attribute being accessed 
should be assumed to be valid (so its cached value is returned), unless 
it was previously invalidated by a transaction-commit. Every 
transaction-commit should invalidate all existing records that are in 
memory.

This is basically the 'unit-of-work' pattern (if I understood correctly)
** But you should really watch the lectures I posted - they probably 
explain this much better than I did...*
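
To make the mechanism above a bit more concrete, here is a minimal sketch in 
plain Python. Everything in it is hypothetical: the 'UnitOfWork' class, its 
method names, and the dict standing in for the database are assumptions for 
illustration, not part of the DAL or of SQLA. It is also cruder than what I 
described: it flushes all pending changes and drops the whole cache, rather 
than invalidating only the related records.

```python
class UnitOfWork:
    """Hypothetical sketch: collect pending changes and push them lazily."""

    def __init__(self, store):
        self.store = store   # stand-in for the database: {(table, pk): {field: value}}
        self.dirty = {}      # pending changes: {(table, pk): {field: value}}
        self.cache = {}      # records already read since the last flush/commit
        self.writes = 0      # how many record-writes were actually pushed

    def set(self, table, pk, field, value):
        # Setting an attribute only marks it dirty; nothing is written yet.
        self.dirty.setdefault((table, pk), {})[field] = value

    def flush(self):
        # Push all pending changes, one write per record, then drop the cache.
        # (Simplification: a real implementation would invalidate only the
        # records related to the ones that changed.)
        for key, changes in self.dirty.items():
            self.store.setdefault(key, {}).update(changes)
            self.writes += 1
        self.dirty.clear()
        self.cache.clear()

    def get(self, table, pk, field):
        # Pending changes are pushed before a read, so a read is never stale.
        if self.dirty:
            self.flush()
        key = (table, pk)
        if key not in self.cache:
            self.cache[key] = dict(self.store[key])
        return self.cache[key][field]

    def commit(self):
        # Every commit pushes what is left and invalidates in-memory records.
        self.flush()
        self.cache.clear()


store = {("item", 42): {"name": "old", "price": 1}}
uow = UnitOfWork(store)
uow.set("item", 42, "name", "new")
uow.set("item", 42, "price", 2)
pending_name = store[("item", 42)]["name"]   # the store is untouched so far
current = uow.get("item", 42, "name")        # the read triggers the push
uow.commit()
```

Note that the two 'set' calls on the same record end up as a single write, 
which is the 'aggregation of operations' part.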

"What are the benefits?" you may ask.

Well, there are performance-improvements with the caching mechanism, and 
with the aggregation of operations - certain change-operations are only 
pushed to the transaction-view when they are actually needed, or at the 
end of the transaction. There may also be consistency-benefits, by ensuring 
correct order-of-operations.
Lastly, the main benefit is automatic handling of caching and ordering of 
operations, which the developer no longer needs to take care of himself.
The benefit is not just for simple data-models; on the contrary - the more 
complex the data-model, the more beneficial this becomes, as the 
automatic detection of relationship-dependencies, and auto-cascade of 
operations, can both save brain-cycles for the developer, who would 
otherwise have to hold the whole schema in his head and make sure he 
pushes things in the correct order, and also prevent human-errors that can 
mess-up the database in cases where constraints are insufficient, and the 
developer overlooked some relationship-dependencies and was using stale 
data without knowing it.
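
As a rough illustration of 'auto-inferring' the order-of-operations from the 
schema, the sketch below sorts tables so that every referenced table comes 
before the tables that reference it. The table names and the 'references' 
mapping are made up for the example; in practice such a mapping would have 
to be derived from the schema's reference fields. It uses the standard 
library's 'graphlib' (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables that its
# foreign keys reference.
references = {
    "customer": set(),
    "item": set(),
    "purchase": {"customer"},
    "purchase_line": {"purchase", "item"},
}


def insert_order(references):
    """Return the tables ordered so that referenced tables come first."""
    # TopologicalSorter takes {node: predecessors} and yields predecessors
    # before the nodes that depend on them.
    return list(TopologicalSorter(references).static_order())


order = insert_order(references)
```

Pending inserts would be pushed in this order, and deletes in the reverse 
order. The relative order of unrelated tables (here 'customer' and 'item') 
is not significant.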

Then there is the 'identity-map':
This is a complementary-needed feature, to maintain consistency across 
results of the same records taken in different places in the code within 
the same transaction.
It makes sure that there can be one-and-only-one instance of a record 
in memory within a single transaction of a single db-session.

"What are the benefits here?" you ask.

First of all, there's a consistency issue that can arise without such a 
system, if the developer is manually controlling the order-of-operations, 
and makes a mistake.
This is not an issue that is exclusive for Active-Record patterns - it can 
also happen in web2py's DAL.
For example, let's take the following code:

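def setItemName(id, name):
    # Fetch a separate Row for this id and write the new name to the database.
    row = db.item[id]
    row.update_record(name=name)

def getItemByName(name):
    # Query the table by name; returns the matching Rows.
    return db(db.item.name == name).select()

id = 42
...
row = db.item[id]
# Changes the Row in memory only; nothing is written to the database.
row.name = 'some name' # or row['name'] = 'some name' or row.update(name='some name')
...
setItemName(id, 'some other name')
...
getItemByName(row.name)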

As you can see, the last function-call would either fail, or silently 
return something other than what was asked for.
The reason is that the row-objects handed back are not singletons: each 
lookup returns a separate object for the same record.
The function that pushed the item's name-change to the database did not 
affect the row-object that represented that same record outside.

The 'identity-map' solves this by storing all rows received for a given 
transaction in a thread-local global variable, and re-using each one 
based on its identity (table-name + field-name + primary-key). This also 
has some memory-benefits, especially for large queries.
** BTW: SQLA-ORM is doing this, Django-ORM is not...*
This feature can be implemented within the DAL itself, but does not have to 
be. As long as the db-connection is paired with a thread-local pool of 
records, and this pool is invalidated on each commit, things will work 
fine. But it's an obvious fit for the db-connection object, as it can use 
the pool as a cache internally, and can invalidate its pool internally on 
'commit()' calls.
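
Here is a minimal sketch of such a thread-local pool, living outside the 
DAL. The 'IdentityMap' class, the 'loader' callable and the dict standing 
in for the database are all hypothetical, and for simplicity a record is 
keyed by (table-name, primary-key) only.

```python
import threading


class IdentityMap:
    """Hypothetical sketch: one in-memory object per record, per thread."""

    def __init__(self, loader):
        self._loader = loader            # assumed callable: (table, pk) -> dict
        self._local = threading.local()  # each thread gets its own pool

    def _pool(self):
        if not hasattr(self._local, "records"):
            self._local.records = {}
        return self._local.records

    def get(self, table, pk):
        # Load the record only the first time; afterwards re-use the same object.
        key = (table, pk)
        pool = self._pool()
        if key not in pool:
            pool[key] = self._loader(table, pk)
        return pool[key]

    def invalidate(self):
        # Meant to be called on every commit.
        self._pool().clear()


database = {("item", 42): {"name": "some name"}}   # stand-in for the db
records = IdentityMap(lambda table, pk: dict(database[(table, pk)]))


def set_item_name(pk, name):
    records.get("item", pk)["name"] = name


row = records.get("item", 42)
set_item_name(42, "some other name")
same_row = records.get("item", 42)   # the very same object as 'row'
```

Because the function and the outside code share one object, 'row' sees the 
name-change made inside 'set_item_name' and is never stale within the 
transaction.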
