[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Anthony Sat, 04 May 2013 06:07:24 -0700

It is also worth noting that when using SQLA in a web application, each 
SQLA session lasts for only a single web request. So, any benefits you may 
get from having the ORM maintain its own transaction-like session with 
identity-mapping, etc. are limited to the timescale of a single request. 
This approach is an improvement over the ActiveRecord style ORM, but it 
doesn't appear to offer much, if any, benefit over the web2py DAL, which 
does not suffer from the ActiveRecord problems.


For example, in SQLA, you might tout the ability to make multiple updates 
to a record during a session but have the changes deferred to a single 
database update. However, you can already do this in web2py by making 
multiple updates to a Row object and then calling .update_record() to send 
all the changes to the database in a single update. Of course, in web2py, 
if you obtain the record via different queries in different parts of the 
code, then you will get different Row objects (unlike in SQLA, which will 
ultimately give you back the same object), so you will have to make 
separate updates to the database for each change. However, this is also 
true in SQLA -- if you make a separate query that happens to retrieve a 
record you have previously updated in the session, the previous update will 
first be flushed to the database before the later query -- so you still end 
up with the same number of updates as in web2py.

Moreover, web2py has an additional advantage -- in web2py, if you happen to 
make a query in between making two changes to the same Row object, the 
first change to the Row object is not flushed to the database, so you can 
still limit yourself to a single db update for that Row. In SQLA, the 
intervening query would cause a flush, so you end up with more updates than 
you need. Yes, you can manually prevent the flush in SQLA if you know it 
will not do any harm, but then there's no benefit over the explicitness of 
the DAL.

Also, it's true that in SQLA, if you do two separate queries that happen to 
retrieve some of the same records, it will still only hold one copy of each 
unique record object in memory. But it still needs to pull the duplicate 
data from the database and therefore hold it in memory for some time before 
releasing it. On the other hand, in web2py you can update a record without 
first retrieving it from the database, which saves a database hit, memory, 
and processing time relative to a SQLA update.

Anthony

On Saturday, May 4, 2013 12:36:50 AM UTC-4, Anthony wrote:
>
>
> def a_child_was_born_in(countryName, cityName):
>>     city = db.Country(Name=countryName).City(Name=cityName).select().
>> first()
>>     city.update_record(Population=city.Population + 1)
>>
>> def a_person_has_died_in(countryName, cityName):
>>     city = db.Country(Name=countryName).City(Name=cityName).select().
>> first()
>>     city.update_record(Population-city.Population - 1)
>>
>  
> Technically, it would be db.Country(Name=countryName).City(db.City.Name== 
> cityName).select().first()
> .
>
> # In context 1:
>> a_child_was_born_in('France', 'Paris')
>> ...
>> # In context 2:
>> a_person_has_died_in('France', 'Paris')
>>
>> This would issue 4 round-trips to the database - 2 selects and 2 updates.
>>
>
> The way you have coded it, it is actually 6 round trips -- there are 2 
> selects per function -- one for the country, and a second for the city. But 
> that's not how you would do it in web2py anyway. Instead, you would issue 
> no selects and instead do it with just a single update -- so a total of 2 
> round trips to the db (i.e., 2 updates) :
>
> def a_child_was_born_in(countryName, cityName):
>     query = (db.City.Name == cityName) & (db.City.Country.belongs(db.
> Country.Name == countryName))
>     db(query).update(Population=db.City.Population + 1)
>
> So, in order to do the update, we do not first have to query the database 
> to retrieve the record. This is actually an advantage over the ORM, which 
> requires that you first retrieve the record before updating it. The ORM 
> will issue two queries to get the record if lazy loading is used, or one if 
> eager loading, a join, or a subquery is used. Furthermore, because web2py 
> doesn't need to retrieve the records, it also has a processing and memory 
> advantage over the ORM, which must create the record object, add it to the 
> session, and hold it in memory.
>  
>
>> Now, lets say we want to optimize that, so we do a "Lazy" version of 
>> those functions.
>>
>
> There's not much to optimize here. If you don't know ahead of time that 
> you will be making two updates to the same record (which may possibly 
> negate each other), I think the minimum number of db hits is two. You could 
> retrieve the record twice, defer the first update, recognize that the 
> second update cancels the first, and then make no update -- which is still 
> 2 hits (well, 1 hit if you cache the query). Or you could just make the 2 
> updates (as above). In any case, I believe the ORM actually requires a 
> minimum of 4 hits (see below), so web2py is still doing a lot better.
>  
>
>> Now, here is the same code, using an ORM:
>>
>> def a_child_was_born_in(countryName, cityName):
>>     city = Country(Name=countryName).City(Name=cityName)
>>     city.Population += 1
>>
>> def a_person_has_died_in(countryName, cityName):
>>     city = Country(Name=countryName).City(Name=cityName)
>>     city.Population -= 1
>>
>
> Assuming this is SQLA, I don't think that's quite the right syntax -- it 
> appears you are creating object instances rather than issuing queries. I 
> believe it should be something like this:
>
> def a_child_was_born_in(countryName, cityName):
>     city = session.query(City).join(Country)\
>            .filter(Country.Name == countryName)\
>            .filter(City.Name == cityName).first()
>     city.Population += 1
>
> The above does a join and therefore gets it down to a single query for the 
> select. Otherwise, you could just query for the country, then access the 
> "City" attribute, which would lazily issue a second query (though only when 
> the first function is called).
>
> The syntactic difference is small, but the semantic implication is 
>> profound.
>>
>
> Yes, but not quite in the way you think.
>  
>
>> The automatic cache-mechanism in the ORM will detect that we 
>> are querying the same record, and so would not query the database in the 
>> second function - just return the same object already in memory.
>>
>
> Again, assuming this is SQLA, that's not how it works. SQLA does not cache 
> queries -- when you run a query, it doesn't know what record will be 
> retrieved, so it doesn't know whether it already has the associated object 
> in the session. Hence, it will re-run the query both times. (The exception 
> to this is when you use .get() to fetch a record by primary key, which we 
> are not doing here.)
>  
>
>> But an ORM can have an "Identity Mapper", that would make sure they the 
>> same object would be returned,
>> It would be bound to two different name-spaces, but it would be the same 
>> object.
>> Now we could implement a "Truely" lazy update. The increment that is done 
>> in the first function, would be reflected in the second one, because the 
>> same object would be returned,
>>
>
> Another problem here. Whenever you execute a new query, SQLA flushes the 
> pending changes. So, when you run the query in the second function, it will 
> first issue the update to the database from the first change. Once it has 
> done that, it will ultimately also have to issue the update from the second 
> function (though perhaps at some later time) in order to have the correct 
> value in the database.
>
> So, I believe we have a minimum of 4 database hits with the ORM (5 if you 
> lazy load the cities when running the initial query) versus 2 hits with 
> web2py. We also have more processing and memory usage with the ORM.
>
> To summarize:
>
>    - The ORM doesn't do direct updates to the database, so it must first 
>    select the records before updating them, unlike web2py, which can issue a 
>    direct update.
>    - The ORM doesn't cache queries, so it must re-run the query twice, 
>    even though the record in question is already in the session after the 
>    first query.
>    - The ORM flushes pending changes before each query, so the first 
>    update goes to the database before the second query is run, ultimately 
>    necessitating both updates.
>    - The ORM must create the record object and manage various operations 
>    in the session, using more memory and processing time than web2py.
>
> Note, I am not a SQLA expert, so I may be mistaken about something above, 
> but this is how I understand it.
>
> These are the kinds of benefits an ORM may have.
>>
>
> One thing to keep in mind. Although the web2py DAL does not implement an 
> ORM design pattern, it is still object-oriented. We have Table, Field, Set, 
> Rows, Row, etc. These objects have various methods and attributes that 
> enable quite a bit of functionality and flexibility. I think most of the 
> behaviors you like so much in the SQLA ORM could in principle be 
> implemented within the DAL's objects (e.g., optional eager loading of 
> records within relationship attributes, or persistence upon first access 
> when lazily loaded). It may be worth thinking more about those ideas rather 
> than pushing the idea of a full-blown ORM layer.
>
> Anthony
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Reply via email to