Re: [web2py] DAL speed - an idea

Anthony Thu, 09 Feb 2012 13:14:49 -0800

I've been thinking about something like this as well. Instead of a separate 
select_raw() method, maybe we can just add a raw=True|False argument to the 
existing select() method. I like the namedtuple idea as well (I think some 
adapters already provide that as an option -- e.g., psycopg2).


Anthony

On Thursday, February 9, 2012 3:04:41 PM UTC-5, nick name wrote:
>
> Yes, that is the basis of what I am suggesting.
>
> There is not currently such a thing; there is something called 
> 'select_raw' implemented in the GoogleDataStore adapter, but not in 
> anything else, and it isn't exactly what I am proposing.
>
> To elaborate:
>
> Assume the table is defined as follows:
>    
>     reftable = db.define_table('reftable', Field('a', string))
>     table = db.define_table('table', Field('b', reftable))
>
> In my case, I need to pull all the records (60,000) from the database to 
> compute some aggregation which I cannot compute using sql. There are two 
> alternatives here:
>
>     r1 = db().select(table.ALL) # takes > 6 seconds
>
>     r2 = db.executesql(db._select(table.ALL)) # takes ~0.1sec
>
> The records returned in the first instance are much richer; they have 
> record chasing (e.g. I can do r1[0].b.a to select through the foreign key), 
> they have methods like r1[0].update_record() and r1[0].delete_record(), and 
> other nice stuff.
>
> However, for this use, I don't need the additional records, and I do need 
> the speed, so I would rather use r2. However, r2 is not a direct 
> replacement -- it doesn't have the column names. If I use
>
>     r3 = db.executesql(db._select(table.ALL), as_dict=True) # still takes 
> ~0.1sec
>
> I can do r3[0]['b'] but I cannot do r3[0].b; and it takes a lot more 
> memory than r2.
>
> A suggestion: add another parameter, processor=... which, if available, 
> will be called with the db.connection.cursor, returning a function, through 
> which each routine will be passed; example
>
> def named_tuple_process(name, description):
>    from collections import namedtuple
>    fields = ' '.join([x[0] for x in description])
>    return namedtuple(name, fields)
>
>     r4 = db.executesql(db._select(table.ALL), process=lambda x: 
> named_tuple_process('tablerec', x))
>
> r4[0].b # will now work; not a full replacement, but good enough for many 
> uses.
>
> In fact, you can do that externally - 
>
> r4 = db.executesql(db._select(table.ALL))
> f = named_tuple_process('tablerec', db._adapter.cursor.description)
> r4 = [f(x) for x in r4]
>
> But this requires reaching into the internals of the db adapter.
>
> Finally, I propose to define x.raw_select(*args) to do: 
> db.executesql(x._select(*args))
>
> which would make this a relatively clean replacement.
>

Re: [web2py] DAL speed - an idea

Reply via email to