Re: [web2py] Re: tags and speed

andrej burja Fri, 19 Oct 2012 13:06:10 -0700

should i use

 .select(..., cacheable=True)


or 

cache.disk('games',lambda:games

what is the difference?

andrej


On Friday, October 19, 2012 1:20:08 PM UTC+2, andrej burja wrote:
>
> hi
>
> Thank you for your explanation. It is really helpful.
> The title of the chapter in the book is "Efficient search by tag" and i 
> chose this approach because i thought that search would be the most 
> important feature of the application (that was right).
> The tag table is similar to the one in your example 2, but with only one 
> tag per row, so: 
> game_id  tag
> 1             excitement
> 1             funny
>
> I learned some lessons:
> - do usability testing as soon as possible (it came as a surprise to me 
> that using Datatables is the most useful)
> - be prepared for lots of data: my initial assumption was 200 games, and 
> now i have more than 1000.
>
> I will try with cache first. 
>
> andrej
>
>
> On Friday, October 19, 2012 1:21:37 AM UTC+2, Niphlod wrote:
>>
>> This is the old, mean, bad story of normalized vs denormalized models.
>> Don't know what you read in the cookbook, but here's the thing.
>> You have a game pacman that is tagged as arcade, and a game invaders 
>> tagged as horror
>>
>> First things first: you may want to change "horror" to "needs parents 
>> around" later on. That's why you may want to create an external 
>> tags table and reference it in your games one.
>> tags table
>> 1 arcade
>> 2 horror
>>
>> game_name tags
>> pacman   1
>> invaders  2
>>
>> To display
>> pacman arcade
>> invaders horror
>> you just need a simple join. Changing record 2 of the tags table allows 
>> you to have
>> pacman arcade
>> invaders needs parents around
>>
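The join and the one-row rename described above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module rather than the web2py DAL; the table and column names (games, tags, tag_id) are assumptions made up for the example, not taken from the application.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, tag_name TEXT);
    CREATE TABLE games (id INTEGER PRIMARY KEY, game_name TEXT,
                        tag_id INTEGER REFERENCES tags(id));
    INSERT INTO tags  VALUES (1, 'arcade'), (2, 'horror');
    INSERT INTO games VALUES (1, 'pacman', 1), (2, 'invaders', 2);
""")

# The "simple join": resolve each game's tag id to its name.
JOIN_SQL = """
    SELECT games.game_name, tags.tag_name
    FROM games JOIN tags ON tags.id = games.tag_id
    ORDER BY games.id
"""

before = conn.execute(JOIN_SQL).fetchall()

# Renaming the tag touches exactly one row of the tags table;
# the games table is left untouched.
conn.execute("UPDATE tags SET tag_name = ? WHERE id = ?",
             ("needs parents around", 2))
after = conn.execute(JOIN_SQL).fetchall()

print(before)
print(after)
```

Because the games table stores only the tag id, the second query picks up the new name without any change to the game rows.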
>> Next problem on the line: you want multiple tags for a single game 
>> (invaders needs both arcade and horror tags). 
>> Welcome to normalization hell. Books have been written about it in the 
>> last 60-70 years, with terms like 3NF and Cardinality all around ^_^
>>
>> Let's take this by examples
>>
>> 1) you change the tags table like this
>> id tag_name game_id
>> 1 arcade 1
>> 1 arcade 2
>> 2 horror 2
>> getting tags for a single game is fast, changing "horror" to "needs 
>> parents around" is fast, getting all possible tags is fast enough 
>> (depending on the number of rows of the tags table)
>> but
>> fetching the right tags for every game can be slow
>>
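A rough sketch of the trade-off in 1), again with plain sqlite3 instead of the DAL. The schema is an assumption for illustration (one row per game/tag pair, with the id column left out for brevity). Tags for one game are a single-table query, while tags for every game need a join plus regrouping in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, game_name TEXT);
    CREATE TABLE tags  (tag_name TEXT, game_id INTEGER REFERENCES games(id));
    INSERT INTO games VALUES (1, 'pacman'), (2, 'invaders');
    INSERT INTO tags  VALUES ('arcade', 1), ('arcade', 2), ('horror', 2);
""")

# Tags for a single game: a direct query on the tags table only.
tags_for_invaders = [row[0] for row in conn.execute(
    "SELECT tag_name FROM tags WHERE game_id = ? ORDER BY tag_name", (2,))]

# All possible tags: one DISTINCT scan of the tags table.
all_tags = [row[0] for row in conn.execute(
    "SELECT DISTINCT tag_name FROM tags ORDER BY tag_name")]

# Tags for every game: the join returns one row per (game, tag) pair,
# so the rows have to be regrouped per game afterwards.
tags_by_game = defaultdict(list)
for game_name, tag_name in conn.execute("""
        SELECT games.game_name, tags.tag_name
        FROM games LEFT JOIN tags ON tags.game_id = games.id
        ORDER BY games.id, tags.tag_name"""):
    tags_for_this_game = tags_by_game[game_name]  # keeps untagged games too
    if tag_name is not None:
        tags_for_this_game.append(tag_name)

print(tags_for_invaders, all_tags, dict(tags_by_game))
```

With many games and many tags, the last query is the one that grows, since its result has as many rows as there are game/tag pairs.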
>> 2) you create a games table like this 
>> id game_name tags
>> 1 pacman |arcade|
>> 2 invaders |arcade|horror|
>> getting tags for a single game is fast, updating a tag for a game is 
>> quite fast
>> but
>> changing all "horror" to "needs parents around" can be fast only if done 
>> outside web2py with a manual replace, getting all possible tags can be slow
>>
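For 2), a minimal sketch that mimics the pipe-delimited layout shown above with a plain TEXT column in sqlite3. The helper split_tags and the column names are hypothetical. It shows why reading one game is cheap while collecting all possible tags means scanning and splitting every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, game_name TEXT, tags TEXT);
    INSERT INTO games VALUES (1, 'pacman',   '|arcade|'),
                             (2, 'invaders', '|arcade|horror|');
""")


def split_tags(packed):
    """Turn '|arcade|horror|' into ['arcade', 'horror'] (hypothetical helper)."""
    return [tag for tag in packed.split("|") if tag]


# Tags for a single game: one row, no join.
row = conn.execute(
    "SELECT tags FROM games WHERE game_name = ?", ("invaders",)).fetchone()
invaders_tags = split_tags(row[0])

# Search by tag: match the tag together with its delimiters, so that
# a search for 'horror' does not also match a longer tag containing it.
horror_games = [r[0] for r in conn.execute(
    "SELECT game_name FROM games WHERE tags LIKE ? ORDER BY id",
    ("%|horror|%",))]

# All possible tags: every row has to be read and split.
all_tags = sorted({tag
                   for (packed,) in conn.execute("SELECT tags FROM games")
                   for tag in split_tags(packed)})

print(invaders_tags, horror_games, all_tags)
```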
>> 3) you create a games table like this
>> id game_name tags_id
>> 1 pacman |1|
>> 2 invaders |1|2|
>> getting tags for a single game is fast, updating a tag for a game is 
>> quite fast, changing "horror" is fast
>> but
>> changing all "horror" to "needs parents around" can be fast only if done 
>> outside web2py with a manual replace, getting all possible tags can be slow
>>
>> So, with no need to change tag names from "horror" to "needs parents 
>> around", I'd say the right way for displaying your table is 2) 
>> (list:string), else 3) (list:reference). 
>> If you need to "suggest" previously entered tags (i.e. you don't want to 
>> end up having "arcade", "arcady", "arcadian", "'rcade" scattered all 
>> around), I'd still go for 2) + one separate table (or a cached list) 
>> holding all previously entered tags, just to speed up the "suggestion" 
>> phase (hoping users won't screw up).
>>  
>> Method 1) is faster only when:
>> - there are more tags than games
>> - "cleaning normalization" logic is heavy
>> - you need to update tag names often
>> - the only need is displaying tags for a single game (direct query on a 
>> single table, the tags one)
>> - you need to compute something like "how many times the arcade tag is 
>> applied to all my games" (i.e. for a tag cloud)
>>
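The tag cloud case from the list above, sketched with sqlite3 under the same assumed one-row-per-pair layout as 1) (names are illustrative). The count per tag is a single GROUP BY on the tags table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_name TEXT, game_id INTEGER);
    INSERT INTO tags VALUES ('arcade', 1), ('arcade', 2), ('horror', 2);
""")

# How many games carry each tag, most used first.
tag_counts = conn.execute("""
    SELECT tag_name, COUNT(*) AS uses
    FROM tags
    GROUP BY tag_name
    ORDER BY uses DESC, tag_name
""").fetchall()

print(tag_counts)
```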
>> There are several other methods, and all of them in the end "do the job". 
>> You just need to see what your requirements are and choose carefully. 
>> Your agenda is:
>> - spend as little time as possible on queries you have to invoke often
>> - "pay the price" of your model in high computations for other things 
>> (or, let's save the results of those computations externally once every 
>> hour)
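The "save the results once every hour" idea can be sketched in plain Python as a small time-expiring cache. TimedCache and its parameters are hypothetical names for this illustration. It is similar in spirit to calling web2py's cache.ram or cache.disk with a time_expire value, but it is not that implementation.

```python
import time


class TimedCache:
    """Keep computed values for a limited time (illustrative sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._store = {}      # key -> (time stored, value)

    def get(self, key, compute, time_expire=3600):
        """Return the cached value, recomputing it once it is too old."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < time_expire:
            return hit[1]
        value = compute()
        self._store[key] = (now, value)
        return value


def expensive_tag_list():
    # Stand-in for a slow query such as "collect all possible tags".
    return ["arcade", "horror"]


cache = TimedCache()
all_tags = cache.get("all_tags", expensive_tag_list, time_expire=3600)
print(all_tags)
```

Within the hour, repeated calls with the same key return the stored list without running the slow function again.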
>>  
>>
>>
>>
