You must use transactions to atomically increment any counter shard. web2py does not have transactions yet, so just use Google's models for the sharded counters, and increment or sum them inside or after form.accepts().

Robin

On Mar 2, 6:11 pm, Jim wrote:
> In the first message, I give a reference to Google's implementation.
> I also found http://highscalability.com/numbers-everyone-should-know
> helpful:
>
> "Sharded Counters
>
> We always seem to want to keep count of things. But BigTable doesn't
> keep a count of entities because it's a key-value store. It's very
> good at getting data by keys, it's not interested in how many you
> have. So the job of keeping counts is shifted to you.
>
> The naive counter implementation is to lock-read-increment-write. This
> is fine if there are a low number of writes. But if there are frequent
> updates there's high contention. Given the number of writes that
> can be made per second is so limited, a high write load serializes and
> slows down the whole process.
>
> The solution is to shard counters. This means:
> # Create N counters in parallel.
> # Pick a shard to increment transactionally at random for each item
> counted.
> # To get the real current count sum up all the sharded counters.
> # Contention is reduced by 1/N. Writes have been optimized because
> they have been spread over the different shards. A bottleneck around
> shared state has been removed.
>
> This approach seems counter-intuitive because we are used to a counter
> being a single incrementable variable. Reads are cheap so we replace
> having a single easily read counter with having to make multiple reads
> to recover the actual count. Frequently updated shared variables are
> expensive so we shard and parallelize those writes."
>
> On Mar 2, 3:27 pm, mdipierro wrote:
>
> > I am lost here. I need to look up what "sharding" means.
>
> > Massimo
>
> > On Mar 2, 3:25 pm, Jim wrote:
>
> > > Yes, but that gets me an exact match, right?
>
> > > If I've sharded my counters the way Google is recommending, I'll have
> > > something like this:
>
> > > name    count
> > > test0   3
> > > test1   2
> > > test2   3
> > > ...
> > > test19  2
>
> > > So I want to find all the records in my shards table that start with
> > > "test" and then total them.
>
> > > On Mar 2, 12:20 pm, mdipierro wrote:
>
> > > > Because it is
>
> > > > counter = db(db.shards.name==shard_name).select()
>
> > > > ;-)
>
> > > > On Mar 2, 12:48 pm, Jim wrote:
>
> > > > > Boiled down, my problem is this:
>
> > > > > I want to use
>
> > > > > counter = db(db.shards.name=shard_name).select()
>
> > > > > but web2py won't allow it. I get
>
> > > > > SyntaxError: keyword can't be an expression
>
> > > > > I'm trying to avoid writing a Google Query Language query here but
> > > > > maybe that's the right answer.
>
> > > > > On Mar 1, 9:55 pm, Jim wrote:
>
> > > > > > I'm working towards implementing on Google App Engine. Google
> > > > > > stresses the importance of sharding counters in their
> > > > > > environment.
> > > > > > http://code.google.com/appengine/articles/sharding_counters.html
>
> > > > > > Why? Well, Craigslist keeps a counter for their ads. Every time
> > > > > > someone posts an ad, that counter gets incremented. As of a few
> > > > > > weeks ago, they released Sphinx as their search engine, which
> > > > > > means that you can use that 9-digit ad number in any city.
>
> > > > > > On GAE, writes are slow - you might have to wait a second or more to
> > > > > > write out one record. Other records will have to wait for the first
> > > > > > one to finish. You can't run Craigslist on GAE.
>
> > > > > > What does sharding counters mean? As I interpret it, it means
> > > > > > knowing "about" how many records you have. Not exactly. So you
> > > > > > get 10 or 20 sub-counters and write to one at random. If you need
> > > > > > to know about how many records you have, you total the 10 or 20
> > > > > > sub-counters to get an answer. It's an approximate answer but if
> > > > > > you've got a lot of data, hey, it's going to be close enough. And
> > > > > > you can get 10 or 20 or 40 writes/second because each time you're
> > > > > > grabbing a different counter.
>
> > > > > > At least that's the theory. I'm trying to translate the Google
> > > > > > implementation into web2py. I've got increment working except for
> > > > > > memcache. It's doing test0, test1, etc. with the counts properly.
> > > > > > get_count is not working and I'm having trouble figuring out
> > > > > > why counters = db(db.shards.name==name).select() is not
> > > > > > returning any results. That seems to be the most accurate way to
> > > > > > translate GAE/webapp to web2py but my suspicion is it's failing
> > > > > > because test0 is not equal to test. I'm trying to avoid using two
> > > > > > versions of routines - one for web2py w/o GAE, one with - but it
> > > > > > might be necessary.
>
> > > > > > def test_it():
> > > > > >     count=get_count('test')
> > > > > >     session.flash = "count is " + `count`
> > > > > >     increment('test')
> > > > > >     return
>
> > > > > > def get_count(name):
> > > > > >     """Retrieve the value for a given sharded counter.
>
> > > > > >     Parameters:
> > > > > >       name - The name of the counter
> > > > > >     """
> > > > > >     total = memcache.get(name)
> > > > > >     if total is None:
> > > > > >         print "none"
> > > > > >         total = 0
> > > > > >         counters = db(db.shards.name==name).select()
> > > > > >         for counter in counters:
> > > > > >             total += counter.count
> > > > > >             print counter.name
> > > > > >         memcache.add(name, str(total), 60)
> > > > > >     return total
>
> > > > > > def increment(name):
> > > > > >     """Increment the value for a given sharded counter.
>
> > > > > >     Parameters:
> > > > > >       name - The name of the counter
> > > > > >     """
>
> > > > > >     index = random.randint(0, NUM_SHARDS - 1)
> > > > > >     shard_name = name + str(index)
> > > > > >     try:
> > > > > >         counter = db(db.shards.name==shard_name).select()[0]
> > > > > >         temp=counter.count+1
> > > > > >         counter.update_record(count=temp)
> > > > > >     except:
> > > > > >         db.shards.insert(name=shard_name, count=1)
>
> > > > > >     # memcache.incr(name)
>
> > > > > > Eventually I hope to implement the version that allows for
> > > > > > increasing of the number of shards.
>
> > > > > > Thanks.

You received this message because you are subscribed to the Google Groups "web2py Web Framework" group.
For more options, visit this group at http://groups.google.com/group/web2py?hl=en

