[web2py:17363] Re: sharding counters

Robin B Mon, 02 Mar 2009 16:57:27 -0800

You must use transactions to atomically increment any counter shard.
Web2py does not have transactions yet, so just use google models for
the sharded counters, and increment/sum them inside/after form.accepts
().


Robin

On Mar 2, 6:11 pm, Jim <[email protected]> wrote:
> In the first message, I give a reference to google's implementation.
> I also foundhttp://highscalability.com/numbers-everyone-should-know
> helpful:
>
> "Sharded Counters
>
> We always seem to want to keep count of things. But BigTable doesn't
> keep a count of entities because it's a key-value store. It's very
> good at getting data by keys, it's not interested in how many you
> have. So the job of keeping counts is shifted to you.
>
> The naive counter implementation is to lock-read-increment-write. This
> is fine if there a low number of writes. But if there are frequent
> updates there's high contention. Given the the number of writes that
> can be made per second is so limited, a high write load serializes and
> slows down the whole process.
>
> The solution is to shard counters. This means:
> # Create N counters in parallel.
> # Pick a shard to increment transactionally at random for each item
> counted.
> # To get the real current count sum up all the sharded counters.
> # Contention is reduced by 1/N. Writes have been optimized because
> they have been spread over the different shards. A bottleneck around
> shared state has been removed.
>
> This approach seems counter-intuitive because we are used to a counter
> being a single incrementable variable. Reads are cheap so we replace
> having a single easily read counter with having to make multiple reads
> to recover the actual count. Frequently updated shared variables are
> expensive so we shard and parallelize those writes."
>
> On Mar 2, 3:27 pm, mdipierro <[email protected]> wrote:
>
> > I am lost here. I need to look up what "sharding" means.
>
> > Massimo
>
> > On Mar 2, 3:25 pm, Jim <[email protected]> wrote:
>
> > > Yes, but that gets me an exact match, right?
>
> > > If I've sharded my counters the way Google is recommending, I'll have
> > > something like this:
>
> > > name     count
> > > test0            3
> > > test1            2
> > > test2            3
> > > ...
> > > test19          2
>
> > > So I want to find all the records in my shards table that start with
> > > "test" and then total them.
>
> > > On Mar 2, 12:20 pm, mdipierro <[email protected]> wrote:
>
> > > > Because it is
>
> > > > counter = db(db.shards.name==shard_name).select()
>
> > > > ;-)
>
> > > > On Mar 2, 12:48 pm, Jim <[email protected]> wrote:
>
> > > > > Boiled down, my problem is this:
>
> > > > > I want to use
>
> > > > >         counter = db(db.shards.name=shard_name).select()
>
> > > > > but web2py won't allow it.   I get
>
> > > > > SyntaxError: keyword can't be an expression
>
> > > > > I'm trying to avoid writing a Google Query Language query here but
> > > > > maybe that's the right answer.
>
> > > > > On Mar 1, 9:55 pm, Jim <[email protected]> wrote:
>
> > > > > > I'm working towards implementing on Google App Engine.   Google
> > > > > > stresses the importance of sharding counters in their 
> > > > > > environment.http://code.google.com/appengine/articles/sharding_counters.html
>
> > > > > > Why?   Well, Craigslist keeps a counter for their ads.  Every time
> > > > > > someone posts an ad, that counter gets incremented.  As of a few 
> > > > > > weeks
> > > > > > ago, they released Sphinx as their search engine, which means that 
> > > > > > you
> > > > > > can use that 9-digit ad number in any city.
>
> > > > > > On GAE, writes are slow - you might have to wait a second or more to
> > > > > > write out one record.  Other records will have to wait for the first
> > > > > > one to finish.   You can't run Craigslist on GAE.
>
> > > > > > What does sharding counters mean?  As I interpret it, it means 
> > > > > > knowing
> > > > > > "about" how many records you have.  Not exactly.   So you get 10 or 
> > > > > > 20
> > > > > > sub-counters and write to one at random.   If you need to know about
> > > > > > how many records you have, you total the 10 or 20 sub-counters to 
> > > > > > get
> > > > > > an answer.  It's an approximate answer but if you've got a lot of
> > > > > > data, hey, it's going to be close enough.   And you can get 10 or 20
> > > > > > or 40 writes/second because each time you're grabbing a different
> > > > > > counter.
>
> > > > > > At least that's the theory.   I'm trying to translate the Google
> > > > > > implementation into web2py.  I've got increment working except for
> > > > > > memcache.  It's doing test0, test1, etc.  with the counts properly.
> > > > > > get_count is not working and I'm having trouble figuring out
> > > > > > why         counters = db(db.shards.name==name).select() is not
> > > > > > returning any results.   That seems to be the most accurate way to
> > > > > > translate GAE/webapp to web2py but my suspicion is it's failing
> > > > > > because test0 is not equal to test.   I'm trying to avoid using two
> > > > > > versions of routines - one for web2py w/o GAE, one with - but it 
> > > > > > might
> > > > > > be necessary.
>
> > > > > > def test_it():
> > > > > >     count=get_count('test')
> > > > > >     session.flash = count is + `count`
> > > > > >     increment('test')
> > > > > >     return
>
> > > > > > def get_count(name):
> > > > > >     """Retrieve the value for a given sharded counter.
>
> > > > > >     Parameters:
> > > > > >         name - The name of the counter
> > > > > >     """
> > > > > >     total = memcache.get(name)
> > > > > >     if total is None:
> > > > > >         print "none"
> > > > > >         total = 0
> > > > > >         counters = db(db.shards.name==name).select()
> > > > > >         for counter in counters:
> > > > > >             total += counter.count
> > > > > >             print counter.name
> > > > > >         memcache.add(name, str(total), 60)
> > > > > >     return total
>
> > > > > > def increment(name):
> > > > > >     """Increment the value for a given sharded counter.
>
> > > > > >     Parameters:
> > > > > >     name - The name of the counter
> > > > > >     """
>
> > > > > >     index = random.randint(0, NUM_SHARDS - 1)
> > > > > >     shard_name = name + str(index)
> > > > > >     try:
> > > > > >         counter = db(db.shards.name==shard_name).select()[0]
> > > > > >         temp=counter.count+1
> > > > > >         counter.update_record(count=temp)
> > > > > >     except:
> > > > > >         db.shards.insert(name=shard_name, count=1)
>
> > > > > > #    memcache.incr(name)
>
> > > > > > Eventually I hope to implement the version that allows for 
> > > > > > increasing
> > > > > > of the number of shards.
>
> > > > > > Thanks.
>
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py Web Framework" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---

[web2py:17363] Re: sharding counters

Reply via email to