Hey Ikai,

That's what I was actually remembering, that a tablet would be around 100
or 200 MB. I couldn't remember where I'd read that, though -- thanks for
the link.
Robert


On Thu, Feb 2, 2012 at 22:06, Ikai Lan (Google) <[email protected]> wrote:
> "A Bigtable cluster stores a number of tables. Each table consists of a
> set of tablets, and each tablet contains all data associated with a row
> range. Initially, each table consists of just one tablet. As a table
> grows, it is automatically split into multiple tablets, each
> approximately 100-200 MB in size by default."
>
> Also: 100-200 MB is the default configuration. It might not be what we
> have configured. This is *really really* detailed, though.
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> plus.ikailan.com
>
>
> On Thu, Feb 2, 2012 at 7:04 PM, Ikai Lan (Google) <[email protected]> wrote:
>>
>> Okay, looks like these whitepapers are at research.google.com now:
>>
>> http://research.google.com/archive/bigtable.html
>>
>> --
>> Ikai Lan
>> Developer Programs Engineer, Google App Engine
>> plus.ikailan.com
>>
>>
>> On Thu, Feb 2, 2012 at 7:03 PM, Ikai Lan (Google) <[email protected]> wrote:
>>>
>>> Robert, I'll see what I can do. No promises on an ETA. It isn't in
>>> one of the white papers?
>>>
>>> http://labs.google.com/papers/bigtable.html
>>>
>>> Oh, what the heck ... the link is broken. Let me see what's up.
>>>
>>> --
>>> Ikai Lan
>>> Developer Programs Engineer, Google App Engine
>>> plus.ikailan.com
>>>
>>>
>>> On Thu, Feb 2, 2012 at 1:56 PM, Robert Kluin <[email protected]> wrote:
>>>>
>>>> Yeah, Ikai is completely correct. I should have noted more clearly
>>>> that this is not something I even waste time worrying about until I
>>>> think I'm actually hitting it, which is not often. In the few cases
>>>> where I do think I've bumped into it, it is a
>>>> writing-thousands-of-entities-per-second type of thing -- which is
>>>> not very common.
>>>>
>>>> It is interesting that sharding is determined by access patterns. Is
>>>> that something you can elaborate on at all?
>>>> ;)
>>>>
>>>>
>>>> Robert
>>>>
>>>>
>>>> On Thu, Feb 2, 2012 at 16:14, Ikai Lan (Google) <[email protected]> wrote:
>>>> > Thanks for the answers, Robert.
>>>> >
>>>> > Shard size isn't determined by amount of data, but by access
>>>> > patterns. An example of an anti-pattern that will cause a shard
>>>> > size imbalance would be an entity write every time a user takes an
>>>> > action -- but you never do anything with this data. Since the data
>>>> > just kind of accumulates, the shard never splits (unless it hits
>>>> > some hardware bound, which I've never really seen happen yet with
>>>> > GAE data).
>>>> >
>>>> > As a final note, it takes a LOT of writes before this sort of thing
>>>> > happens, and I sometimes regret writing that blog post, because any
>>>> > time you write a blog post about scalability patterns, it invites
>>>> > people to prematurely implement them (Brett Slatkin's video
>>>> > generated an endless number of questions from people doing sub-1
>>>> > QPS). We've done launches on the YouTube/Google homepage
>>>> > (http://blog.golang.org/2011/12/from-zero-to-go-launching-on-google.html)
>>>> > that haven't required us to make these changes, because they did
>>>> > fine under load testing. I'd invest more energy in figuring out the
>>>> > right way to load test, then in trying to figure out the
>>>> > bottlenecks when you hit limits with real data.
>>>> >
>>>> > --
>>>> > Ikai Lan
>>>> > Developer Programs Engineer, Google App Engine
>>>> > plus.ikailan.com
>>>> >
>>>> >
>>>> > On Wed, Feb 1, 2012 at 9:19 PM, Robert Kluin <[email protected]> wrote:
>>>> >>
>>>> >> So I'd say don't worry about it unless you actually hit this
>>>> >> problem. If you do know you'll hit it, see if you have a way to
>>>> >> "shard" the timestamp -- by account, user, or region, etc. -- to
>>>> >> relieve some of the pressure.
>>>> >> If you must have a global timestamp, I'd say keep it as simple as
>>>> >> possible until you hit the issue. At that point you can figure out
>>>> >> a fix.
>>>> >>
>>>> >> When I have timestamps on high-write-rate entities that are
>>>> >> non-critical, for example "expiration" times that are used only
>>>> >> for cleanup, I'll sometimes add a random jitter of several hours
>>>> >> to spread the writes out a bit. I'd be surprised if changing it by
>>>> >> a few seconds helped much -- but it could. Keep in mind, there
>>>> >> will already be some degree of randomness, since the instance
>>>> >> clocks have some slight variation. If you're hitting this issue,
>>>> >> I'd give it a shot, though. If it works, it could at least buy you
>>>> >> some time to get a better fix.
>>>> >>
>>>> >> I don't think there is a fixed number of rows per shard. I think
>>>> >> it is split up by data size, and I don't think the exact number is
>>>> >> publicly documented. Maybe you can roughly figure it out via
>>>> >> experimentation.
>>>> >>
>>>> >>
>>>> >> Robert
>>>> >>
>>>> >>
>>>> >> On Wed, Feb 1, 2012 at 02:28, WGuerlich <[email protected]> wrote:
>>>> >> > I know I'm going to hit the write limit with a timestamp I need
>>>> >> > to update on every write and which needs to be indexed.
>>>> >> >
>>>> >> > As an alternative to sharding: what do you think about adding
>>>> >> > time jitter to the timestamp, that is, changing the time
>>>> >> > randomly by a couple of seconds? In my application, the
>>>> >> > timestamp being off by a couple of seconds wouldn't pose a
>>>> >> > problem.
>>>> >> >
>>>> >> > Now what I need to know is: how many index entries can I expect
>>>> >> > to go into one tablet? This is needed to estimate the amount of
>>>> >> > jitter necessary to avoid hitting the same tablet on every
>>>> >> > write.
>>>> >> >
>>>> >> > Any insights on this?
>>>> >> >
>>>> >> >
>>>> >> > Wolfram
>>>> >> >
>>>> >> > --
>>>> >> > You received this message because you are subscribed to the
>>>> >> > Google Groups "Google App Engine" group.
>>>> >> > To view this discussion on the web visit
>>>> >> > https://groups.google.com/d/msg/google-appengine/-/r0SVTq6i4iEJ.
>>>> >> > To post to this group, send email to
>>>> >> > [email protected].
>>>> >> > To unsubscribe from this group, send email to
>>>> >> > [email protected].
>>>> >> > For more options, visit this group at
>>>> >> > http://groups.google.com/group/google-appengine?hl=en.
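The two techniques Robert describes above -- jittering a non-critical
timestamp over a several-hour window, and "sharding" the indexed value by
account/user/region -- can be sketched in plain Python. This is a minimal
illustration only, not App Engine API code; `JITTER_WINDOW`, the
`sharded_timestamp_key` helper, and the 16-shard count are made-up names
and values for the example:

```python
import random
import zlib
from datetime import datetime, timedelta

# Assumed tuning knob: spread non-critical ("expiration"-style) timestamps
# over a window of a few hours, per Robert's suggestion.
JITTER_WINDOW = timedelta(hours=4)

def jittered(ts, window=JITTER_WINDOW):
    """Return ts shifted by a uniform random offset in [0, window].

    Spreading otherwise-identical index values over a window means
    consecutive writes land across a range of index rows instead of all
    piling onto the tail of one tablet.
    """
    offset = timedelta(seconds=random.uniform(0, window.total_seconds()))
    return ts + offset

def sharded_timestamp_key(ts, account_id, num_shards=16):
    """Prefix the indexed value with a deterministic per-account shard.

    Writes for different accounts then scatter across num_shards distinct
    row ranges rather than one globally ordered range. crc32 is used so
    the shard is stable across processes (unlike Python's built-in hash).
    """
    shard = zlib.crc32(account_id.encode("utf-8")) % num_shards
    return "%02d|%s" % (shard, ts.isoformat())
```

A query over such a sharded value has to fan out across all shard
prefixes and merge the results, which is the usual cost of this pattern.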
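For Wolfram's question about index entries per tablet, a back-of-the-
envelope estimate is possible from the 100-200 MB default tablet size in
the Bigtable paper quoted above. The per-entry byte cost here is purely
an assumption (key plus indexed value plus overhead, before compression);
as Robert notes, the real figure isn't publicly documented:

```python
# Default tablet size range, per the Bigtable paper quoted in the thread.
TABLET_BYTES_LOW = 100 * 1024 * 1024
TABLET_BYTES_HIGH = 200 * 1024 * 1024

# Assumed average on-disk cost of one index entry -- a guess for
# illustration, not a documented figure.
ASSUMED_ENTRY_BYTES = 100

entries_low = TABLET_BYTES_LOW // ASSUMED_ENTRY_BYTES
entries_high = TABLET_BYTES_HIGH // ASSUMED_ENTRY_BYTES

print("roughly %d to %d index entries per tablet"
      % (entries_low, entries_high))
```

Under these assumptions a full tablet holds on the order of one to two
million entries, which is consistent with Robert's hunch that a couple of
seconds of jitter may not move writes far enough apart at high write
rates, while hours-scale jitter or a sharded prefix targets a much wider
row range.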
