"A Bigtable cluster stores a number of tables. Each table consists of a set of tablets, and each tablet contains all data associated with a row range. Initially, each table consists of just one tablet. As a table grows, it is automatically split into multiple tablets, each approximately 100-200 MB in size by default."
Also: 100-200 MB is the default configuration. It might not be what we have configured. This is *really really* detailed, though.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com

On Thu, Feb 2, 2012 at 7:04 PM, Ikai Lan (Google) <[email protected]> wrote:

> Okay, looks like these whitepapers are at research.google.com now:
>
> http://research.google.com/archive/bigtable.html
>
> On Thu, Feb 2, 2012 at 7:03 PM, Ikai Lan (Google) <[email protected]> wrote:
>
>> Robert, I'll see what I can do. No promises on an ETA. Isn't it in one of the white papers?
>>
>> http://labs.google.com/papers/bigtable.html
>>
>> Oh what the heck ... the link is broken. Let me see what's up.
>>
>> On Thu, Feb 2, 2012 at 1:56 PM, Robert Kluin <[email protected]> wrote:
>>
>>> Yeah, Ikai is completely correct. I should have noted more clearly that this is not something I even waste time worrying about until I think I'm actually hitting it, which is not often. In the few cases where I do think I've bumped into it, it is a writing-thousands-of-entities-per-second type of thing -- which is not very common.
>>>
>>> It is interesting that sharding is determined by access patterns. Is that something you can elaborate on at all? ;)
>>>
>>> Robert
>>>
>>> On Thu, Feb 2, 2012 at 16:14, Ikai Lan (Google) <[email protected]> wrote:
>>>
>>>> Thanks for the answers, Robert.
>>>>
>>>> Shard size isn't determined by amount of data, but by access patterns. An example of an anti-pattern that will cause a shard size imbalance would be an entity write every time a user takes an action -- but you never do anything with this data. Since the data just kind of accumulates, the shard never splits (unless it hits some hardware bound, which I've never really seen happen yet with GAE data).
>>>>
>>>> As a final note, it takes a LOT of writes before this sort of thing happens, and I sometimes regret writing that blog post, because any time you write a blog post about scalability patterns, it invites people to prematurely implement them (Brett Slatkin's video generated an endless number of questions from people doing sub-1 QPS). We've done launches on the YouTube/Google homepage (http://blog.golang.org/2011/12/from-zero-to-go-launching-on-google.html) that haven't required us to make these changes, because they did fine under load testing. I'd invest more energy in figuring out the right way to load test, then try to figure out the bottlenecks when you hit limits with real data.
>>>>
>>>> On Wed, Feb 1, 2012 at 9:19 PM, Robert Kluin <[email protected]> wrote:
>>>>
>>>>> So I'd say don't worry about it unless you actually hit this problem. If you do know you'll hit it, see if you have a way to "shard" the timestamp -- by account, user, or region, etc. -- to relieve some of the pressure. If you must have a global timestamp, I'd say keep it as simple as possible until you hit the issue. At that point you can figure out a fix.
>>>>>
>>>>> When I have timestamps on high write-rate entities that are non-critical, for example "expiration" times that are used only for cleanup, I'll sometimes add a random jitter of several hours to spread the writes out a bit. I'd be surprised if changing it by a few seconds helped much -- but it could. Keep in mind, there will already be some degree of randomness, since the instance clocks have some slight variation. If you're hitting this issue, I'd give it a shot, though. If it works, it could at least buy you some time to get a better fix.
>>>>>
>>>>> I don't think there is a fixed number of rows per shard. I think it is split up by data size, and I don't think the exact number is publicly documented. Maybe you can roughly figure it out via experimentation.
>>>>>
>>>>> Robert
>>>>>
>>>>> On Wed, Feb 1, 2012 at 02:28, WGuerlich <[email protected]> wrote:
>>>>>
>>>>>> I know I'm going to hit the write limit with a timestamp that I need to update on every write and which needs to be indexed.
>>>>>>
>>>>>> As an alternative to sharding: what do you think about adding time jitter to the timestamp, that is, changing the time randomly by a couple of seconds? In my application, the timestamp being off by a couple of seconds wouldn't pose a problem.
>>>>>>
>>>>>> Now what I need to know is: how many index entries can I expect to go into one tablet? This is needed to estimate the amount of jitter necessary to avoid hitting the same tablet on every write.
>>>>>>
>>>>>> Any insights on this?
>>>>>>
>>>>>> Wolfram

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
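The jitter idea discussed in this thread, together with Wolfram's question about index entries per tablet, can be put into a back-of-the-envelope sketch. This is a minimal, illustrative Python sketch, not App Engine API code: the per-index-entry size and the number of tablets to spread over are assumptions I've invented for illustration; only the ~100-200 MB default tablet size comes from the Bigtable paper quoted at the top of the thread.

```python
import random
from datetime import datetime, timedelta

def jittered_timestamp(now, max_jitter_seconds=5.0):
    """Return `now` shifted by a uniform random offset in
    [-max_jitter_seconds, +max_jitter_seconds], so a burst of writes
    does not all land on the exact tail of the timestamp index."""
    offset = random.uniform(-max_jitter_seconds, max_jitter_seconds)
    return now + timedelta(seconds=offset)

def jitter_seconds_needed(writes_per_second,
                          spread_over_tablets=2,          # illustrative target
                          tablet_bytes=100_000_000,       # ~100 MB default (Bigtable paper)
                          bytes_per_index_entry=100):     # ASSUMPTION: rough entry size
    """Rough estimate of how wide the jitter window must be for writes
    to span more than one tablet of the timestamp index.

    At a sustained rate of `writes_per_second`, the index accumulates
    about that many entries per second of timestamp range, so one tablet
    covers roughly (entries_per_tablet / writes_per_second) seconds of
    timestamps. The jitter window must span several such stretches for
    the writes to spread across tablets at all.
    """
    entries_per_tablet = tablet_bytes // bytes_per_index_entry
    return spread_over_tablets * entries_per_tablet / writes_per_second
```

Under these assumptions, `jitter_seconds_needed(1000)` comes out to 2000 seconds, i.e. on the order of half an hour -- consistent with Robert's point that a few seconds of jitter is unlikely to help, while several hours can.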
