"A Bigtable cluster stores a number of tables. Each ta-
ble consists of a set of tablets, and each tablet contains
all data associated with a row range. Initially, each table
consists of just one tablet. As a table grows, it is auto-
matically split into multiple tablets, each approximately
100-200 MB in size by default."

Also: 100-200 MB is the default configuration; it might not be what we have
configured. This is *really, really* detailed, though.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com



On Thu, Feb 2, 2012 at 7:04 PM, Ikai Lan (Google) <[email protected]> wrote:

> Okay, looks like these whitepapers are at research.google.com now:
>
> http://research.google.com/archive/bigtable.html
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> plus.ikailan.com
>
>
>
> On Thu, Feb 2, 2012 at 7:03 PM, Ikai Lan (Google) <[email protected]> wrote:
>
>> Robert, I'll see what I can do. No promises on an ETA. Isn't it in one of
>> the white papers?
>>
>> http://labs.google.com/papers/bigtable.html
>>
>> Oh what the heck ... the link is broken. Let me see what's up.
>>
>> --
>> Ikai Lan
>> Developer Programs Engineer, Google App Engine
>> plus.ikailan.com
>>
>>
>>
>> On Thu, Feb 2, 2012 at 1:56 PM, Robert Kluin <[email protected]> wrote:
>>
>>> Yeah, Ikai is completely correct.  I should have noted more clearly
>>> that this is not something I even waste time worrying about until I
>>> think I'm actually hitting it, which is not often.  In the few cases
>>> where I do think I've bumped into it, it was a "writing thousands of
>>> entities per second" type of thing -- which is not very common.
>>>
>>> It is interesting that sharding is determined by access patterns.  Is
>>> that something you can elaborate on at all?  ;)
>>>
>>>
>>> Robert
>>>
>>>
>>>
>>> On Thu, Feb 2, 2012 at 16:14, Ikai Lan (Google) <[email protected]>
>>> wrote:
>>> > Thanks for the answers, Robert.
>>> >
>>> > Shard size isn't determined by amount of data, but by access
>>> > patterns. An example of an anti-pattern that will cause a shard size
>>> > imbalance would be an entity write every time a user takes an action
>>> > -- but you never do anything with this data. Since the data just
>>> > kind of accumulates, the shard never splits (unless it hits some
>>> > hardware bound, which I've never really seen happen yet with GAE
>>> > data).
>>> >
>>> > As a final note, it takes a LOT of writes before this sort of thing
>>> > happens, and I sometimes regret writing that blog post, because any
>>> > time you write a blog post about scalability patterns, it invites
>>> > people to prematurely implement them (Brett Slatkin's video
>>> > generated an endless number of questions from people doing sub-1 QPS
>>> > workloads). We've done launches on the YouTube/Google homepage
>>> > (http://blog.golang.org/2011/12/from-zero-to-go-launching-on-google.html)
>>> > that haven't required us to make these changes because they did fine
>>> > under load testing. I'd invest more energy in figuring out the right
>>> > way to load test than trying to figure out the bottlenecks when you
>>> > hit limits with real data.
>>> >
>>> > --
>>> > Ikai Lan
>>> > Developer Programs Engineer, Google App Engine
>>> > plus.ikailan.com
>>> >
>>> >
>>> >
>>> > On Wed, Feb 1, 2012 at 9:19 PM, Robert Kluin <[email protected]>
>>> wrote:
>>> >>
>>> >> So I'd say don't worry about it unless you actually hit this problem.
>>> >> If you do know you'll hit it, see if you have a way to "shard" the
>>> >> timestamp, by account, user, or region, etc..., to relieve some of the
>>> >> pressure.  If you must have a global timestamp, I'd say keep it as
>>> >> simple as possible, until you hit the issue.  At that point you can
>>> >> figure out a fix.
>>> >>
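Robert's "shard the timestamp" suggestion can be sketched roughly as
follows. This is plain Python, not a GAE API; the shard count and the
prefixed value format are made-up illustrations you would adapt to your
own model:

```python
import hashlib

NUM_SHARDS = 20  # assumed shard count; tune to your actual write rate

def shard_for(account_id, num_shards=NUM_SHARDS):
    # Deterministically map an account to one of num_shards buckets, so
    # hot "now" writes from different accounts land in different index
    # row ranges instead of all clustering on a single tablet.
    digest = hashlib.md5(account_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def sharded_timestamp_value(account_id, iso_timestamp):
    # Prefix the indexed value with its shard number; reads then fan out
    # over the NUM_SHARDS prefixes and merge results client-side.
    return "%02d|%s" % (shard_for(account_id), iso_timestamp)
```

The price of this pattern is on the read side: one range scan per shard
prefix, merged in the client.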
>>> >> When I have timestamps on high write-rate entities that are
>>> >> non-critical, for example "expiration" times that are used only for
>>> >> cleanup, I'll sometimes add a random jitter of several hours to spread
>>> >> the writes out a bit.  I'd be surprised if changing it by a few
>>> >> seconds helped much -- but it could.  Keep in mind, there will already
>>> >> be some degree of randomness since the instance clocks have some
>>> >> slight variation.  If you're hitting this issue, I'd give it a shot
>>> >> though.  If it works it could at least buy you some time to get a
>>> >> better fix.
>>> >>
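The jitter trick for non-critical cleanup timestamps, as a minimal
sketch (the six-hour window is an assumption; pick one that fits your
cleanup schedule):

```python
import random
from datetime import datetime, timedelta

def jittered_expiration(base, max_jitter_hours=6):
    # Spread non-critical "expiration" timestamps over a several-hour
    # window so the index writes don't all pile onto the tablet holding
    # the current time's row range.
    offset = timedelta(seconds=random.uniform(0, max_jitter_hours * 3600))
    return base + offset
```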
>>> >> I don't think there is a fixed number of rows per shard.  I think it
>>> >> is split up by data size, and I don't think the exact number is
>>> >> publicly documented.  Maybe you can roughly figure it out via
>>> >> experimentation.
>>> >>
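For Wolfram's entries-per-tablet question, a back-of-the-envelope
estimate using the 100-200 MB default tablet size quoted above. The
bytes-per-index-row figure is a pure guess, not documented behavior, and
would need to be calibrated experimentally:

```python
# Assumed midpoint of the Bigtable paper's 100-200 MB default.
TABLET_SIZE_BYTES = 150 * 1024 * 1024
# Guessed average size of one index row (key plus indexed timestamp).
EST_INDEX_ROW_BYTES = 100

entries_per_tablet = TABLET_SIZE_BYTES // EST_INDEX_ROW_BYTES
print(entries_per_tablet)  # roughly on the order of a million entries
```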
>>> >>
>>> >> Robert
>>> >>
>>> >>
>>> >> On Wed, Feb 1, 2012 at 02:28, WGuerlich <[email protected]> wrote:
>>> >> > I know I'm going to hit the write limit with a timestamp I need
>>> >> > to update on every write and which needs to be indexed.
>>> >> >
>>> >> > As an alternative to sharding: what do you think about adding
>>> >> > time jitter to the timestamp, that is, changing the time randomly
>>> >> > by a couple of seconds? In my application, the timestamp being
>>> >> > off by a couple of seconds wouldn't pose a problem.
>>> >> >
>>> >> > Now what I need to know is: how many index entries can I expect
>>> >> > to go into one tablet? This is needed to estimate the amount of
>>> >> > jitter necessary to avoid hitting the same tablet on every write.
>>> >> >
>>> >> > Any insights on this?
>>> >> >
>>> >> > Wolfram
>>> >> >
>>> >> > --
>>> >> > You received this message because you are subscribed to the Google
>>> >> > Groups
>>> >> > "Google App Engine" group.
>>> >> > To view this discussion on the web visit
>>> >> > https://groups.google.com/d/msg/google-appengine/-/r0SVTq6i4iEJ.
>>> >> >
>>> >> > To post to this group, send email to
>>> [email protected].
>>> >> > To unsubscribe from this group, send email to
>>> >> > [email protected].
>>> >> > For more options, visit this group at
>>> >> > http://groups.google.com/group/google-appengine?hl=en.
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>
>

