On Mon, Nov 5, 2012 at 3:16 PM, Stefan Sperling <s...@elego.de> wrote:
> On Mon, Nov 05, 2012 at 02:54:07PM +0100, Stefan Fuhrmann wrote:
> > On Sun, Nov 4, 2012 at 10:40 AM, Stefan Sperling <s...@elego.de> wrote:
> > > I just came across something that reminded me of this thread.
> > > It seems PostgreSQL is doing something quite similar to what we
> > > want to do here:
> > >
> > >   When the first PostgreSQL process attaches to the shared memory
> > >   segment, it checks how many processes are attached. If the result
> > >   is anything other than "one", it knows that there's another copy
> > >   of PostgreSQL running which is pointed at the same data directory,
> > >   and it bails out.
> > >   http://rhaas.blogspot.nl/2012/06/absurd-shared-memory-limits.html
> > >
> >
> > IIUIC, the problems they are trying to solve are:
> >
> > * have only one process open / manage a given data base
> > * have SHM of arbitrary size
> >
> > Currently, we use named SHM to make the value of
> > two 64 bit numbers per repo visible to all processes.
> > Also, we don't have a master process that would
> > channel access to a given repository.
> >
> > The "corruption" issue is only about how to behave
> > if someone wrote random data to one of our repo
> > files. That's being addressed now (don't crash, have
> > a predictable behavior in most cases).
> >
> > > If this works for postgres I wonder why it wouldn't work for us.
> > > Is this something we cannot do because APR doesn't provide the
> > > necessary abstractions?
> > >
> >
> > The postgres code / approach may be helpful when
> > we try to move the whole membuffer cache into a
> > SHM segment.
>
> Ah, I see.
>
> Next question: Why don't we use a single SHM segment for the revprop cache?
>
> Revprop values are usually small so mapping a small amount of memory
> would suffice. And using a single SHM segment would make updated values
> immediately visible in all processes, wouldn't it? And we wouldn't need the
> generation number dance to make sure all processes see up-to-date values.
> Whichever process updates a revprop value would update the corresponding
> section of shared memory.
>

First of all, I want to point out that we now have a working
implementation for 1.8, and what we are discussing here is
probably targeted at future releases.

If we want revprop-only caches (to keep things simple), we
still need to handle the following basic trade-off:
lifetime (effectiveness) vs. size.

To be effective with e.g. serf, the cache content should
survive single requests, i.e. live longer than an fs_t. We also
need several MB (~200B/rev) per repo for decent hit rates.
OTOH, there may be hundreds of repositories on a server,
and it is very hard to resize the revprop cache when the
number of revs in a repo grows.

It is thus not quite feasible to keep reasonably sized
per-repository caches around indefinitely, even if they only
contain revprops. That means that we should have one shared
cache (or some small number of them) for all repositories and
let e.g. some external process manage its lifetime etc.

But that is technically no different from having our membuffer
cache use shared memory instead of being process-local,
which is a good thing. The downside is that we need to
address the following 3 issues when moving membuffer to SHM.
From the easiest to the hardest:

* Make generations an integral feature of the cache
  (e.g. by tagging index entries and bumping the values
  upon "replace"). This is necessary to get rid of the
  revprop generations.

  Race between revprop reader 1 and writer 2:

  1: lookup revprop in cache -> miss
  1: read revprop from disk -> "old content"
  2: store new revprop on disk and in cache
  1: store "old content" in cache

  Note that after the 3rd step, the new content may or
  may not be cached, i.e. we can't check for it in step 4.

* Have some SHM not bound to a repository or a
  parent / child (fork) process relationship.
  Make it work on most platforms.

* Portable, robust (lock owners may die), very low overhead
  (~1 microsecond) many-readers-one-writer locks on the cache
  content. I have some ideas on how to do that, but this will
  be very hard to do correctly.

I'd like to see all that solved and SHM being used for
membuffer, which has been designed with that goal in mind.
It's the robustness part that makes it so much harder to do
than I thought back then.

-- Stefan^2.
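
To illustrate the first item in the list above (generations as an
integral cache feature), here is a minimal Python sketch of one way
the reader/writer race could be closed. It is not the membuffer
implementation: the class GenerationCache, its method names, the
bucket count and the single in-process lock are all hypothetical
stand-ins. It also assumes that the generation counter lives in the
index bucket rather than in the entry, so that it survives eviction
of the data.

```python
import threading


class GenerationCache:
    """Toy in-process cache; each index bucket carries a generation counter.

    Hypothetical illustration only, not the Subversion membuffer API.
    """

    def __init__(self, bucket_count=64):
        self._lock = threading.Lock()
        self._generations = [0] * bucket_count  # kept even when data is evicted
        self._entries = {}                      # key -> cached value

    def _bucket(self, key):
        return hash(key) % len(self._generations)

    def lookup(self, key):
        """Return (value or None, generation token of the key's bucket)."""
        with self._lock:
            return self._entries.get(key), self._generations[self._bucket(key)]

    def store_if_unchanged(self, key, value, token):
        """Reader path: cache data read from disk unless a writer intervened."""
        with self._lock:
            if self._generations[self._bucket(key)] != token:
                return False  # a "replace" happened since lookup(); drop stale data
            self._entries[key] = value
            return True

    def replace(self, key, value):
        """Writer path: bump the bucket generation, then publish the new value."""
        with self._lock:
            self._generations[self._bucket(key)] += 1
            self._entries[key] = value

    def evict(self, key):
        """Drop the cached data but keep the bucket generation."""
        with self._lock:
            self._entries.pop(key, None)


def replay_race():
    """Replay the four steps of the race, plus an eviction after step 3."""
    cache = GenerationCache()
    key = ("repo", 42, "svn:log")
    value, token = cache.lookup(key)      # 1: lookup revprop in cache -> miss
    old = "old content"                   # 1: read revprop from disk
    cache.replace(key, "new content")     # 2: store new revprop in cache
    cache.evict(key)                      # the new content may get evicted
    stored = cache.store_if_unchanged(key, old, token)  # 1: try to store old
    return value, stored, cache.lookup(key)[0]
```

With this scheme the reader's late store is rejected because the
bucket generation changed in step 3, whether or not the new content
is still cached at that point. A real SHM variant would need the
counters inside the segment and a cross-process lock, which is the
third item in the list.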