If you look at the core of Barrelfish, you'll see that this is essentially what they are doing: using an extremely small microkernel (like L4) that's very efficient at various forms of message passing. That microkernel is the only thing that is duplicated on the various cores. The services themselves can be distributed and/or replicated as appropriate (although their approach favors replication); it all depends on the characteristics of the workload.
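To make the "replicate the state, talk only by messages" idea concrete, here is a toy Python sketch. It is not Barrelfish code and does not reflect its actual interfaces; the names `Core`, `send`, `drain`, and `replicated_update` are made up for illustration, and the "cores" are plain objects rather than real CPUs. Each simulated core keeps a private replica of some service state and changes it only in response to messages in its own inbox.

```python
from collections import deque


class Core:
    """One simulated core: a private inbox plus a private replica of service state."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = deque()   # messages delivered to this core
        self.replica = {}      # this core's own copy of the service state

    def send(self, other, msg):
        # The only way to affect another core is to enqueue a message for it;
        # no core ever touches another core's replica directly.
        other.inbox.append(msg)

    def drain(self):
        # Apply every pending update to the local replica, in arrival order.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.replica[key] = value


def replicated_update(origin, cores, key, value):
    """Hypothetical helper: broadcast one update from `origin` to every replica."""
    for core in cores:
        origin.send(core, (key, value))   # origin messages itself too
    for core in cores:
        core.drain()


# Usage: four simulated cores, one update issued from core 0.
cores = [Core(i) for i in range(4)]
replicated_update(cores[0], cores, "mapping:0x1000", "frame 42")
```

After the call, every core holds the same entry in its own `replica`. A real system would also have to deal with ordering and agreement between concurrent updates, which this sketch ignores.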
From the paper, it sounds like the kernel (L4-like, supposedly tuned to the specific hardware) and the "monitor" (userland, portable) are shared. Btw, they have the source code up for free (http://www.barrelfish.org/release_20090914.html), which I suppose could be used to answer these questions more definitively with some effort...
-eric
Tim Newsham http://www.thenewsh.com/~newsham/