Hi -

My work on a multi-client libpager has progressed to the point where I have
pseudo code for the logic and am considering how to implement the pagemap.

I'm trying to achieve three goals with the pagemap:

1. keep pagemap entries as small as possible
2. support arbitrary numbers of clients, and
3. support a single client with minimal overhead

(1) seems important since ext2fs disk pagers cover the entire partition and
can grow quite large.  (2) is pretty much the whole point of my project;
though I have considering relaxing that requirement to set an upper limit
on the number of clients.  (3) seems important since it's the most common
case, but maybe the speed of libpager isn't much of an issue.

I don't see any reasonable way of achieving (2) without making the pagemap
entry at least as big as a pointer (4 bytes).  Currently it's a short (2
bytes), so we'll have to double the size of our pagemaps.

I'm considering several schemes to achieve (3).  Here's one, that handles
up to four clients fairly efficiently.

First, steal a bit from the pointer by aligning the structure on a word
boundary.  Then, if the least significant bit is 1, interpret the pagemap
entry as a bitfield rather than as a pointer to a more complicated
structure.

A 32 bit field (err... make that a 31 bit field) can handle up to four
clients.  The new pagemap entries have an ACCESSLIST, which is an unsorted
set of clients (those who currently have access), and a WAITLIST, which is
a sorted list of clients that have requested access, and a few boolean
flags.

We number our clients 0 through 3, and I'm thinking above moving their
ports into high port number space to easily identify them.  So, the first
client on the first memory object would be on port 0x8000000, the second
client would be on 0x80000001, etc.  The second memory object would have
its ports on 0x80000004 through 0x80000007.  So we can just strip off the
lowest 2 bits to figure out our "client number".  I don't think protected
payloads would help, since we're not receiving messages on these ports;
they're send ports used to communicate with the kernel(s).

ACCESSLIST entries requires two bits - one to indicate if access is
granted, and one to indicate if it's read-only or read-write access.  So,
we use 2*4 = 8 bits for the ACCESSLIST.  The WAITLIST is sorted (client
requests are processed FIFO) and requires four bits per slot - one to
indicate that the slot is in use, two to identify the client, one to
indicate if the request is for read or write access.  We need four slots
for four clients, so 4*4 = 16 bits for the WAITLIST.  Then we need a few
more flag bits.  It all fits.

Once we hit our fifth client, or some oddball condition (like an error
return other than the expected EIO, ENOSPC, EDQUOT), we shift to a pointer
that points to a more complicated, dynamically allocated structure.

The dynamically allocated structures are themselves stored in an rb-tree,
so that if multiple pages are in the same condition (same set of clients
with access, for example), they all point to the same structure.  Every
time we want to change a pagemap structure, we construct the new pagemap
structure in a temporary object, then search the rb-tree for a matching
entry, and either use a pointer to the existing entry (if found), or insert
the temporary object, use a pointer to it, and construct a new temporary
the next time we need to do this.

That can be a bit slow, so I'm thinking about including some extra pointers
in that dynamic structure to cache the most common cases (like moving the
first client on WAITLIST to ACCESSLIST).  Of course, these structures would
only be used if we've got more than four clients, so we're already into a
corner case anyway.  Right now, I'm leaning towards not including any extra
pointers right now.  I have no experience with large numbers of clients, so
I'm not really sure what should be optimized, anyway.

It's a pretty complex scheme, and thus prone to bugs, which is why I'm
posting to the list for comments.  Does anybody think it would be better to
ditch the bit field trick completely and handle even single clients using
pointers to malloc'ed structures?  The code would be simpler but slower.

    agape
    brent

Reply via email to