(Sorry, I need to disable Ctrl-Return, which quite often sends mails
earlier than I really want. Continuing my mail.)
On 08/27/2010 10:46 PM, Robert Haas wrote:
Yeah, probably. I think designing something that works efficiently
over a network is a somewhat different problem than designing
something that works on an individual node, and we probably shouldn't
let the designs influence each other too much.
Agreed. Thus I've left out any kind of congestion avoidance stuff from
imessages so far.
There's no padding or sophisticated allocation needed. You
just need a pointer to the last byte read (P1), the last byte allowed
to be read (P2), and the last byte allocated (P3). Writers take a
spinlock, advance P3, release the spinlock, write the message, take
the spinlock, advance P2, release the spinlock, and signal the reader.
That would block parallel writers (i.e. only one process can write to the
queue at any time).
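For reference, the three-pointer scheme quoted above could be sketched roughly as follows. This is only an illustration, not code from imessages or Postgres: ByteQueue, queue_write, queue_read and QUEUE_SIZE are hypothetical names, a C11 atomic_flag stands in for a real spinlock, and signalling the reader is left out. It also assumes only one writer is active at a time, which is exactly the limitation mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SIZE 1024         /* hypothetical buffer size in bytes */

typedef struct {
    atomic_flag lock;           /* stand-in for a real spinlock */
    size_t p1;                  /* P1: end of data already read */
    size_t p2;                  /* P2: end of data allowed to be read */
    size_t p3;                  /* P3: end of allocated space */
    char   buf[QUEUE_SIZE];
} ByteQueue;

static void queue_init(ByteQueue *q)
{
    memset(q, 0, sizeof *q);
    atomic_flag_clear(&q->lock);
}

static void spin_lock(ByteQueue *q)
{
    while (atomic_flag_test_and_set(&q->lock))
        ;                       /* busy-wait until the flag is free */
}

static void spin_unlock(ByteQueue *q)
{
    atomic_flag_clear(&q->lock);
}

/* Offsets grow monotonically; the buffer index is offset % QUEUE_SIZE. */
static int queue_write(ByteQueue *q, const void *msg, size_t len)
{
    const char *src = msg;
    size_t start, i;

    spin_lock(q);
    if (len > QUEUE_SIZE - (q->p3 - q->p1)) {
        spin_unlock(q);
        return 0;               /* not enough free space */
    }
    start = q->p3;
    q->p3 += len;               /* advance P3: space is now allocated */
    spin_unlock(q);

    for (i = 0; i < len; i++)   /* write the message outside the lock */
        q->buf[(start + i) % QUEUE_SIZE] = src[i];

    spin_lock(q);
    /* Only safe with one writer at a time: with parallel writers, an
     * earlier allocation might still be unfinished at this point. */
    q->p2 = start + len;        /* advance P2: message is now readable */
    spin_unlock(q);
    return 1;                   /* a real queue would signal the reader */
}

static size_t queue_read(ByteQueue *q, void *out, size_t max)
{
    char *dst = out;
    size_t from, to, n, i;

    spin_lock(q);
    from = q->p1;
    to = q->p2;
    spin_unlock(q);

    n = to - from;
    if (n > max)
        n = max;
    for (i = 0; i < n; i++)     /* read the data outside the lock */
        dst[i] = q->buf[(from + i) % QUEUE_SIZE];

    spin_lock(q);
    q->p1 = from + n;           /* advance P1: space can be reused */
    spin_unlock(q);
    return n;
}
```

The comment before the P2 update marks the spot where the parallel-writer problem shows up: a writer that finishes second cannot simply set P2 to the end of its own message if an earlier allocation is still being filled.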
I feel like there's probably some variant of this idea that works
around that problem. The problem is that when a worker finishes
writing a message, he needs to know whether to advance P2 only over
his own message or also over some subsequent message that has been
fully written in the meantime. I don't know exactly how to solve that
problem off the top of my head, but it seems like it might be
possible.
I've tried pretty much that before. And failed. Because the
allocation-order (i.e. the time the message gets created in preparation
for writing to it) isn't necessarily the same as the sending-order (i.e.
when the process has finished writing and decides to send the message).
To satisfy the FIFO property WRT the sending order, you need to decouple
allocation from the ordering (i.e. queuing logic).
(And yes, it took me a while to figure out what was wrong in
Postgres-R before I even noticed that design bug.)
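To illustrate what decoupling allocation from ordering means, here is a minimal sketch. It is not the imessages API: Message, MessageQueue, msg_create, msg_send and msg_receive are hypothetical names, malloc stands in for a shared-memory allocator, and the locking a real queue needs around send and receive is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Message {
    struct Message *next;       /* link used only while queued */
    size_t size;                /* payload size, fixed at creation */
    char   data[];              /* payload follows the header */
} Message;

typedef struct {
    Message *head;              /* next message to be received */
    Message *tail;              /* most recently sent message */
} MessageQueue;

/* Allocation: reserves memory, but does not touch any queue. */
static Message *msg_create(size_t size)
{
    Message *m = malloc(sizeof(Message) + size);

    if (m != NULL) {
        m->next = NULL;
        m->size = size;
    }
    return m;
}

/* Sending: the queue position is decided here, not at allocation time. */
static void msg_send(MessageQueue *q, Message *m)
{
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Receiving: unlink the oldest sent message, or return NULL if empty. */
static Message *msg_receive(MessageQueue *q)
{
    Message *m = q->head;

    if (m != NULL) {
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
        m->next = NULL;
    }
    return m;
}
```

If message A is created before B, but B is sent first, the receiver gets B and then A. The order follows msg_send, independent of when msg_create was called.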
Readers take the spinlock, read P1 and P2, release the spinlock, read
the data, take the spinlock, advance P1, and release the spinlock.
It would require copying data in case a process only needs to forward the
message. That's a quick pointer dequeue and enqueue exercise ATM.
If we need to do that, that's a compelling argument for having a
single messaging area rather than one per backend.
Absolutely, yes.
But I'm not sure I
see why we would need that sort of capability. Why wouldn't you just
arrange for the sender to deliver the message directly to the final
recipient?
A process can read and even change the data of the message before
forwarding it. Something the coordinator in Postgres-R does sometimes.
(As it is the interface to the GCS and thus to the rest of the nodes in
the cluster).
For parallel querying (on a single node) that's probably less important
a feature.
So, they know in advance how large the message will be but not what
the contents will be? What are they doing?
Filling the message until it's (mostly) full and then continuing with the
next one. At least that's how the streaming approach on top of imessages
works.
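A rough sketch of that fill-and-continue pattern, under simplifying assumptions: Chunk, Stream, stream_write and stream_flush are hypothetical names, the sent array stands in for the real message queue, and a chunk is sent only when completely full (or on an explicit flush), which is stricter than "mostly full".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8            /* hypothetical fixed message size */
#define MAX_CHUNKS 16           /* capacity of the stand-in queue */

typedef struct {
    size_t used;                /* bytes filled so far */
    char   data[CHUNK_SIZE];
} Chunk;

typedef struct {
    Chunk  sent[MAX_CHUNKS];    /* stands in for the message queue */
    size_t nsent;
    Chunk  current;             /* message currently being filled */
} Stream;

static void stream_init(Stream *s)
{
    memset(s, 0, sizeof *s);
}

/* Send the current message (if it holds any data) and start a new one. */
static void stream_flush(Stream *s)
{
    if (s->current.used == 0)
        return;
    assert(s->nsent < MAX_CHUNKS);
    s->sent[s->nsent++] = s->current;
    s->current.used = 0;
}

/* Append data, sending each message as soon as it is full. */
static void stream_write(Stream *s, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = CHUNK_SIZE - s->current.used;
        size_t n = len < room ? len : room;

        memcpy(s->current.data + s->current.used, data, n);
        s->current.used += n;
        data += n;
        len -= n;
        if (s->current.used == CHUNK_SIZE)
            stream_flush(s);
    }
}
```

The message size is known up front (CHUNK_SIZE), while the contents only become known as the stream is written.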
But yes, it's somewhat annoying to have to know the message size in
advance. I didn't implement realloc so far. Nor can I think of any other
solution. Note that separation of allocation and queue ordering is
required anyway for the above reasons.
Well, the fact that something is commonly used doesn't mean it's right
for us. Tabula rasa, we might design the whole system differently,
but changing it now is not to be undertaken lightly. Hopefully the
above comments shed some light on my concerns. In short, (1) I don't
want to preallocate a big chunk of memory we might not use,
Isn't that exactly what we do now for lots of sub-systems, and what
I'd like to improve (i.e. reduce to a single big chunk)?
(2) I fear
reducing the overall robustness of the system, and
Well, that applies to pretty much every new feature you add.
(3) I'm uncertain
what other systems would be able to leverage a dynamic allocator of the
sort you propose.
Okay, that's up to me to show evidence (or at least a PoC).
Regards
Markus Wanner
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)