> > if one node is just slow enough in responding that it
> > falls outside the timeout, you could get an annoying situation
> > where that node is out-of-step forever after.
>
i fought some socket mgmt software for a few years that did timeouts and rollup like this. it seemed to me that between timeouts and retransmission one could not dig oneself out of the hole without a proper protocol. and doing it on top of tcp was impossible.

> worse yet, nodes may be sending more than one line at a time,
> circumventing the aggregator. if they do it fast enough it becomes a
> real mess and there's no amount of lookback one can do to ensure this
> isn't happening :)

but even if it logs the message once, your disk full message will appear in n*2 seconds. do we need to take up a collection to get you some disks?

- erik