Hi,
I am designing a patch on top of the 0.8 code base.
The patch would provide persistence on the producer side, meaning that
messages passed to the producer are persisted rather than kept
transiently in memory. That way, if the broker(s) cannot be reached,
messages can accumulate and will be sent through to the broker(s) once
they are available again. Although this would be somewhat superfluous
under the new replication paradigm of 0.8, it's still possible to have
failures that disconnect a producer from the entire set of brokers; in
that case, this patch-under-design would prevent data loss, making the
pipeline even more secure and relieving producers of the need to handle
persistence on their own. The plan is to use the Kafka Log component
for that, with the behavior being completely optional through a
configuration option.
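For example, the opt-in might look something like this in the producer
properties (the property names below are only placeholders I made up,
not settings that exist in 0.8):

  # hypothetical properties, names illustrative only
  producer.persistence.enable=true
  producer.persistence.dir=/var/kafka/producer-buffer
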
At a slightly deeper level, the design is to use one Kafka Log per
topic & partition (otherwise, given the existing 0.8 code in and around
/producer.async.DefaultEventHandler.dispatchSerializedData/, it would
seem resource intensive to keep track of sent vs. failed messages in
order to manage resending). When the Log is used this way, the behavior
that keeps replica sets in sync would either be skipped through the
choice of parameters, or would be made parameterizable so it can be
fully neutralized with respect to the producer's own logging.
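To make the shape of this a bit more concrete, here is a very rough
sketch of the structure I have in mind, using a plain file in place of
kafka.log.Log so that the example stays self-contained; all the names
(PartitionBuffer, ProducerPersistence, etc.) are only illustrative and
not taken from the 0.8 tree:

import java.io.{File, RandomAccessFile}
import scala.collection.mutable

// One on-disk buffer file per (topic, partition).  Messages are
// appended as [length][payload] records; an "acked" offset marks how
// far the brokers have confirmed, so a resend is just "replay every
// record past the acked offset".  The real patch would use
// kafka.log.Log instead of the raw RandomAccessFile shown here.
class PartitionBuffer(file: File) {
  private val raf = new RandomAccessFile(file, "rw")
  private var ackedOffset: Long = 0L   // confirmed by the brokers
  raf.seek(raf.length())               // always append at the end

  def append(payload: Array[Byte]): Unit = synchronized {
    raf.writeInt(payload.length)
    raf.write(payload)
  }

  // Read back all records not yet confirmed, for resending after an
  // outage.
  def unacked(): Seq[Array[Byte]] = synchronized {
    val out = mutable.ArrayBuffer[Array[Byte]]()
    raf.seek(ackedOffset)
    while (raf.getFilePointer < raf.length()) {
      val buf = new Array[Byte](raf.readInt())
      raf.readFully(buf)
      out += buf
    }
    raf.seek(raf.length())
    out.toSeq
  }

  // Called once the brokers have confirmed everything appended so far.
  def markAcked(): Unit = synchronized { ackedOffset = raf.length() }
}

// One buffer per topic & partition, mirroring how
// dispatchSerializedData already groups outgoing data.
class ProducerPersistence(baseDir: File) {
  baseDir.mkdirs()
  private val buffers = mutable.Map[(String, Int), PartitionBuffer]()

  def bufferFor(topic: String, partition: Int): PartitionBuffer =
    synchronized {
      buffers.getOrElseUpdate((topic, partition),
        new PartitionBuffer(new File(baseDir, topic + "-" + partition)))
    }
}

The point of keeping one buffer per topic & partition is that the
failed set reported by dispatchSerializedData maps directly onto "the
tail of that partition's buffer", so no per-message bookkeeping is
needed.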
Now of course, given that 0.8 seems to be far along its runway, I
assume this should go on top of trunk, which I'd like to confirm with
you is where post-0.8 development lives.
I'd appreciate your comments...
Thanks,
Matan