Joseph Witt created NIFI-3156:
---------------------------------
Summary: PublishKafka performance without demarcator should be
comparable to with
Key: NIFI-3156
URL: https://issues.apache.org/jira/browse/NIFI-3156
Project: Apache NiFi
Issue Type: Improvement
Reporter: Joseph Witt
The PublishKafka processor supports specification of a demarcator property,
which allows it to scan the incoming input stream and demarcate the messages
it writes to Kafka. When the demarcator is used, performance is quite good,
and that makes sense: all items in the bundle are sent as a single
interaction with Kafka and the appropriate ack is received.
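The demarcator behavior described above can be sketched roughly as follows.
This is an illustrative simplification, not the actual PublishKafka code: a
single-byte demarcator splits the incoming content into segments, each of
which becomes one Kafka message, and the whole set goes out in one producer
interaction.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class DemarcatorSplit {
    // Split the incoming bytes on the demarcator; each segment is one
    // Kafka message. All segments would then be published together as
    // a single interaction with the broker.
    static List<byte[]> split(byte[] input, byte demarcator) {
        List<byte[]> messages = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte b : input) {
            if (b == demarcator) {
                messages.add(current.toByteArray());
                current.reset();
            } else {
                current.write(b);
            }
        }
        if (current.size() > 0) {
            messages.add(current.toByteArray());
        }
        return messages;
    }

    public static void main(String[] args) {
        // A newline-demarcated stream of three events yields three messages.
        List<byte[]> msgs = split("a\nb\nc".getBytes(), (byte) '\n');
        System.out.println(msgs.size()); // 3
    }
}
```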
However, when the same processor is used without the demarcator, performance
is slower. That also makes sense: the bundle is still sent as a single
interaction with Kafka, but in that case the bundle is a single event.
To work around this today, one can simply place MergeContent before
PublishKafka to bundle some precise amount of data together. With
MergeContent, users can specify the maximum number of items to combine and
the maximum amount of time to wait before doing so.
We should consider adding support for specifying the maximum number of
objects to send together in a single interaction with Kafka, thus avoiding
the need for demarcation or a MergeContent step preceding this processor.
We need both options because in some cases we could truly receive bundles of
events from an external system and would not want to waste time/resources
splitting the data when we could just split it logically while sending to
Kafka. The new property would let the user choose how many objects, at most,
to send at once.
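The proposed behavior might look roughly like the sketch below. The names
here (MAX_BATCH_SIZE, toBatches) are purely illustrative and are not NiFi
API: queued items are grouped into batches of at most a configurable size,
and each batch would then be one interaction with Kafka.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedPublish {
    // Hypothetical "maximum number of objects per interaction" property.
    static final int MAX_BATCH_SIZE = 500;

    // Group queued items into batches of at most MAX_BATCH_SIZE; each
    // batch would be published in a single interaction with Kafka.
    static <T> List<List<T>> toBatches(List<T> queued) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < queued.size(); i += MAX_BATCH_SIZE) {
            int end = Math.min(i + MAX_BATCH_SIZE, queued.size());
            batches.add(new ArrayList<>(queued.subList(i, end)));
        }
        return batches;
    }
}
```

With this in place, neither a demarcator scan nor a preceding MergeContent
would be needed just to get reasonable per-interaction batch sizes.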
The tradeoff is that the more items you have in a single bundle, the more
likely it is that duplicates will occur on failure. The interface favors
ensuring zero loss but is susceptible to duplication in the presence of
failure: "at-least once" delivery.
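A minimal sketch of why at-least-once delivery duplicates under failure (a
toy simulation, not Kafka code): if the broker stores a batch but the ack is
lost, the sender must retry the whole batch, and the larger the batch, the
more events get duplicated.

```java
import java.util.ArrayList;
import java.util.List;

public class AtLeastOnce {
    static List<String> broker = new ArrayList<>();

    // The broker durably stores the message, but the ack back to the
    // sender may be lost; returns whether the ack arrived.
    static boolean send(String msg, boolean ackLost) {
        broker.add(msg);
        return !ackLost;
    }

    public static void main(String[] args) {
        if (!send("event-1", true)) {   // first attempt: ack lost
            send("event-1", false);     // sender retries -> duplicate
        }
        System.out.println(broker.size()); // 2 copies of event-1
    }
}
```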
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)