Joseph Witt created NIFI-3156:
---------------------------------
Summary: PublishKafka performance without demarcator should be
comparable to with
Key: NIFI-3156
URL: https://issues.apache.org/jira/browse/NIFI-3156
Project: Apache NiFi
Issue Type: Improvement
Reporter: Joseph Witt
The PublishKafka processor supports specification of a demarcator property,
which allows it to scan the incoming input stream and demarcate the messages
it writes to Kafka. When the demarcator is used, performance is quite good,
and that makes sense: all items in the bundle are sent as a single
interaction with Kafka and the appropriate ack is received.
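The demarcator behavior described above can be sketched roughly as follows.
This is an illustrative simplification, not the actual PublishKafka code: a
single-byte demarcator splits the incoming content into segments, each of
which becomes one Kafka message, and the whole set goes out in one producer
interaction.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class DemarcatorSplit {
    // Split the incoming bytes on the demarcator; each segment is one
    // Kafka message. All segments would then be published together as
    // a single interaction with the broker.
    static List<byte[]> split(byte[] input, byte demarcator) {
        List<byte[]> messages = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte b : input) {
            if (b == demarcator) {
                messages.add(current.toByteArray());
                current.reset();
            } else {
                current.write(b);
            }
        }
        if (current.size() > 0) {
            messages.add(current.toByteArray());
        }
        return messages;
    }

    public static void main(String[] args) {
        // A newline-demarcated stream of three events yields three messages.
        List<byte[]> msgs = split("a\nb\nc".getBytes(), (byte) '\n');
        System.out.println(msgs.size()); // 3
    }
}
```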
However, when the same processor is used without the demarcator, performance
is slower. That also makes sense: the bundle is still sent as a single
interaction with Kafka, but in that case the bundle is a single event.
To work around this today, one can simply place MergeContent before
PublishKafka to bundle some precise amount of data together. With
MergeContent, users can specify the maximum number of items to combine and
the maximum amount of time to wait before doing so.
We should consider adding support for specifying the maximum number of
objects to send together in a single interaction with Kafka, thus avoiding
the need for demarcation or a MergeContent step preceding this processor.
We need both options because in some cases we could truly receive bundles of
events from an external system and would not want to waste time/resources
splitting the data when we could just split it logically while sending to
Kafka. The new property would let the user choose how many objects, at most,
to send at once.
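The proposed behavior might look roughly like the sketch below. The names
here (MAX_BATCH_SIZE, toBatches) are purely illustrative and are not NiFi
API: queued items are grouped into batches of at most a configurable size,
and each batch would then be one interaction with Kafka.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedPublish {
    // Hypothetical "maximum number of objects per interaction" property.
    static final int MAX_BATCH_SIZE = 500;

    // Group queued items into batches of at most MAX_BATCH_SIZE; each
    // batch would be published in a single interaction with Kafka.
    static <T> List<List<T>> toBatches(List<T> queued) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < queued.size(); i += MAX_BATCH_SIZE) {
            int end = Math.min(i + MAX_BATCH_SIZE, queued.size());
            batches.add(new ArrayList<>(queued.subList(i, end)));
        }
        return batches;
    }
}
```

With this in place, neither a demarcator scan nor a preceding MergeContent
would be needed just to get reasonable per-interaction batch sizes.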
The tradeoff is that the more items you have in a single bundle, the more
likely it is that duplicates will occur on failure. The interface favors
ensuring zero loss but is susceptible to duplication in the presence of
failure: "at-least once" delivery.
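A minimal sketch of why at-least-once delivery duplicates under failure (a
toy simulation, not Kafka code): if the broker stores a batch but the ack is
lost, the sender must retry the whole batch, and the larger the batch, the
more events get duplicated.

```java
import java.util.ArrayList;
import java.util.List;

public class AtLeastOnce {
    static List<String> broker = new ArrayList<>();

    // The broker durably stores the message, but the ack back to the
    // sender may be lost; returns whether the ack arrived.
    static boolean send(String msg, boolean ackLost) {
        broker.add(msg);
        return !ackLost;
    }

    public static void main(String[] args) {
        if (!send("event-1", true)) {   // first attempt: ack lost
            send("event-1", false);     // sender retries -> duplicate
        }
        System.out.println(broker.size()); // 2 copies of event-1
    }
}
```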
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)