Jasper Knulst created NIFI-4675:
-----------------------------------

             Summary: PublishKafka_0_10 can't use demarcator and kafka key at 
the same time
                 Key: NIFI-4675
                 URL: https://issues.apache.org/jira/browse/NIFI-4675
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.2.0
            Reporter: Jasper Knulst
             Fix For: 1.5.0


At the moment you can't split up a flowfile using a demarcator AND set the 
Kafka key (kafka.key) attribute for all resulting Kafka records at the same 
time. The code explicitly prevents this.

Still, it would be a valuable performance boost to be able to use both at the 
same time whenever one flowfile contains many individual Kafka records. 
Flowfiles would then not have to be pre-split (which causes an explosion of 
NiFi overhead) just to set the key.

Note:
Using a demarcator and a Kafka key at the same time will normally give every 
Kafka record produced from one incoming flowfile the same Kafka key (see 
REMARK).
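
To illustrate the note above, here is a minimal sketch of the requested 
behavior. It is not the PublishKafka_0_10 implementation and uses neither the 
NiFi nor the Kafka client API; the function name split_with_key and the sample 
key are hypothetical, and skipping empty pieces is an assumption made for the 
example.

```python
from typing import List, Optional, Tuple

Record = Tuple[Optional[bytes], bytes]  # (kafka key, record value)


def split_with_key(content: bytes, demarcator: bytes,
                   kafka_key: Optional[bytes]) -> List[Record]:
    """Split flowfile content on the demarcator and give every piece the same key.

    Hypothetical helper for illustration only.
    """
    # Assumption: empty pieces (e.g. after a trailing demarcator) are skipped.
    pieces = [piece for piece in content.split(demarcator) if piece]
    # Every record derived from this one flowfile carries the same key.
    return [(kafka_key, piece) for piece in pieces]


# One flowfile holding three newline-delimited records and one kafka.key value.
records = split_with_key(b"a\nb\nc\n", b"\n", b"sensor-17")
```

All three resulting records share the key b"sensor-17", so with the default 
key-hashing partitioner they would land on the same partition.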

I know of a live NiFi deployment where this feature (applied as a custom fix) 
led to a 500-600% increase in throughput. Others could and should benefit as 
well.

REMARK
The argument against this feature has been that it is not a good idea to 
intentionally generate many duplicate Kafka keys. I would argue that it is up 
to the user to decide. Most would use Kafka as a pure distributed log system, 
where key uniqueness is not important. The Kafka key can be a really valuable 
grouping placeholder, though. The only case where this becomes problematic is 
compaction of Kafka topics, when records with duplicate keys are deduplicated. 
But once we put sufficient warnings and disclaimers about this risk in the 
tooltips, it is up to the user to decide whether to use the performance boost.
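
The compaction risk can be sketched with a simplified model. This is not 
Kafka's actual log cleaner, which runs asynchronously and only guarantees that 
at least the latest value per key is retained; the function name compact and 
the sample keys are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[bytes, bytes]  # (kafka key, record value)


def compact(log: List[Record]) -> List[Record]:
    """Simplified model of topic compaction: keep only the latest record per key."""
    last_index: Dict[bytes, int] = {}
    for i, (key, _value) in enumerate(log):
        last_index[key] = i  # later records overwrite earlier positions
    # Keep a record only if it is the last one seen for its key.
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]


# Three records from one flowfile share a key; a fourth comes from another.
log = [(b"ff-1", b"a"), (b"ff-1", b"b"), (b"ff-1", b"c"), (b"ff-2", b"x")]
compacted = compact(log)
```

In this model only the last record of the first flowfile survives, which is 
why a shared key should not be combined with a compacted topic.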



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
