Jasper Knulst created NIFI-4675:
-----------------------------------
Summary: PublishKafka_0_10 can't use demarcator and kafka key at the same time
Key: NIFI-4675
URL: https://issues.apache.org/jira/browse/NIFI-4675
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Affects Versions: 1.2.0
Reporter: Jasper Knulst
Fix For: 1.5.0
At the moment you can't split up a flowfile using a demarcator AND set the
Kafka key (kafka.key) attribute for all resulting Kafka records at the same
time. The code explicitly prevents this.
Still, being able to use both at the same time would be a valuable performance
boost in all cases where one flowfile contains many individual Kafka records.
Flowfiles would no longer have to be pre-split just to set the key, which
avoids an explosion of NiFi overhead.
Note:
Using a demarcator and a Kafka key at the same time will normally give every
Kafka record produced from one incoming flowfile the same Kafka key (see
REMARK).
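To illustrate what combining the two options would mean, here is a minimal Java
sketch. It is not NiFi's actual PublishKafka_0_10 code: it splits one
flowfile's content on a literal demarcator and attaches the same key to every
resulting record. The classes DemarcatedPublishSketch and KeyedRecord are
hypothetical, and treating content, demarcator and key as strings and dropping
empty segments are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class DemarcatedPublishSketch {

    // Hypothetical stand-in for a Kafka record; not a Kafka or NiFi class.
    public static final class KeyedRecord {
        public final String key;
        public final String value;

        public KeyedRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Splits one flowfile's content on a literal demarcator and gives every
    // non-empty segment the same key (assumption: empty segments are dropped).
    public static List<KeyedRecord> split(String content, String demarcator, String key) {
        if (demarcator.isEmpty()) {
            throw new IllegalArgumentException("demarcator must not be empty");
        }
        List<KeyedRecord> records = new ArrayList<>();
        int start = 0;
        while (true) {
            int end = content.indexOf(demarcator, start);
            String segment = (end < 0) ? content.substring(start) : content.substring(start, end);
            if (!segment.isEmpty()) {
                records.add(new KeyedRecord(key, segment));
            }
            if (end < 0) {
                break;
            }
            start = end + demarcator.length();
        }
        return records;
    }

    public static void main(String[] args) {
        // One flowfile holding three newline-demarcated messages.
        List<KeyedRecord> records = split("msg1\nmsg2\nmsg3\n", "\n", "sensor-42");
        for (KeyedRecord r : records) {
            System.out.println("key=" + r.key + " value=" + r.value);
        }
    }
}
```

Running main prints three lines, all with key=sensor-42. This is the "same key
for every record" effect described in the note, obtained from a single
flowfile without pre-splitting it.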
I know of a live NiFi deployment where this fix/feature (applied as a custom
patch) led to a 500-600% increase in throughput. Others could and should
benefit as well.
REMARK
The argument against this feature has been that it is not a good idea to
intentionally generate many duplicate Kafka keys. I would argue that it is up
to the user to decide. Most would use Kafka as a pure distributed log system,
where key uniqueness is not important. The Kafka key can still be a really
valuable grouping placeholder, though. The only case where this becomes
problematic is log compaction of Kafka topics, where records with the same key
are deduplicated and only the latest one per key is retained. But once
sufficient warnings and disclaimers about this risk are in the tooltips, it is
up to the user to decide whether to use the performance booster.
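The compaction risk can be illustrated with a simplified model, assuming a
topic configured with cleanup.policy=compact, which retains only the most
recent value per key. The CompactionSketch class is hypothetical; it ignores
tombstones (null values) and the fact that real compaction runs in the
background on older log segments, so consumers may still see duplicates for a
while.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {

    // Simplified model of a compacted topic: for each key, only the most
    // recently written value is kept. Tombstones and compaction timing are
    // not modeled.
    public static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : log) {
            // Later writes for the same key overwrite earlier ones.
            latest.put(record.getKey(), record.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        // Three records split from one flowfile, all carrying the same key,
        // followed by one record with a different key.
        List<Map.Entry<String, String>> log = List.of(
                Map.entry("sensor-42", "msg1"),
                Map.entry("sensor-42", "msg2"),
                Map.entry("sensor-42", "msg3"),
                Map.entry("sensor-7", "msgA"));
        System.out.println(compact(log)); // {sensor-42=msg3, sensor-7=msgA}
    }
}
```

Four records go in and two survive: after compaction, only the last record
split from the flowfile remains for its shared key. This is why the tooltip
warning matters for compacted topics only.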