[
https://issues.apache.org/jira/browse/NIFI-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935394#comment-17935394
]
Vijaya Gorla commented on NIFI-14335:
-------------------------------------
I agree that simply adding the {{SupportsBatching}} annotation does not make
the Elasticsearch calls any more efficient, but as you said, it makes the
framework's invocation of the NiFi processor more efficient. For the last
couple of years we have maintained a customised version of the
{{GetElasticsearch}} processor in our codebase, where the only customisation
is the added annotation. We found that with the run duration set to 2 seconds,
throughput increased enormously, especially when dealing with hundreds of
millions of flow files per day.
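For illustration, a minimal sketch of what such a customisation amounts to. To stay runnable without the NiFi API on the classpath, it declares local stand-ins for {{SupportsBatching}} and {{DefaultRunDuration}}; in a real processor these would come from {{org.apache.nifi.annotation.behavior}}. The attribute name {{defaultDuration}} and the class name {{CustomGetElasticsearch}} are assumptions made for this sketch, not taken from the actual patch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class BatchingAnnotationSketch {

    // Local stand-in for NiFi's DefaultRunDuration enum; only the value
    // named in the issue description is reproduced here.
    enum DefaultRunDuration { NO_BATCHING }

    // Local stand-in for NiFi's SupportsBatching annotation.
    // The attribute name "defaultDuration" is an assumption of this sketch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SupportsBatching {
        DefaultRunDuration defaultDuration() default DefaultRunDuration.NO_BATCHING;
    }

    // Hypothetical processor class: the only change is the annotation.
    // NO_BATCHING keeps the existing behaviour until a run duration is configured.
    @SupportsBatching(defaultDuration = DefaultRunDuration.NO_BATCHING)
    static class CustomGetElasticsearch { }

    // Reads the declared default run duration, or null if the class is not annotated.
    static DefaultRunDuration defaultDurationOf(Class<?> processorClass) {
        SupportsBatching batching = processorClass.getAnnotation(SupportsBatching.class);
        return batching == null ? null : batching.defaultDuration();
    }

    public static void main(String[] args) {
        System.out.println(defaultDurationOf(CustomGetElasticsearch.class));
    }
}
```

The reflection helper only demonstrates that the annotation is declarative metadata on the class; the processor's own logic is untouched.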
For comparison, I just ran a test in a three-node NiFi cluster with the
processor set to a single concurrent task. Without {{SupportsBatching}},
throughput is 68,300 flow files per five minutes. With {{SupportsBatching}} and
a run duration of 2 seconds, throughput is 545,000 flow files per five minutes.
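To put the two figures side by side, here is a small sketch of the arithmetic. The class and method names are made up for this example; the only inputs are the two five-minute counts quoted above.

```java
import java.util.Locale;

class ThroughputComparison {

    // Ratio of batched throughput to baseline throughput.
    static double speedup(long baseline, long batched) {
        return (double) batched / baseline;
    }

    // Converts a five-minute flow file count into flow files per second.
    static double perSecond(long flowFilesPerFiveMinutes) {
        return flowFilesPerFiveMinutes / 300.0;
    }

    public static void main(String[] args) {
        long withoutBatching = 68_300;  // flow files per five minutes
        long withBatching = 545_000;    // flow files per five minutes

        System.out.println(String.format(Locale.ROOT,
                "without batching: %.0f flow files/s", perSecond(withoutBatching)));
        System.out.println(String.format(Locale.ROOT,
                "with batching:    %.0f flow files/s", perSecond(withBatching)));
        System.out.println(String.format(Locale.ROOT,
                "speedup:          %.2fx", speedup(withoutBatching, withBatching)));
    }
}
```

That works out to roughly an eightfold increase for this single test run.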
> Support NiFi framework batching in Elasticsearch processors
> -----------------------------------------------------------
>
> Key: NIFI-14335
> URL: https://issues.apache.org/jira/browse/NIFI-14335
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 2.2.0
> Reporter: Vijaya Gorla
> Priority: Minor
> Labels: elasticsearch
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Elasticsearch processors currently do not support NiFi framework batching
> (using the {{SupportsBatching}} annotation). The {{PutElasticsearchJson}}
> and {{PutElasticsearchRecord}} processors do support batching, but it is
> implemented in the processor itself rather than through the
> {{SupportsBatching}} annotation. The following processors could benefit from
> framework batching where high throughput is required:
> * {{GetElasticsearch}}
> * {{JsonQueryElasticsearch}}
> * {{UpdateByQueryElasticsearch}}
> * {{DeleteByQueryElasticsearch}}
> Adding {{SupportsBatching}} with {{DefaultRunDuration.NO_BATCHING}} would
> preserve the existing behaviour by default and enable batching if required.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)