[ 
https://issues.apache.org/jira/browse/NIFI-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935394#comment-17935394
 ] 

Vijaya Gorla commented on NIFI-14335:
-------------------------------------

I agree that simply adding the {{SupportsBatching}} annotation does not make 
Elasticsearch calls any more efficient, but, as you said, it does make the 
framework's invocation of the NiFi processor more efficient. We have had a 
customised version of the {{GetElasticsearch}} processor in our codebase for the 
last couple of years, where the only customisation is adding the annotation. We 
found that, with the run duration set to 2 seconds, throughput increased 
enormously, especially when dealing with hundreds of millions of flow files per day.

For comparison, I just ran a test on a three-node NiFi cluster with the 
processor set to a single concurrent task. Without {{SupportsBatching}}, 
throughput is 68,300 flow files per five minutes. With {{SupportsBatching}} and a 
run duration of 2 seconds, throughput is 545,000 flow files per five minutes.
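To put those two figures in per-second terms, here is a small Java sketch that only does the arithmetic on the numbers above (68,300 and 545,000 flow files per five-minute window). It does not touch NiFi or Elasticsearch; the class and method names are hypothetical and chosen just for illustration.

```java
public class ThroughputComparison {

    // Length of the measurement window used in the test above: five minutes.
    static final long WINDOW_SECONDS = 5 * 60;

    // Converts a flow-file count observed over a window into flow files per second.
    static double perSecond(long flowFiles, long windowSeconds) {
        return (double) flowFiles / windowSeconds;
    }

    // Ratio of batched throughput to unbatched throughput.
    static double speedup(long baseline, long batched) {
        return (double) batched / baseline;
    }

    public static void main(String[] args) {
        long withoutBatching = 68_300;  // flow files per five minutes, no SupportsBatching
        long withBatching = 545_000;    // flow files per five minutes, run duration 2 seconds

        System.out.printf("Without batching: %.1f flow files/s%n", perSecond(withoutBatching, WINDOW_SECONDS));
        System.out.printf("With batching:    %.1f flow files/s%n", perSecond(withBatching, WINDOW_SECONDS));
        System.out.printf("Speedup:          %.2fx%n", speedup(withoutBatching, withBatching));
    }
}
```

Running it prints roughly 227.7 flow files/s without batching, 1816.7 flow files/s with batching, and a speedup of about 7.98x, i.e. close to an eightfold improvement for a single concurrent task.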

> Support NiFi framework batching in Elasticsearch processors
> -----------------------------------------------------------
>
>                 Key: NIFI-14335
>                 URL: https://issues.apache.org/jira/browse/NIFI-14335
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 2.2.0
>            Reporter: Vijaya Gorla
>            Priority: Minor
>              Labels: elasticsearch
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elasticsearch processors currently do not support NiFi framework batching 
> (using the {{SupportsBatching}} annotation). Although the {{PutElasticsearchJson}} 
> and {{PutElasticsearchRecord}} processors support batching, it is 
> implemented in the processor rather than through the {{SupportsBatching}} annotation.
> The following processors could benefit from framework batching where high 
> throughput is required.
>  * {{GetElasticsearch}}
>  * {{JsonQueryElasticsearch}}
>  * {{UpdateByQueryElasticsearch}}
>  * {{DeleteByQueryElasticsearch}}
> Adding {{SupportsBatching}} with {{DefaultRunDuration.NO_BATCHING}} would 
> preserve the existing behaviour by default and enable batching if required.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
