[ 
https://issues.apache.org/jira/browse/NIFI-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-14837:
----------------------------------
    Status: Patch Available  (was: Open)

> Performance improvement GitHub Registry Client
> ----------------------------------------------
>
>                 Key: NIFI-14837
>                 URL: https://issues.apache.org/jira/browse/NIFI-14837
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm using the GitHub Registry Client in my NiFi instance. I have about 50
> process groups under version control. Each process group corresponds to a
> versioned flow that may have tens of commits.
> When I want to change the version of a flow, the current implementation lists
> all commits and, *for each commit*, makes a separate API call to GitHub to
> retrieve specific information (commit message, commit date, etc.).
> This is extremely inefficient, and changing the version of a flow ends up
> taking a very long time. In some cases with many commits, I cannot change the
> version at all because the request from the NiFi UI times out before the
> backend has sent back the full list of commits with all of their details.
> This makes the feature unfriendly and barely usable. It also consumes a large
> share of the GitHub API rate limit.
> This Jira introduces multiple improvements that make all of this MUCH
> better:
>  * Initializing the GitHub client with an optional OkHttp client cache (see
> [https://hub4j.github.io/github-api/]):
>  
> {quote}This library comes with a pluggable connector to use different HTTP 
> client implementations through {{HttpConnector}}. In particular, this
> means you can use [OkHttp|https://square.github.io/okhttp/], so we can make 
> use of its HTTP response cache. Making a conditional request against the 
> GitHub API and receiving a 304 response [does not count against the rate 
> limit|https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#conditional-requests].
> {quote}
>  
>  * Adding an LRU cache to the client, capped at 1000 commits, that maps each
> commit SHA to its commit details.
>  * Exposing a property to limit the number of commits retrieved. The client
> does not guarantee chronological order, only topological order, so the list
> should be chronological except in edge cases such as rebases, merge commits,
> cherry-picks, or commits with manually set dates. These are very unlikely
> with normal usage of the client. Regardless, the default remains to retrieve
> all commits, as is done today.
>  * Adding a Rate Abuse Limit Handler that logs an error when the API abuse
> limits are hit.
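The LRU cache of commit SHA to commit details can be sketched with the JDK alone. This is a minimal sketch, not the NiFi implementation: the class name {{CommitDetailsLruCache}} is hypothetical, and it assumes the details are stored as a single value keyed by the commit SHA. A {{LinkedHashMap}} in access order combined with {{removeEldestEntry}} gives least-recently-used eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded LRU map from commit SHA to commit details.
 * In access order, every get/put moves the entry to the tail, so the
 * head of the map is always the least recently used SHA.
 */
class CommitDetailsLruCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    CommitDetailsLruCache(int maxEntries) {
        // initial capacity 16, load factor 0.75, accessOrder = true
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Invoked after each insertion: drop the least recently used SHA
        // once the cache grows past its fixed size.
        return size() > maxEntries;
    }
}
```

For example, {{new CommitDetailsLruCache<String>(1000)}} matches the 1000-commit cap described above. {{LinkedHashMap}} is not thread-safe, so a registry client shared across threads would need to wrap it, for instance with {{Collections.synchronizedMap}}.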
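The commit limit and the SHA cache can work together so that listing versions no longer costs one API call per commit every time. This is a sketch under stated assumptions: {{CommitApi}} and {{FlowVersionLister}} are hypothetical stand-ins, not hub4j github-api or NiFi types; the listing call is assumed to return SHAs newest first (topological order); and a {{maxCommits}} value of 0 or less means "retrieve all", mirroring the default described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical API: one call lists SHAs, one call per SHA fetches its details. */
interface CommitApi {
    List<String> listCommitShas(String flowPath); // assumed newest first (topological order)
    String fetchCommitDetails(String sha);        // e.g. commit message and date
}

class FlowVersionLister {
    private final CommitApi api;
    private final int maxCommits;                 // <= 0 means retrieve all commits
    private final Map<String, String> detailsBySha;

    FlowVersionLister(CommitApi api, int maxCommits, int cacheSize) {
        this.api = api;
        this.maxCommits = maxCommits;
        // Access-ordered LinkedHashMap with eviction = bounded LRU cache,
        // synchronized because several requests may list versions at once.
        this.detailsBySha = Collections.synchronizedMap(
                new LinkedHashMap<String, String>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                        return size() > cacheSize;
                    }
                });
    }

    List<String> listVersions(String flowPath) {
        List<String> shas = api.listCommitShas(flowPath);
        if (maxCommits > 0 && shas.size() > maxCommits) {
            shas = shas.subList(0, maxCommits);   // keep only the most recent commits
        }
        List<String> versions = new ArrayList<>(shas.size());
        for (String sha : shas) {
            // Only SHAs missing from the cache trigger a per-commit API call.
            versions.add(detailsBySha.computeIfAbsent(sha, api::fetchCommitDetails));
        }
        return versions;
    }
}
```

On a second listing of the same flow every SHA is already cached, so only commits added since the previous listing cost a request.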
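The benefit of the OkHttp response cache for the rate limit comes from conditional requests, which can be illustrated with a small in-memory simulation. Everything here is hypothetical ({{FakeEtagServer}}, {{ConditionalRequestCache}}, and the string-array responses); it only models the documented GitHub behaviour that a request carrying {{If-None-Match}} with a current ETag receives a 304 that does not count against the rate limit. In a real client, OkHttp's cache performs this revalidation itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory endpoint that supports ETags. */
class FakeEtagServer {
    final Map<String, String> bodies = new HashMap<>();
    int chargedRequests = 0; // models GitHub: only full (200) responses count

    /** Returns {status, etag, body}; answers 304 with no body if the ETag is current. */
    String[] get(String url, String ifNoneMatch) {
        String body = bodies.get(url);
        // Hypothetical ETag derived from the body's hash code.
        String etag = "\"" + Integer.toHexString(body.hashCode()) + "\"";
        if (etag.equals(ifNoneMatch)) {
            return new String[] {"304", etag, null};
        }
        chargedRequests++;
        return new String[] {"200", etag, body};
    }
}

/** Client-side cache keyed by URL, revalidated with If-None-Match. */
class ConditionalRequestCache {
    private final Map<String, String[]> cached = new HashMap<>(); // url -> {etag, body}
    private final FakeEtagServer server;

    ConditionalRequestCache(FakeEtagServer server) {
        this.server = server;
    }

    String get(String url) {
        String[] entry = cached.get(url);
        String[] resp = server.get(url, entry == null ? null : entry[0]);
        if (resp[0].equals("304")) {
            return entry[1]; // unchanged on the server: reuse the cached body
        }
        cached.put(url, new String[] {resp[1], resp[2]});
        return resp[2];
    }
}
```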



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
