[
https://issues.apache.org/jira/browse/NIFI-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre Villard updated NIFI-14837:
----------------------------------
Status: Patch Available (was: Open)
> Performance improvement GitHub Registry Client
> ----------------------------------------------
>
> Key: NIFI-14837
> URL: https://issues.apache.org/jira/browse/NIFI-14837
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Pierre Villard
> Assignee: Pierre Villard
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I'm using the GitHub Registry Client in my NiFi instance. I have about 50
> process groups that are versioned. Every process group matches a versioned
> flow that may have tens of commits.
> When I want to change version, the current implementation will list all
> commits, and {*}for each commit{*}, will make an API call to GitHub in order
> to retrieve some specific information (commit message, commit date, etc).
> This is extremely inefficient, and changing the version of a flow ends up
> taking a very long time. For some cases with many commits, I cannot change version
> because the call in the NiFi UI would time out before the backend has sent
> back the full list of commits with all of the information.
> This makes the client unfriendly and barely usable. It also consumes a
> large share of the API rate limit.
> This Jira is to introduce multiple improvements that make all of this
> much better.
> * The GitHub client being used is initialized with an optional OkHttp client
> cache (see [https://hub4j.github.io/github-api/])
>
> {quote}This library comes with a pluggable connector to use different HTTP
> client implementations through {{{}HttpConnector{}}}. In particular, this
> means you can use [OkHttp|https://square.github.io/okhttp/], so we can make
> use of its HTTP response cache. Making a conditional request against the
> GitHub API and receiving a 304 response [does not count against the rate
> limit|https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#conditional-requests].
> {quote}
>
> * Adding an LRU cache to the client with a fixed maximum size of 1000 commits
> in order to keep an internal cache of commit SHA to commit details.
> * Expose a property to limit the number of commits retrieved. The client
> does not ensure a chronological order but guarantees a topological ordering.
> So it should be chronological except in some specific edge cases like rebase,
> merge commits, cherry-pick, commits with manual dates, etc. However, this is
> very unlikely to happen with normal usage of the client. Regardless, the
> default is to retrieve all commits, as it does today.
> * Adding a Rate Abuse Limit Handler to log an error when abusing the API
> limits.
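
For illustration only, the fixed-size LRU cache of commit SHA to commit details described above could be sketched roughly as follows. This is not the code from the patch: the class name CommitLruCache is hypothetical, and the value type is left generic because the issue does not say how commit details are represented. It relies only on the standard java.util.LinkedHashMap access-order mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the class used in the actual NiFi patch.
// Maps a commit SHA (key) to its commit details (value) and evicts the
// least recently used entry once the configured maximum is exceeded.
class CommitLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    CommitLruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, which is what makes this an LRU cache.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true
        // drops the least recently used entry.
        return size() > maxEntries;
    }
}
```

With the limit from the description this would be created as new CommitLruCache<String, String>(1000), where the String value stands in for whatever commit-details type is used. LinkedHashMap is not thread-safe, so a shared instance would need external synchronization, for example through Collections.synchronizedMap.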
--
This message was sent by Atlassian Jira
(v8.20.10#820010)