[ 
https://issues.apache.org/jira/browse/NIFI-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-14837:
----------------------------------
    Description: 
I'm using the GitHub Registry Client in my NiFi instance. I have about 50 
process groups that are versioned. Every process group matches a versioned flow 
that may have tens of commits.

When I want to change version, the current implementation lists all commits 
and, {*}for each commit{*}, makes an additional API call to GitHub to retrieve 
specific information (commit message, commit date, etc.).

This is extremely inefficient, and changing the version of a flow takes a very 
long time. For flows with many commits, I cannot change the version at all 
because the request from the NiFi UI times out before the backend has returned 
the full list of commits with all of their information.

This makes the feature unfriendly and barely usable. It also consumes a large 
share of the GitHub API rate limit.

This Jira introduces several improvements that make all of this MUCH better:
 * Initialize the GitHub client with an optional OkHttp response cache (see 
[https://hub4j.github.io/github-api/]).

 
{quote}This library comes with a pluggable connector to use different HTTP 
client implementations through {{HttpConnector}}. In particular, this means 
you can use [OkHttp|https://square.github.io/okhttp/], so we can make use of 
its HTTP response cache. Making a conditional request against the GitHub API 
and receiving a 304 response [does not count against the rate 
limit|https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#conditional-requests].
{quote}
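The OkHttp cache relies on HTTP conditional requests: the client stores the {{ETag}} of each response and sends it back in an {{If-None-Match}} header, and if nothing changed GitHub answers {{304 Not Modified}} so the cached body can be reused. The following is a minimal, hypothetical Java sketch of that mechanism only. It makes no network calls, and {{ConditionalRequestCache}}, {{ifNoneMatch}} and {{onResponse}} are illustrative names, not part of the OkHttp or hub4j API; in practice OkHttp handles this transparently once a cache is configured.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical illustration of an ETag-based conditional-request cache.
 * OkHttp's response cache does this for you; this class only shows the idea
 * and performs no HTTP calls.
 */
public class ConditionalRequestCache {

    /** A cached response body together with the ETag the server sent for it. */
    private record Entry(String etag, String body) {}

    private final Map<String, Entry> entries = new HashMap<>();
    private int notModifiedCount = 0;

    /** Value to send as the If-None-Match header for this URL, if a cached copy exists. */
    public Optional<String> ifNoneMatch(String url) {
        return Optional.ofNullable(entries.get(url)).map(Entry::etag);
    }

    /**
     * Handles a response for the URL: on 304 the cached body is reused,
     * on 200 the new body and ETag replace any cached copy.
     */
    public String onResponse(String url, int status, String etag, String body) {
        if (status == 304) {
            Entry cached = entries.get(url);
            if (cached == null) {
                throw new IllegalStateException("304 received without a cached entry for " + url);
            }
            notModifiedCount++;
            return cached.body();
        }
        if (status == 200) {
            if (etag != null) {
                entries.put(url, new Entry(etag, body));
            }
            return body;
        }
        throw new IllegalArgumentException("Unexpected status " + status);
    }

    /** Number of responses served from the cache after a 304 Not Modified. */
    public int notModifiedCount() {
        return notModifiedCount;
    }

    public static void main(String[] args) {
        ConditionalRequestCache cache = new ConditionalRequestCache();
        String url = "https://api.github.com/repos/OWNER/REPO/commits/SHA"; // placeholder URL
        // First request: nothing cached yet, so no If-None-Match header is sent.
        System.out.println(cache.ifNoneMatch(url).isPresent()); // false
        cache.onResponse(url, 200, "\"abc123\"", "{\"sha\":\"SHA\"}");
        // Second request: the stored ETag is sent; the server answers 304 with no body.
        System.out.println(cache.ifNoneMatch(url).orElse("none"));
        System.out.println(cache.onResponse(url, 304, null, null));
    }
}
```

Per the GitHub documentation quoted above, a 304 answer to such a conditional request does not count against the rate limit, which is why caching pays off for repeated commit lookups.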
 
 * Add an LRU cache to the client, with a fixed maximum size of 1000 commits, 
that maps each commit SHA to its commit details.
 * Expose a property to limit the number of commits retrieved. The client does 
not guarantee chronological order, only topological order, so commits should 
come back in chronological order except in specific edge cases such as rebases, 
merge commits, cherry-picks, or commits with manually set dates. These are very 
unlikely with normal usage of the client. In any case, the default remains to 
retrieve all commits, as it does today.
 * Add a rate limit and abuse limit handler that logs an error when the GitHub 
API limits are exceeded.
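The LRU cache and the commit limit described in the list above can be sketched together as follows. This is a minimal Java illustration under stated assumptions, not the actual NiFi implementation: {{CommitHistorySketch}}, {{CommitDetails}}, {{listCommits}} and the convention that a limit of {{0}} or less means "all commits" are hypothetical, and the per-commit API call is simulated by a function passed to the constructor. The input SHAs are assumed to be newest first, in the topological order the listing returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: a commit listing that caps how many commits are returned
 * and memoizes per-commit details in a bounded LRU cache keyed by commit SHA.
 * Not thread-safe; real code would need synchronization.
 */
public class CommitHistorySketch {

    /** Hypothetical commit details, as fetched with one API call per commit. */
    public record CommitDetails(String sha, String message, String date) {}

    /** Access-ordered LinkedHashMap that evicts the least recently used entry past maxSize. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true: iteration goes from least to most recently used
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    private final Map<String, CommitDetails> cache;
    private final Function<String, CommitDetails> fetchDetails; // stands in for the per-commit API call

    public CommitHistorySketch(int maxCachedCommits, Function<String, CommitDetails> fetchDetails) {
        this.cache = new LruCache<>(maxCachedCommits);
        this.fetchDetails = fetchDetails;
    }

    /** Returns details for at most maxCommits SHAs (newest first); maxCommits <= 0 returns all. */
    public List<CommitDetails> listCommits(List<String> shasNewestFirst, int maxCommits) {
        int n = maxCommits <= 0 ? shasNewestFirst.size() : Math.min(maxCommits, shasNewestFirst.size());
        List<CommitDetails> result = new ArrayList<>(n);
        for (String sha : shasNewestFirst.subList(0, n)) {
            // Only SHAs missing from the cache trigger the (simulated) API call.
            result.add(cache.computeIfAbsent(sha, fetchDetails));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] apiCalls = {0};
        CommitHistorySketch history = new CommitHistorySketch(1000, sha -> {
            apiCalls[0]++; // count simulated per-commit API calls
            return new CommitDetails(sha, "message of " + sha, "date of " + sha);
        });
        List<String> shas = List.of("c3", "c2", "c1");
        history.listCommits(shas, 0); // all commits: 3 simulated calls
        history.listCommits(shas, 2); // limited to 2, both already cached: no new calls
        System.out.println(apiCalls[0]); // 3
    }
}
```

Because a commit SHA identifies immutable content, cached details never go stale; the only cost of the fixed 1000-entry bound is extra API calls when more distinct commits are accessed than the cache can hold.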


> Performance improvement GitHub Registry Client
> ----------------------------------------------
>
>                 Key: NIFI-14837
>                 URL: https://issues.apache.org/jira/browse/NIFI-14837
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
