Re: github quota limit when scanning with the addition of tags

Stephen Connolly Thu, 04 Jan 2018 09:01:10 -0800

On 4 January 2018 at 16:43, <j.knu...@travelaudience.com> wrote:

> Ok, then I think I misunderstand what the scan is doing.
> During the scan, Jenkins creates a list in memory of all branches, tags,
> PRs. It does it from a single api call? Or from an api call for each type?
>


At least one API call for each requested (or implied requested) type.

e.g. If there are more than 100 branches then it will take more than one
request to get all branches as the page size is 100
e.g. If you request to build branches that are not also filed as pull
requests, then that implies we need the list of pull requests (even if you
didn't select Discover Pull Requests)
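
As a rough illustration of the two examples above, here is a minimal Python
sketch (not the plugin's code). The function and parameter names are
hypothetical; the only figures taken from this thread are the page size of 100
and the rule that excluding branches filed as pull requests implies listing
pull requests.

```python
import math

PAGE_SIZE = 100  # page size mentioned above


def listing_requests(item_count, page_size=PAGE_SIZE):
    """Requests needed to list item_count items of one type.

    Even an empty list costs one request, because the first page
    has to be fetched to learn that it is empty.
    """
    return max(1, math.ceil(item_count / page_size))


def types_to_list(discover_branches, discover_tags, discover_prs,
                  exclude_branches_filed_as_prs):
    """Return the set of types whose lists a scan has to fetch."""
    types = set()
    if discover_branches:
        types.add("branches")
    if discover_tags:
        types.add("tags")
    # Excluding branches that are also filed as PRs implies listing PRs,
    # even when PR discovery itself is switched off.
    if discover_prs or (discover_branches and exclude_branches_filed_as_prs):
        types.add("pull_requests")
    return types
```

With these assumptions, 101 branches need two listing requests, and a
branches-only setup that excludes branches filed as pull requests still lists
pull requests.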


> And then while iterating over that list, for each entity Jenkins makes an
> api call to get the Jenkinsfile (or to find out it doesn't exist)?
>

Correct.
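
To put numbers on that, here is a back-of-the-envelope sketch in Python. It
assumes only what is stated in this thread: lists are paged 100 at a time,
every discovered head costs at least one request for the marker file check,
and the quota is 5000 requests per hour. The head counts are made up, and the
result is a lower bound, not a measurement of what the plugin actually sends.

```python
import math

HOURLY_QUOTA = 5000  # GitHub limit quoted in this thread
PAGE_SIZE = 100      # page size for list requests


def scan_request_floor(heads_by_type, page_size=PAGE_SIZE):
    """Lower bound on API requests for one full scan of one repository."""
    # One request per page of each list, and at least one per type.
    listing = sum(max(1, math.ceil(n / page_size))
                  for n in heads_by_type.values())
    # At least one request per head to look for the Jenkinsfile.
    marker_checks = sum(heads_by_type.values())
    return listing + marker_checks


# Made-up counts for a repository with 500+ tags.
counts = {"branches": 40, "tags": 520, "pull_requests": 12}
floor = scan_request_floor(counts)
print(floor, "requests per full scan at minimum")
print(HOURLY_QUOTA // floor, "full scans fit in one hourly quota at best")
```

The marker file checks dominate: the listing pages are a handful of requests,
while the per-head checks grow with every new tag.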


>
> If that's the case, it doesn't sound like there is much to be done in the
> current setup.
>

A quick win might be to maintain a secondary state file that tracks the
hash of the XML config for the SCMSourceCriteria and the hash of the XML of
each revision for each discovered "head". If both hashes are unchanged since
the last scan, then we can assume there is no need to recheck.
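
A minimal sketch of that idea in Python, purely illustrative: the file layout,
the function names, and the choice of SHA-256 and JSON are all assumptions,
and nothing like this exists in the plugin today. The state also records
heads that did not meet the criteria, so tags without a Jenkinsfile would
have somewhere to keep their last checked revision.

```python
import hashlib
import json
from pathlib import Path


def digest(text):
    """Hex SHA-256 of a piece of XML, used as a change fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fingerprint(criteria_xml, revision_xml):
    return {"criteria": digest(criteria_xml), "revision": digest(revision_xml)}


def needs_recheck(state, head, criteria_xml, revision_xml):
    """True if the head is unknown or either hash differs from the last scan."""
    entry = state.get(head)
    if entry is None:
        return True
    current = fingerprint(criteria_xml, revision_xml)
    return any(entry.get(key) != value for key, value in current.items())


def record_result(state, head, criteria_xml, revision_xml, met_criteria):
    """Remember the outcome of a marker file check, including negatives."""
    entry = fingerprint(criteria_xml, revision_xml)
    entry["met_criteria"] = met_criteria
    state[head] = entry


def load_state(path):
    """Load the (hypothetical) secondary state file, or start empty."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path, state):
    Path(path).write_text(json.dumps(state, indent=2, sort_keys=True),
                          encoding="utf-8")
```

A scan would call needs_recheck() for each head and only spend an API request
when it returns True, then call record_result() with the outcome.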


>
> This is a problem though, because as more and more tags come, there is no
> logical way to keep adding them to the filter if Jenkins is the only source
> of truth on if the tag has already been built. As in, those few tags that
> don't reference a commit with a Jenkinsfile could just be deleted from
> github, but it doesn't fix the problem, just delays it a couple weeks.
>
>
> On Thursday, 4 January 2018 15:27:37 UTC+1, Stephen Connolly wrote:
>>
>> If you know those tags will never match, you could add a filter to
>> exclude them from discovery.
>>
>> Part of the issue here is that Multibranch doesn't know if the
>> SCMCriteria has changed from the last time it saw that revision (because
>> Jenkins config is a filesystem, who knows what was restored, edited with
>> vi, etc)... on top of that, this is a tag that was not discovered, so it
>> doesn't actually have a place to store the revision.
>>
>> Consequently, it will check for the Jenkinsfile every time you do a full
>> scan.
>>
>> On 4 January 2018 at 14:13, <j.kn...@travelaudience.com> wrote:
>>
>>> I do want tags. I want tags very much. I'm very happy this feature is
>>> finally available.
>>> There just happens to be some tags in that repo that reference commits
>>> in which no Jenkinsfile exists, and I happened to copy those examples.
>>>
>>> Here is a better example:
>>>
>>>   Checking tag v1.1.0 
>>> <https://github.com/travelaudience/cmt-cmtf/tree/v1.1.0>
>>>       ‘Jenkinsfile’ found
>>>     Met criteria
>>> No changes detected: v1.1.0 (still at 
>>> d11d5c94130db1b43dea147091c2cfc2d260b2c1)
>>> 19:05:09 GitHub API Usage: Current quota has 677 remaining (0 under 
>>> budget). Next quota of 5000 in 6 min 50 sec
>>>
>>>     Checking tag v1.1.1 
>>> <https://github.com/travelaudience/cmt-cmtf/tree/v1.1.1>
>>>       ‘Jenkinsfile’ found
>>>     Met criteria
>>> No changes detected: v1.1.1 (still at 
>>> 20b7a9ccd47f9e10165268ccc252bc4b793a61fc)
>>> 19:05:09 GitHub API Usage: Current quota has 675 remaining (2 over budget). 
>>> Next quota of 5000 in 6 min 50 sec. Sleeping for 26 sec.
>>> 19:05:36 GitHub API Usage: Current quota has 675 remaining (26 under 
>>> budget). Next quota of 5000 in 6 min 23 sec
>>>
>>>
>>>
>>>
>>>
>>> On Thursday, 4 January 2018 15:02:11 UTC+1, Stephen Connolly wrote:
>>>>
>>>>
>>>>
>>>> On 4 January 2018 at 13:27, <j.kn...@travelaudience.com> wrote:
>>>>
>>>>> @Stephen
>>>>> You mention that caching the responses would "save about 50% of the
>>>>> requests." That seems like a significant savings to me.
>>>>>
>>>>> I'm also wondering, I'm seeing a lot of things like this in the scan
>>>>> log:
>>>>>
>>>>> Checking tag v0.28.1 
>>>>> <https://github.com/travelaudience/cmt-cmtf/tree/v0.28.1>
>>>>>       ‘Jenkinsfile’ not found
>>>>>     Does not meet criteria
>>>>> 19:01:38 GitHub API Usage: Current quota has 901 remaining (4 under 
>>>>> budget). Next quota of 5000 in 10 min
>>>>>
>>>>>     Checking tag v0.28.2 
>>>>> <https://github.com/travelaudience/cmt-cmtf/tree/v0.28.2>
>>>>>       ‘Jenkinsfile’ not found
>>>>>     Does not meet criteria
>>>>> 19:01:38 GitHub API Usage: Current quota has 897 remaining (0 under 
>>>>> budget). Next quota of 5000 in 10 min
>>>>>
>>>>>     Checking tag v0.28.3 
>>>>> <https://github.com/travelaudience/cmt-cmtf/tree/v0.28.3>
>>>>>       ‘Jenkinsfile’ not found
>>>>>     Does not meet criteria
>>>>> 19:01:38 GitHub API Usage: Current quota has 894 remaining (3 over 
>>>>> budget). Next quota of 5000 in 10 min. Sleeping for 27 sec.
>>>>> 19:02:06 GitHub API Usage: Current quota has 894 remaining (26 under 
>>>>> budget). Next quota of 5000 in 9 min 53 sec
>>>>>
>>>>>
>>>>>
>>>>> That seems to me like each tag invokes an api request? And with 500+
>>>>> tags, that seems like a lot of unneeded calls (most especially when
>>>>> Jenkins doesn't even track/build the tag).
>>>>>
>>>>
>>>> Why are you discovering tags if you don't want tags?
>>>>
>>>> Every branch/tag/PR you discover needs at least one request to verify
>>>> that the marker file is present.
>>>>
>>>> If you don't want tags, don't discover them and you will save a lot of
>>>> requests.
>>>>
>>>>
>>>>> Or am I reading the logs incorrectly? If that is the case then a cache
>>>>> might save over 90% of the requests in this case.
>>>>> Should I create a Jira ticket for this?
>>>>>
>>>>>
>>>>> On Wednesday, 3 January 2018 16:52:03 UTC+1,
>>>>> j.kn...@travelaudience.com wrote:
>>>>>>
>>>>>> There are only two good reasons to scan periodically:
>>>>>>> 1. To recover from missed events (keep in mind that follow-up
>>>>>>> commits will typically recover anyway, so the only case here is a commit
>>>>>>> before bedtime not being built by morning because that event was not
>>>>>>> delivered by GitHub)
>>>>>>>
>>>>>> From my experience working with developers, that isn't the only use
>>>>>> case. The more common use case (when a missed event happens) is that they
>>>>>> pushed a commit and are waiting for it to proceed through the pipeline
>>>>>> and notify them. Fast notification is a key to good CI/CD. So while missed
>>>>>> events are not a frequent occurrence, waiting 7 days isn't an option, and
>>>>>> the only other solution for a developer is to have an in-depth knowledge
>>>>>> of Jenkins and know that this issue exists.
>>>>>>
>>>>>>
>>>>>>> 2. To run the orphaned item strategies (which is probably fine at
>>>>>>> once per week for most people)
>>>>>>
>>>>>> Totally agree, that's fine
>>>>>>
>>>>>>
>>>>>> As we already have a few repos with over 500 tags (and mind you these
>>>>>> are still new repos), I expect that this issue will impact others as they
>>>>>> begin to implement the ability to scan tags even with a 24 hour interval.
>>>>>>
>>>>>> ----
>>>>>>
>>>>>> Also, the recommendation in the UI for the interval setting is:
>>>>>> Subsequent commits should trigger indexing anyway and result in the
>>>>>> commit being picked up, so most people will pick between *4 hours
>>>>>> and 1 day*
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wednesday, 3 January 2018 15:42:15 UTC+1, Stephen Connolly wrote:
>>>>>>>
>>>>>>> This is the limitation of 5000 requests per hour.
>>>>>>>
>>>>>>> Ideally we would look into caching the github responses so that
>>>>>>> duplicate requests could be eliminated... but my preliminary analysis
>>>>>>> shows that would basically save about 50% of the requests.
>>>>>>>
>>>>>>> The recommendation for "*Scan Organization Triggers* -> *Periodically
>>>>>>> if not otherwise run*" is *at least 8 hours*, more likely somewhere
>>>>>>> between 24h and 7 days, *depending on how long you are willing to
>>>>>>> wait* to recover from an event that GitHub failed to deliver.
>>>>>>>
>>>>>>> There are only two good reasons to scan periodically:
>>>>>>>
>>>>>>> 1. To recover from missed events (keep in mind that follow-up
>>>>>>> commits will typically recover anyway, so the only case here is a commit
>>>>>>> before bedtime not being built by morning because that event was not
>>>>>>> delivered by GitHub)
>>>>>>> 2. To run the orphaned item strategies (which is probably fine at
>>>>>>> once per week for most people)
>>>>>>>
>>>>>>> The only other reason to scan periodically is a bad one, namely
>>>>>>>
>>>>>>> * You cannot set up push notification from GitHub
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3 January 2018 at 14:19, <j.kn...@travelaudience.com> wrote:
>>>>>>>
>>>>>>>> Now that we've added *Discover tags*[1
>>>>>>>> <https://issues.jenkins-ci.org/browse/JENKINS-34395>] and a *Build
>>>>>>>> everything*[2
>>>>>>>> <https://github.com/jenkinsci/github-branch-source-plugin/pull/158#issuecomment-332842623>]
>>>>>>>> strategy, we're running into Github quota limits quite frequently.
>>>>>>>>
>>>>>>>> 18:58:09 GitHub API Usage: Current quota has 1110 remaining (5 over 
>>>>>>>> budget). Next quota of 5000 in 13 min. Sleeping for 29 sec.
>>>>>>>>
>>>>>>>>
>>>>>>>> We've had to extend the *Scan Organization Triggers* -> *Periodically
>>>>>>>> if not otherwise run* setting to 8 hours to help limit the
>>>>>>>> number of scans, but that hasn't completely solved this issue, nor is
>>>>>>>> it the goal we want to achieve.
>>>>>>>>
>>>>>>>>
>>>>>>>> There's an open bug about the time setting and github quota limits  
>>>>>>>> (*JENKINS-47154*[3] 
>>>>>>>> <https://issues.jenkins-ci.org/browse/JENKINS-47154>), but it's not 
>>>>>>>> relevant in this case.
>>>>>>>>
>>>>>>>> So I'm wondering if it's a bug in the github-branch-source-plugin? or 
>>>>>>>> in the Build everything extension? or is there simply an easy way to 
>>>>>>>> request Jenkins to have a higher API quota from GitHub?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> REF:
>>>>>>>>
>>>>>>>> 1. https://issues.jenkins-ci.org/browse/JENKINS-34395
>>>>>>>>
>>>>>>>> 2. 
>>>>>>>> https://github.com/jenkinsci/github-branch-source-plugin/pull/158#issuecomment-332842623
>>>>>>>>
>>>>>>>> 3. https://issues.jenkins-ci.org/browse/JENKINS-47154
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Jenkins Users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to jenkinsci-use...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/jenkinsci-users/4079f366-
>>>>>>>> 003e-4ac4-8aea-462ef4ed2090%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/jenkinsci-users/4079f366-003e-4ac4-8aea-462ef4ed2090%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>
>>

