On Wed, 19 Apr 2023 17:12:09 -0400 Federico Grau wrote:

> Copying sf reply to Debian bug #1033632 , as requested by pabs, to enable
> Debian members to analyze.
>
> On Tue, Apr 18, 2023 at 08:35:03AM -0600, SourceForge.net Support & Ops wrote:
...
> > We've checked our logs for the past week and see 209.87.16.61 with
> > user-agent "Python-httplib2/$Rev$" has hit a couple of RSS feeds, but has
> > not received any 429 status from our rate limits.

This is Planet Debian; I guess some blogs are hosted on SourceForge.

> > There is an IPv6 address 2607:f8f0:614:1::1274:73 (which is also
> > qa.debian.org it seems) that is sending a lot of traffic. Nearly all of it
> > is with a user-agent of "Mozilla/5.0 (X11; U; Linux i386; en-us)
> > AppleWebKit/531.2+ (KHTML, like Gecko)", and hitting non-RSS feeds, it is
> > hitting /projects/dispcalgui/files/... URLs over and over.

This is caused by fakeupstream.cgi, which also has a SourceForge
redirector. That redirector recursively scrapes SourceForge files pages
instead of using the RSS feed; it likely dates from before the RSS feed
existed. Only 3 packages use it, and none of them is dispcalgui.

https://codesearch.debian.net/search?q=fakeupstream.cgi?upstream=sf/&literal=1

I temporarily disabled the web server's IP address privacy in order to
find out where the requests were coming from, and found Msnbot IP
addresses. Then I noticed that the User-Agent is bingbot/2.0. I also
verified that the IP addresses are legitimate bingbot addresses.

https://en.wikipedia.org/wiki/Msnbot
http://www.bing.com/bingbot.htm
https://www.bing.com/webmasters/help/verify-bingbot-2195837f

For now I have blocked bingbot from accessing fakeupstream.cgi and then
requested that it stop accessing fakeupstream.cgi:

https://salsa.debian.org/qa/qa/commit/37ada830d0c2c1ece51e7622910014b8ec047909
https://salsa.debian.org/qa/qa/commit/4893d7fce8537d6978ace6484889d3e5efe34af5

This has stopped the flood to SourceForge and hopefully will stop the
flood to fakeupstream.cgi, so this bug can likely be closed now, but...

There are some improvements that we could make to QA services:

 * pass on HTTP error codes from services fakeupstream.cgi accesses
 * switch fakeupstream.cgi SourceForge support to using the RSS feed
 * switch fakeupstream.cgi/sf.php User-Agents to legitimate ones

If anyone would like to work on these, please submit a merge request.
If no-one does these fixes, then I may get to them eventually.
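
For anyone picking up the first and third items, here is a rough Python
sketch of the idea, not the actual fakeupstream.cgi code. It assumes a
CGI-style script that relays the upstream status through a "Status:"
header. The names fetch, cgi_response and USER_AGENT are hypothetical,
as is the User-Agent value itself.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent; the real value would be chosen by the QA team.
USER_AGENT = "fakeupstream-example/0.1 (+https://qa.debian.org/)"


def fetch(url, opener=urllib.request.urlopen):
    """Return (status, reason, body) for url, keeping upstream error statuses."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(request) as response:
            return response.status, response.reason, response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx replies (for example 429 from a rate limiter) arrive here.
        return error.code, error.reason, b""


def cgi_response(status, reason, body):
    """Build a CGI response that relays the upstream status, not a blanket 200."""
    header = f"Status: {status} {reason}\r\nContent-Type: text/html\r\n\r\n"
    # Only forward the body when the upstream request succeeded.
    return header.encode("ascii") + (body if status == 200 else b"")
```

With something like this, a 429 from SourceForge would reach the
crawler or uscan as a 429 rather than as an empty 200 page, which
should make well-behaved clients back off.
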

> > A different pattern from that address does hit RSS feeds and has no
> > user agent.

That is likely to be the regular SourceForge redirector.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise