On Tue, 12 May 2020 17:41:09 +0200 Peter Kovacs <pe...@apache.org> wrote:
> Okay, I had a short debug session with Dave and Humbedooh.
>
> We are now sure that the crawlers are not blocked. The 301 response
> comes from the fact that Yandex still defaults to http and not https.

This post on the User Forum might be relevant:
https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756

Rory

>
> After I added https to the URL, everything worked fine.
>
> Dave also did a curl request, which also worked fine.
>
>
> We have now agreed that I will pass the ball back to Google, with the
> feedback that this looks like a Google-internal issue.
>
> The robots.txt has not been changed in 11 years. Yandex can crawl the
> URL and we can curl the web page. So we think it is a Google issue.
>
>
> I very much appreciated the quick session. Thanks.
>
>
> All the best
>
> Peter
>
> On 12.05.20 at 17:24, Dave Fisher wrote:
> > It’s not an IP ban. Infra tells me that would not be a 301.
> >
> > Ah-ha - here is the 301:
> >
> > % curl -D headers http://forum.openoffice.org/
> > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> > <html><head>
> > <title>301 Moved Permanently</title>
> > </head><body>
> > <h1>Moved Permanently</h1>
> > <p>The document has moved <a
> > href="https://forum.openoffice.org/">here</a>.</p>
> > </body></html>
> >
> > Surprising that they cannot shift from HTTP to HTTPS via a 301!
> >
> > Regards,
> > Dave
> >
> >> On May 12, 2020, at 8:04 AM, Dave Fisher <w...@apache.org> wrote:
> >>
> >> Information about Infra IP bans is here:
> >> https://infra.apache.org/infra-ban.html
> >>
> >> Please direct the Google engineer to that resource.
> >>
> >> Regards,
> >> Dave
> >>
> >>> On May 12, 2020, at 7:55 AM, Dave Fisher <w...@apache.org> wrote:
> >>>
> >>> Are you sure you weren’t using forums.openoffice.org instead of
> >>> forum.openoffice.org?
> >>>
> >>> curl -D headers https://forum.openoffice.org/ does return the correct
> >>> page.
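The `curl -D headers` check Dave ran can also be scripted. Below is a minimal Python sketch; it is not something used in this thread, and the names `inspect_redirect` and `RedirectToHttps` are our own. It sends one GET without following redirects and reports the status code and the `Location` header. So that the example runs offline, the local demo server is a hypothetical stand-in for the forum's HTTP side: it answers every request with a 301 to the same path on `https://forum.openoffice.org`.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple
from urllib.parse import urlsplit


def inspect_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one GET to `url` WITHOUT following redirects and return
    (status code, Location header or None), the part of curl -D output
    that matters when diagnosing a 301."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        resp.read()  # drain the (small) HTML body of the redirect page
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class RedirectToHttps(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for an HTTP virtual host that answers
    every GET with a 301 to the same path on https://forum.openoffice.org."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "https://forum.openoffice.org" + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for the demo


if __name__ == "__main__":
    # Offline demo: bind to a free local port instead of contacting the real forum.
    server = HTTPServer(("127.0.0.1", 0), RedirectToHttps)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(inspect_redirect(f"http://127.0.0.1:{server.server_port}/"))
    server.shutdown()
    server.server_close()
```

Against the live `http://forum.openoffice.org/` this would be expected to report 301 with the HTTPS `Location`, matching Dave's curl output. A client that follows redirects, as `urllib.request.urlopen` does by default, then lands on the HTTPS page.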
> >>>
> >>> The robots.txt is this:
> >>>
> >>> curl -D headers https://forum.openoffice.org/robots.txt
> >>> User-agent: *
> >>> Crawl-delay: 1
> >>> Disallow: /en/forum/common.php
> >>> Disallow: /en/forum/config.php
> >>> Disallow: /en/forum/con.php
> >>> Disallow: /en/forum/faq.php
> >>> Disallow: /en/forum/mcp.php
> >>> Disallow: /en/forum/memberlist.php
> >>> Disallow: /en/forum/posting.php
> >>> Disallow: /en/forum/report.php
> >>> Disallow: /en/forum/search.php
> >>> Disallow: /en/forum/style.php
> >>> Disallow: /en/forum/ucp.php
> >>> Disallow: /en/forum/viewonline.php
> >>> Disallow: /en/forum/adm
> >>> Disallow: /en/forum/cache
> >>> Disallow: /en/forum/docs
> >>> Disallow: /en/forum/files
> >>> Disallow: /en/forum/images
> >>> Disallow: /en/forum/includes
> >>> Disallow: /en/forum/language
> >>> Disallow: /en/forum/store
> >>> Disallow: /en/forum/styles
> >>> Disallow: /es/forum/common.php
> >>> Disallow: /es/forum/config.php
> >>> Disallow: /es/forum/con.php
> >>> Disallow: /es/forum/faq.php
> >>> Disallow: /es/forum/mcp.php
> >>> Disallow: /es/forum/memberlist.php
> >>> Disallow: /es/forum/posting.php
> >>> Disallow: /es/forum/report.php
> >>> Disallow: /es/forum/search.php
> >>> Disallow: /es/forum/style.php
> >>> Disallow: /es/forum/ucp.php
> >>> Disallow: /es/forum/viewonline.php
> >>> Disallow: /es/forum/adm
> >>> Disallow: /es/forum/cache
> >>> Disallow: /es/forum/docs
> >>> Disallow: /es/forum/files
> >>> Disallow: /es/forum/images
> >>> Disallow: /es/forum/includes
> >>> Disallow: /es/forum/language
> >>> Disallow: /es/forum/store
> >>> Disallow: /es/forum/styles
> >>> Disallow: /fr/forum/common.php
> >>> Disallow: /fr/forum/config.php
> >>> Disallow: /fr/forum/con.php
> >>> Disallow: /fr/forum/faq.php
> >>> Disallow: /fr/forum/mcp.php
> >>> Disallow: /fr/forum/memberlist.php
> >>> Disallow: /fr/forum/posting.php
> >>> Disallow: /fr/forum/report.php
> >>> Disallow: /fr/forum/search.php
> >>> Disallow: /fr/forum/style.php
> >>> Disallow: /fr/forum/ucp.php
> >>> Disallow: /fr/forum/viewonline.php
> >>> Disallow: /fr/forum/adm
> >>> Disallow: /fr/forum/cache
> >>> Disallow: /fr/forum/docs
> >>> Disallow: /fr/forum/files
> >>> Disallow: /fr/forum/images
> >>> Disallow: /fr/forum/includes
> >>> Disallow: /fr/forum/language
> >>> Disallow: /fr/forum/store
> >>> Disallow: /fr/forum/styles
> >>> Disallow: /fr/ci-joint
> >>> Disallow: /hu/forum/common.php
> >>> Disallow: /hu/forum/config.php
> >>> Disallow: /hu/forum/con.php
> >>> Disallow: /hu/forum/faq.php
> >>> Disallow: /hu/forum/mcp.php
> >>> Disallow: /hu/forum/memberlist.php
> >>> Disallow: /hu/forum/posting.php
> >>> Disallow: /hu/forum/report.php
> >>> Disallow: /hu/forum/search.php
> >>> Disallow: /hu/forum/style.php
> >>> Disallow: /hu/forum/ucp.php
> >>> Disallow: /hu/forum/viewonline.php
> >>> Disallow: /hu/forum/adm
> >>> Disallow: /hu/forum/cache
> >>> Disallow: /hu/forum/docs
> >>> Disallow: /hu/forum/files
> >>> Disallow: /hu/forum/images
> >>> Disallow: /hu/forum/includes
> >>> Disallow: /hu/forum/language
> >>> Disallow: /hu/forum/store
> >>> Disallow: /hu/forum/styles
> >>> Disallow: /ja/forum/common.php
> >>> Disallow: /ja/forum/config.php
> >>> Disallow: /ja/forum/con.php
> >>> Disallow: /ja/forum/faq.php
> >>> Disallow: /ja/forum/mcp.php
> >>> Disallow: /ja/forum/memberlist.php
> >>> Disallow: /ja/forum/posting.php
> >>> Disallow: /ja/forum/report.php
> >>> Disallow: /ja/forum/search.php
> >>> Disallow: /ja/forum/style.php
> >>> Disallow: /ja/forum/ucp.php
> >>> Disallow: /ja/forum/viewonline.php
> >>> Disallow: /ja/forum/adm
> >>> Disallow: /ja/forum/cache
> >>> Disallow: /ja/forum/docs
> >>> Disallow: /ja/forum/files
> >>> Disallow: /ja/forum/images
> >>> Disallow: /ja/forum/includes
> >>> Disallow: /ja/forum/language
> >>> Disallow: /ja/forum/store
> >>> Disallow: /ja/forum/styles
> >>> Disallow: /test
> >>> Disallow: /nl/forum/common.php
> >>> Disallow: /nl/forum/config.php
> >>> Disallow: /nl/forum/con.php
> >>> Disallow: /nl/forum/faq.php
> >>> Disallow: /nl/forum/mcp.php
> >>> Disallow: /nl/forum/memberlist.php
> >>> Disallow: /nl/forum/posting.php
> >>> Disallow: /nl/forum/report.php
> >>> Disallow: /nl/forum/search.php
> >>> Disallow: /nl/forum/style.php
> >>> Disallow: /nl/forum/ucp.php
> >>> Disallow: /nl/forum/viewonline.php
> >>> Disallow: /nl/forum/adm
> >>> Disallow: /nl/forum/cache
> >>> Disallow: /nl/forum/docs
> >>> Disallow: /nl/forum/files
> >>> Disallow: /nl/forum/images
> >>> Disallow: /nl/forum/includes
> >>> Disallow: /nl/forum/language
> >>> Disallow: /nl/forum/store
> >>> Disallow: /nl/forum/styles
> >>> Disallow: /vi/forum/common.php
> >>> Disallow: /vi/forum/config.php
> >>> Disallow: /vi/forum/con.php
> >>> Disallow: /vi/forum/faq.php
> >>> Disallow: /vi/forum/mcp.php
> >>> Disallow: /vi/forum/memberlist.php
> >>> Disallow: /vi/forum/posting.php
> >>> Disallow: /vi/forum/report.php
> >>> Disallow: /vi/forum/search.php
> >>> Disallow: /vi/forum/style.php
> >>> Disallow: /vi/forum/ucp.php
> >>> Disallow: /vi/forum/viewonline.php
> >>> Disallow: /vi/forum/adm
> >>> Disallow: /vi/forum/cache
> >>> Disallow: /vi/forum/docs
> >>> Disallow: /vi/forum/files
> >>> Disallow: /vi/forum/images
> >>> Disallow: /vi/forum/includes
> >>> Disallow: /vi/forum/language
> >>> Disallow: /vi/forum/store
> >>> Disallow: /vi/forum/styles
> >>> Disallow: /zh/forum/common.php
> >>> Disallow: /zh/forum/config.php
> >>> Disallow: /zh/forum/con.php
> >>> Disallow: /zh/forum/faq.php
> >>> Disallow: /zh/forum/mcp.php
> >>> Disallow: /zh/forum/memberlist.php
> >>> Disallow: /zh/forum/posting.php
> >>> Disallow: /zh/forum/report.php
> >>> Disallow: /zh/forum/search.php
> >>> Disallow: /zh/forum/style.php
> >>> Disallow: /zh/forum/ucp.php
> >>> Disallow: /zh/forum/viewonline.php
> >>> Disallow: /zh/forum/adm
> >>> Disallow: /zh/forum/cache
> >>> Disallow: /zh/forum/docs
> >>> Disallow: /zh/forum/files
> >>> Disallow: /zh/forum/images
> >>> Disallow: /zh/forum/includes
> >>> Disallow: /zh/forum/language
> >>> Disallow: /zh/forum/store
> >>> Disallow: /zh/forum/styles
> >>>
> >>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009
> >>> 23:40:14 GMT
> >>>
> >>> Forum search uses phpBB.
> >>>
> >>> We haven’t allowed search engines to crawl forum.openoffice.org since
> >>> before the Oracle donation to the ASF.
> >>>
> >>> Crawlers' IP addresses might be blocked by ASF Infra if their use is
> >>> excessive. That could give the 301.
> >>>
> >>> Regards,
> >>> Dave
> >>>
> >>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <leg...@posteo.de> wrote:
> >>>>
> >>>> Hello all,
> >>>>
> >>>>
> >>>> What I found is that, according to the Google search tool, the URL
> >>>> forum.openoffice.org is not reachable.
> >>>>
> >>>> So I checked with DuckDuckGo (my preferred search engine); they don't
> >>>> use their own crawler and rely on the infrastructure of Google, Bing
> >>>> and Yandex.
> >>>>
> >>>> I then checked with Bing, but could not figure out how to check the
> >>>> bot's feedback on a URL, so I moved on.
> >>>>
> >>>> I checked with Yandex. They have a search URL test page. I entered
> >>>> forum.openoffice.org there.
> >>>>
> >>>> The response is:
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>> * Date: Tue, 12 May 2020 10:37:47 GMT
> >>>> * Server: Apache/2.4.18 (Ubuntu)
> >>>> * Location: https://forum.openoffice.org/
> >>>> * Content-Length: 237
> >>>> * Keep-Alive: timeout=15, max=100
> >>>> * Connection: Keep-Alive
> >>>> * Content-Type: text/html; charset=iso-8859-1
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>>
> >>>> HTTP status code 301 Moved Permanently
> >>>> Server response time 133 ms
> >>>> IP address 54.84.201.130
> >>>> Encoding UTF-8(unicode-1-1-utf-8, UTF8)
> >>>> Page size 237 B
> >>>>
> >>>>
> >>>> I am not sure what that means. The HTTP status code "Moved Permanently"
> >>>> looks wrong.
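The robots.txt quoted earlier in this thread can be checked offline with Python's standard `urllib.robotparser`. Below is a minimal sketch fed with a short excerpt of that file: a few of its `Disallow` lines, copied verbatim. The helper name `can_crawl` is our own. `robotparser` does plain prefix matching, which is all this file needs because it has no wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the robots.txt quoted in this thread; the real file repeats the
# same Disallow pattern for each language prefix (en, es, fr, hu, ja, nl, vi, zh).
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/posting.php
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /test
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # feed the text directly instead of read()
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "https://forum.openoffice.org"
    for path in ("/", "/en/forum/viewtopic.php?f=50&t=102021",
                 "/en/forum/posting.php", "/test"):
        print(f"{path:45} {can_crawl(ROBOTS_EXCERPT, 'Googlebot', base + path)}")
```

Under these rules, only the listed phpBB scripts and directories are excluded. `/` and topic pages such as `viewtopic.php` remain fetchable for any user agent, which fits the conclusion that robots.txt is not what blocks Googlebot.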
> >>>> I just don't know if this is the return code from our web server
> >>>> or a response code from the crawler.
> >>>> I'll try to get someone from Infra, or I'll open a ticket.
> >>>>
> >>>>
> >>>> All the best
> >>>> Peter
> >>>>
> >>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
> >>>>> Hi Kay,
> >>>>>
> >>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
> >>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
> >>>>>>> Hi Kay,
> >>>>>>>
> >>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
> >>>>>>>> Hi Peter...
> >>>>>>>>
> >>>>>>>> Since I am a Google Search admin for www.openoffice.org and
> >>>>>>>> openoffice.apache.org, I got this too. Disclaimer: I have not done
> >>>>>>>> ANY work with the Google Search APIs on these sites in quite some
> >>>>>>>> time.
> >>>>>>>>
> >>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use
> >>>>>>>> Google Search until I saw this.
> >>>>>>> I think I added it to the list when we had a discussion about
> >>>>>>> outdated information regarding SourceForge found by Google Search.
> >>>>>>>
> >>>>>>> But I don't have access to forum.openoffice.org, so I could never
> >>>>>>> complete the step.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Matthias
> >>>>>> OK. In the top level of the website source, there is a file called
> >>>>>> "skeleton.html" which references the following bit of code --
> >>>>>>
> >>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
> >>>>>>
> >>>>>> I didn't dig far enough to find out how "skeleton.html" is used (I
> >>>>>> forgot), but this is an example of the google-analytics code snippet
> >>>>>> that is used. Basically, this needs to be included in the site you
> >>>>>> want analytics to be used on, by putting it in the (header) files that
> >>>>>> generate the site. And you might take a look at recent instructions
> >>>>>> from Google. Things change.
> >>>>>>
> >>>>>> https://support.google.com/analytics/answer/1008080
> >>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
> >>>>> forum...
> >>>>> The procedure for the Google Search Console is the same; it needs access
> >>>>> to the root directory.
> >>>>>
> >>>>> Maybe Andrea can help if he is available again?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Matthias
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Kay
> >>>>>>
> >>>>>>>> One of the Google Search admins for forum.openoffice.org could check
> >>>>>>>> the current Google Search APIs that are in use on that site. Changes
> >>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
> >>>>>>>> robots.txt for that site is causing this. I don't think it requires a
> >>>>>>>> response, but maybe some investigation.
> >>>>>>>>
> >>>>>>>> Just some ideas...
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>>
> >>>>>>>> Kay
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I have received the following mail, probably because I am listed on
> >>>>>>>>> the Google Analytics page.
> >>>>>>>>>
> >>>>>>>>> Does this have any action items? How should we answer Mr John Mueller?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> All the best
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -------- Forwarded Message --------
> >>>>>>>>> Subject: Critical issue on forum.openoffice.org and Google Search
> >>>>>>>>> Date: Mon, 11 May 2020 13:37:27 +0200
> >>>>>>>>> From: John Mueller <joh...@google.com>
> >>>>>>>>> To: morsei...@gmail.com, kay.sch...@gmail.com, legi...@gmail.com
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>
> >>>>>>>>>
> >>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
> >>>>>>>>> attention to a critical issue with your website, and how it's
> >>>>>>>>> available for Google's web search.
> >>>>>>>>>
> >>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
> >>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
> >>>>>>>>> out of Google's search results, and will prevent new pages from
> >>>>>>>>> being picked up for Search. If you're not aware of this issue, you
> >>>>>>>>> may be accidentally blocking these pages from Google Search due to a
> >>>>>>>>> server issue. If you need to block Googlebot from crawling pages on
> >>>>>>>>> your website, we'd recommend using the robots.txt file instead.
> >>>>>>>>>
> >>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
> >>>>>>>>> can use a reverse IP lookup to do so:
> >>>>>>>>> https://support.google.com/webmasters/answer/80553
> >>>>>>>>>
> >>>>>>>>> Should you have any questions, feel free to contact me directly. For
> >>>>>>>>> verification purposes, we are sending a copy of this message to your
> >>>>>>>>> site's Search Console account.
> >>>>>>>>>
> >>>>>>>>> Thank you,
> >>>>>>>>> John Mueller (joh...@google.com <mailto:joh...@google.com>)
> >>>>>>>>> Webmaster Trends Analyst
> >>>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
> >>>>>>>> For additional commands, e-mail: dev-h...@openoffice.apache.org
> >>>>>>>>

-- 
Rory O'Farrell <ofarr...@iol.ie>