One thing I am seeing a ton of is google.com entries rather than GoogleBot:

our_domain:443 47.79.206.79 - - [16/Jun/2025:00:00:09 -0700] "GET /eg/opac/record/2620408?query=Fathers%20Juvenile%20fiction HTTP/1.0" 500 21258 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Mobile Safari/537.36"

our_domain:443 47.79.206.22 - - [16/Jun/2025:00:00:08 -0700] "GET /eg/opac/record/2621426?query=Allingham%20William%201824%201889 HTTP/1.0" 500 21258 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Mobile Safari/537.36"
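[A quick way to tell organic referrals from a scrape is to tally status codes and referers per source IP: a crawler tends to produce many IPs in one range with identical referer/UA strings and, as here, consistent 500s. A minimal sketch for tallying entries like the two above (assumes the vhost-prefixed combined log format shown; the regex and sample line are illustrative, not Evergreen-specific). Genuine Googlebot can also be verified by reverse DNS, which these 47.79.x.x addresses with a Chrome UA would not pass.]

```python
import re
from collections import Counter

# Matches the vhost-prefixed combined log format shown above:
# vhost:port ip - - [timestamp] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Count (ip, status, referer) combinations to surface crawl patterns."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[(m.group('ip'), m.group('status'), m.group('referer'))] += 1
    return counts

# One of the log lines from above, abbreviated for readability.
sample = [
    'our_domain:443 47.79.206.79 - - [16/Jun/2025:00:00:09 -0700] '
    '"GET /eg/opac/record/2620408 HTTP/1.0" 500 21258 '
    '"https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K)"',
]
print(tally(sample))
```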
Do you think those are legitimate patron searches, or more likely Google scraping in a different way?

-Jon

On Mon, Jun 16, 2025 at 8:44 AM JonGeorg SageLibrary <[email protected]> wrote:

> But that many? I just tried to reboot the app server and it froze on the
> advanced key value. I'm wondering if it's unrelated and, like you said,
> normal, and instead the Docker container managing the SSL cert is locked
> or something similar. I've reached out to the people hosting the servers
> to see if they have any insight. Thank you!
> -Jon
>
> On Mon, Jun 16, 2025 at 8:41 AM Bill Erickson <[email protected]> wrote:
>
>> Hi Jon,
>>
>> Those would be the patron catalog performing added content lookups.
>> Instead of directly reaching out to the vendor for the data, it leverages
>> the existing web API via internal requests (in asynchronous batches) to
>> collect the data. Those are expected.
>>
>> -b
>>
>> On Mon, Jun 16, 2025 at 11:20 AM JonGeorg SageLibrary via
>> Evergreen-general <[email protected]> wrote:
>>
>>> Greetings.
>>> We've been slammed by bot traffic and had to take countermeasures. We
>>> geoblocked international traffic at the host firewall level, and recently
>>> added an nginx bot blocker for bots based on servers in the US and Canada.
>>> I then scraped bot IPs out of the Apache logs and began adding the IPs
>>> that were still coming through. Yes, I've updated the robots.txt file;
>>> they're ignoring it.
>>>
>>> The issue is that after a day or two of reprieve, we started getting a
>>> ton of 404s with loopback addresses. I've reverted the blacklist config
>>> file back to blank and restarted all services on all servers. We're still
>>> getting a ton of traffic that appears to be internally generated.
>>>
>>> I don't see anything obvious within crontab. Since the traffic appears
>>> to be internally generated, the OPAC stays up longer than it normally
>>> would with that number of sessions on the load balancer.
>>>
>>> Is there an Evergreen or Apache service that indexes the entire catalog?
>>> We have our external IP whitelisted. Do internal VLAN IP addresses need
>>> to be whitelisted?
>>>
>>> Here's an example of the traffic I'm seeing. It's all on port 80, too;
>>> external traffic all comes in on 443.
>>>
>>> our_domain:80 127.0.0.1 - - [16/Jun/2025:08:18:31 -0700] "HEAD
>>> /opac/extras/ac/anotes/html/r/2621889 HTTP/1.1" 404 159 "-" "-"
>>>
>>> -Jon
>>>
>>> _______________________________________________
>>> Evergreen-general mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
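[Given Bill's explanation that the loopback requests are the OPAC's own added-content lookups, one sanity check is to filter the access log for loopback hits on the added-content path and confirm that is all the internal traffic amounts to. A sketch under that assumption; the `/opac/extras/ac/` prefix is taken from the example log line above, and a 404 on those endpoints likely just means the vendor had no content for that record.]

```python
def internal_ac_requests(lines):
    """Return request paths for loopback hits on added-content endpoints.

    Per the thread above, these are the OPAC's own batched lookups and
    are expected; a 404 likely means no added content for that record.
    """
    hits = []
    for line in lines:
        fields = line.split()
        ip = fields[1]
        # The request path is the second token inside the quoted request.
        path = line.split('"')[1].split()[1]
        if ip == '127.0.0.1' and path.startswith('/opac/extras/ac/'):
            hits.append(path)
    return hits

# The example loopback log line from the original message.
sample = [
    'our_domain:80 127.0.0.1 - - [16/Jun/2025:08:18:31 -0700] '
    '"HEAD /opac/extras/ac/anotes/html/r/2621889 HTTP/1.1" 404 159 "-" "-"',
]
print(internal_ac_requests(sample))
```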
_______________________________________________
Evergreen-general mailing list -- [email protected]
To unsubscribe send an email to [email protected]
