One thing I'm seeing a lot of is entries with a google.com referrer rather than a Googlebot user agent:

our_domain:443 47.79.206.79 - - [16/Jun/2025:00:00:09 -0700] "GET /eg/opac/record/2620408?query=Fathers%20Juvenile%20fiction HTTP/1.0" 500 21258 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Mobile Safari/537.36"
our_domain:443 47.79.206.22 - - [16/Jun/2025:00:00:08 -0700] "GET /eg/opac/record/2621426?query=Allingham%20William%201824%201889 HTTP/1.0" 500 21258 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Mobile Safari/537.36"
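Those entries look like standard Apache combined-format records with a leading vhost:port field. A minimal sketch for tallying which client IPs generate the 500s (the field layout is assumed from the excerpt above, and `top_ips` is my own helper name, not an Evergreen utility):

```python
import re
from collections import Counter

# Minimal sketch, assuming Apache combined log format with a leading
# "vhost:port" field, as in the excerpt above.
LOG_RE = re.compile(
    r'^\S+ (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)"'
)

def top_ips(lines, status="500"):
    """Count client IPs for entries with the given HTTP status code."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == status:
            hits[m.group("ip")] += 1
    return hits.most_common()

sample = ('our_domain:443 47.79.206.79 - - [16/Jun/2025:00:00:09 -0700] '
          '"GET /eg/opac/record/2620408 HTTP/1.0" 500 21258 '
          '"https://www.google.com/" "Mozilla/5.0 ..."')
print(top_ips([sample]))   # → [('47.79.206.79', 1)]
```

Feeding it the whole access log would show whether the 500s cluster on a handful of source IPs or spread across many.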

Do you think those are legitimate patron searches or more likely Google
scraping in a different way?
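One way to tell: genuine Google crawler traffic passes forward-confirmed reverse DNS (Google publishes this verification procedure), while bots spoofing a google.com referrer generally won't. A hedged sketch, where the domain suffixes follow Google's documented crawler domains and `is_google_ip` is my own name for the check:

```python
# Hedged sketch: forward-confirmed reverse DNS, per Google's published
# crawler-verification procedure. The sample IPs come from the log
# excerpt above; whether they pass is exactly what we want to find out.
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def looks_like_google_host(host: str) -> bool:
    """Pure check: does a PTR hostname fall under Google's crawler domains?"""
    return host.endswith(GOOGLE_SUFFIXES)

def is_google_ip(ip: str) -> bool:
    """PTR lookup, suffix check, then forward-confirm the A record."""
    try:
        host = socket.gethostbyaddr(ip)[0]   # reverse (PTR) lookup
    except OSError:
        return False                         # no reverse DNS at all
    if not looks_like_google_host(host):
        return False
    try:
        # Forward-confirm: the PTR name must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

if __name__ == "__main__":
    for ip in ("47.79.206.79", "47.79.206.22"):
        print(ip, is_google_ip(ip))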
-Jon

On Mon, Jun 16, 2025 at 8:44 AM JonGeorg SageLibrary <
[email protected]> wrote:

> But that many? I just tried to reboot the app server and it froze on the
> advanced key value. I'm wondering if it's unrelated and, like you said,
> normal, and instead the Docker container managing the SSL cert is locked or
> something similar. I've reached out to the people hosting the servers to
> see if they have any insight. Thank you!
> -Jon
>
> On Mon, Jun 16, 2025 at 8:41 AM Bill Erickson <[email protected]> wrote:
>
>> Hi Jon,
>>
>> Those would be the patron catalog performing added-content lookups.
>> Instead of reaching out to the vendor directly for the data, it leverages
>> the existing web API via internal requests (in asynchronous batches) to
>> collect the data. Those are expected.
>>
>> -b
>>
>>
>>
>> On Mon, Jun 16, 2025 at 11:20 AM JonGeorg SageLibrary via
>> Evergreen-general <[email protected]> wrote:
>>
>>> Greetings.
>>> We've been slammed by bot traffic and had to take countermeasures. We
>>> geoblocked international traffic at the host firewall level, and recently
>>> added an nginx bot blocker for bots hosted on servers in the US and
>>> Canada. I then scraped bot IPs out of the Apache logs and began adding
>>> the IPs that were still coming through. Yes, I've updated the robots.txt
>>> file; they're ignoring it.
>>>
>>> The issue is that after a day or two of reprieve, we started getting a
>>> ton of 404s from loopback addresses. I've reverted the blacklist config
>>> file back to blank and restarted all services on all servers. We're
>>> still getting a ton of traffic that appears to be internally generated.
>>>
>>> I don't see anything obvious in crontab. Since the traffic appears to be
>>> internally generated, the OPAC stays up longer than it normally would
>>> given the number of sessions on the load balancer.
>>>
>>> Is there an Evergreen or Apache service that indexes the entire catalog?
>>> We have our external IP whitelisted. Do internal VLAN IP addresses need
>>> to be whitelisted as well?
>>>
>>> Here's an example of the traffic I'm seeing. It's all on port 80, too;
>>> external traffic all comes in on 443.
>>>
>>> our_domain:80 127.0.0.1 - - [16/Jun/2025:08:18:31 -0700] "HEAD
>>> /opac/extras/ac/anotes/html/r/2621889 HTTP/1.1" 404 159 "-" "-"
>>>
>>> -Jon
>>>
>>> _______________________________________________
>>> Evergreen-general mailing list --
>>> [email protected]
>>> To unsubscribe send an email to
>>> [email protected]
>>>
>>