On 06/09/2025 17:20, Wayne Sherman via fpc-pascal wrote:
> Something similar can happen when search engine crawlers are blocked.
> How useful would Google Search be for Free Pascal related queries if
> none of the Free Pascal websites were indexed?
> What do you think? How would FPC and Lazarus related queries perform
> on an LLM with no training data for FPC and Lazarus in the form of
> articles, answers, and discussions?

As far as I understand, neither search engines nor AI crawlers are
blocked by default.
The pages have a robots.txt, and AFAIK any bot that obeys it can crawl.
Bots that don't obey it get blocked (AFAIK).
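
To illustrate what "obeying robots.txt" means for a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt rules, the "ExampleBot" user agent, and the example.org URLs are invented for illustration; they are not the actual rules of any Free Pascal site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A well-behaved bot asks this before requesting each URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    for url in ("https://example.org/wiki/Main_Page",
                "https://example.org/private/data"):
        print(url, "allowed:", may_fetch(parser, "ExampleBot", url))
    # The requested pause between requests, if the site specifies one.
    print("crawl delay:", parser.crawl_delay("ExampleBot"))
```

A real crawler would load the file from the site (for example with set_url() and read()) instead of a hard-coded string, and would skip any URL for which the check returns False.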
On the wiki there is Anubis, but some bots are exempt from the
check, and in principle bots can solve the challenge, the same way normal
browsers do for their users. Anubis does not block access. It just adds
a cost, and that cost is tiny unless you keep getting your IP banned,
keep coming back from multiple IPs, and have to solve it many thousands
(or 100k) of times. Then the cost starts to accumulate.
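
The cost argument can be sketched with a generic hash-based proof-of-work puzzle. This is only an illustration of the idea: the challenge string, the difficulty value, and the function names are assumptions made up here, and the code does not claim to reproduce the exact scheme or parameters Anubis uses.

```python
import hashlib
import itertools


def is_valid(challenge: str, nonce: int, difficulty: int) -> bool:
    """Check whether SHA-256(challenge + nonce) starts with `difficulty` hex zeros."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> tuple[int, int]:
    """Brute-force a nonce; return (nonce, number of hashes tried)."""
    for attempts, nonce in enumerate(itertools.count(), start=1):
        if is_valid(challenge, nonce, difficulty):
            return nonce, attempts


def expected_hashes(difficulty: int, solves: int) -> int:
    """Average work: about 16**difficulty hashes per solve, times the solve count."""
    return (16 ** difficulty) * solves


if __name__ == "__main__":
    difficulty = 3  # assumed value, kept small so the demo runs quickly
    nonce, attempts = solve("hypothetical-challenge", difficulty)
    print("one solve took", attempts, "hashes; nonce =", nonce)
    # One visitor solves once; a crawler rotating through IPs solves repeatedly.
    print("expected hashes for 1 solve:      ", expected_hashes(difficulty, 1))
    print("expected hashes for 100000 solves:", expected_hashes(difficulty, 100_000))
```

Verification costs the server a single hash, while the client's work grows linearly with the number of times it has to solve the puzzle. That is why a single browser barely notices it and a crawler that keeps switching IPs does.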
There are companies that provide crawl services using that many
different IPs from all over the world. Those companies know that they
break their target content providers (they go to great lengths to keep
breaking others' services). They don't care.
But as long as there are other companies that respect robots.txt,
those abusers will lose out eventually, as they will not have the
data to compete. Their problem.
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal