On Wed, 8 Jan 2025 16:45:12 +0100 Hans Henrik Bergan <divinity76+c...@gmail.com> wrote:
thanks Henrik,

> The website is using the Cloudflare "Bot Fight Mode" thing which is
> "protecting the website against bots",
> dillo, w3c, and curl are all triggering the "are you a human?"

w3c -> w3m
i don't get the 'are you a human' page with any of them

> challenge page, and none of them are capable of passing it.
>
> It is strange that your dillo is not triggering it, probably something
> to do with your Dillo's IP.

IP meaning -> internet facing address and not intellectual property?
w3m also gets the page
all three are trying to get the page from the same internet facing address

btw how did you find out about cloudflare running the 'bot fight mode'?
what is it 'checking' for ?
the dillo doesn't have js or cookies ??
and all use the same user-agent

until this is figured out - i can use w3m to save the page 'automatically'
in place of curl

>
> Anyway, your best bet to automate anything on that page is with
> "headless chrome running in headless mode" - idk how CF is detecting
> it, but it detects headless chromium running headless as bots, but it
> does not detect headless-chromium-running-in-headful-mode as bots.
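
One way to look into what the server is sending back, as a rough sketch: save the response headers next to the page with curl's -D option and look for Cloudflare's markers. Cloudflare documents a "cf-mitigated: challenge" header on challenge responses. The helper names and file names below (is_cf_challenge, served_by_cloudflare, headers.txt) are made up for illustration, and this only shows whether a challenge was served, not what the challenge is checking for.

```shell
#!/bin/sh
# Sketch: inspect a header dump saved by curl, e.g.
#   curl -sS -L -A "Mozilla/5.0" -D headers.txt -o crypto.html "$URL"
# headers.txt and URL are placeholder names, not from the original post.
# With -L the dump holds the headers of every hop, so a match on any
# hop counts.

# is_cf_challenge FILE
# Returns 0 if any response in FILE carries "cf-mitigated: challenge",
# the header Cloudflare documents for challenge pages; non-zero otherwise.
is_cf_challenge() {
    grep -qi '^cf-mitigated:[[:space:]]*challenge' "$1"
}

# served_by_cloudflare FILE
# Returns 0 if any response in FILE has a "server: cloudflare" header.
served_by_cloudflare() {
    grep -qi '^server:[[:space:]]*cloudflare' "$1"
}
```

After the curl run, something like `is_cf_challenge headers.txt && echo "got a challenge page"` would report whether crypto.html is the challenge page and not the real content.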
>
> On Wed, 8 Jan 2025 at 16:23, toby via curl-users
> <curl-users@lists.haxx.se> wrote:
> >
> > Maybe someone can help me with this
> >
> > dillo
> > https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > gives a good page
> >
> > curl -b cookies.txt -c cookies.txt -A "Mozilla/5.0" -k -L -o crypto.html
> > https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > results in a page (crypto.html) saying it needs javascript and cookies and
> > the cookies.txt file is 'empty'
> >
> > dillo doesn't do cookies or javascript either
> > w3m gives a good page - it does have
> > --
> > Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> > Etiquette: https://curl.se/mail/etiquette.html