GitHub user psa-neutronian closed a discussion: JavaScript pages

I'm attempting to use Storm Crawler for JavaScript pages and am having some 
difficulty creating a generalised solution. Also, it seems that Storm 
Crawler doesn't play nicely with the WebDriver for Firefox, a.k.a. GeckoDriver 
(or maybe with non-Puppeteer WebDrivers in general; I haven't gotten that far yet).

So, I have a two-part question:

1. Has anyone else successfully used Storm Crawler with GeckoDriver?
2. Any suggestions on where to start for getting Storm Crawler to do a 
generalised search through rendered HTML for a link matching a description 
(e.g. "about us" or "terms and conditions"), follow that link (which may be 
an XHR), and grab the linked text? For links that are paths to files, 
extracting the text should be fairly easy (it's finding the links when 
they're JavaScript-based that's challenging); for overlays, I'm completely 
stumped.
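To make the second part concrete, here is a minimal sketch of the "search rendered HTML for a link matching a description" step. It assumes the HTML has already been rendered by a browser (e.g. fetched via Selenium/GeckoDriver, which is not shown), and it uses only Python's standard-library `html.parser`; the function names and labels are illustrative, not Storm Crawler APIs.

```python
# Sketch: given already-rendered HTML, find <a> elements whose visible
# text matches a target label such as "about us". This does not follow
# JavaScript-based links -- a javascript: href still needs a real
# browser click to resolve.
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collect (href, text) pairs for every <a> element in the document."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the anchor currently open, if any
        self._chunks = []      # text fragments seen inside that anchor
        self.links = []        # accumulated (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace in the anchor's visible text.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None


def find_links(html, labels):
    """Return hrefs whose anchor text contains one of the target labels."""
    parser = LinkTextFinder()
    parser.feed(html)
    wanted = [label.lower() for label in labels]
    return [href for href, text in parser.links
            if any(label in text.lower() for label in wanted)]


rendered = """
<nav>
  <a href="/home">Home</a>
  <a href="javascript:void(0)" onclick="showAbout()">About Us</a>
  <a href="/legal/tos">Terms and Conditions</a>
</nav>
"""
print(find_links(rendered, ["about us", "terms and conditions"]))
# -> ['javascript:void(0)', '/legal/tos']
```

The `javascript:` href in the output is exactly the hard case the question raises: matching its text is easy once the page is rendered, but following it means driving the browser to perform the click, which is where a WebDriver (or Playwright/Puppeteer) integration would have to come in.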

GitHub link: https://github.com/apache/incubator-stormcrawler/discussions/874

----
This is an automatically sent email for dev@stormcrawler.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@stormcrawler.apache.org