rzo1 opened a new issue, #1542:
URL: https://github.com/apache/incubator-stormcrawler/issues/1542
# Description

Many new users report that Apache StormCrawler (SC) is difficult to set up and run for the first time. To improve accessibility and lower the entry barrier, the documentation should include a beginner-friendly tutorial that walks through the basic setup and execution of a simple crawler topology.

# Proposed Solution

Add a new section or page to the documentation that includes:

1. Quickstart Tutorial

   A step-by-step guide that covers:

   - Setting up SC using Docker (or Docker Compose)
   - Setting up and configuring a basic topology
   - Submitting and running the topology on a local cluster and in the Docker Compose environment
   - Verifying that the crawler is working (e.g., viewing fetched URLs or logs)

2. Follow-up Topics

   Provide links or notes on how users can:

   - Extend the setup with custom configurations
   - Use SC with Playwright or other browser automation tools
   - Handle politeness, filters, and parsing rules
   - Integrate storage (e.g., OpenSearch)
   - Build custom bolts

# Motivation

This will help new users get started quickly and see what SC can do without having to dig through fragmented examples or advanced configuration too early. Better documentation will make SC more approachable and could help grow the community by reducing the initial learning curve.

# Additional Context

I've often heard that while SC is powerful, it's perceived as too complicated to get up and running. A quickstart tutorial could go a long way toward solving that issue.