[I] Improve Documentation: Add a Beginner-Friendly Tutorial for Getting Started with SC [incubator-stormcrawler]

via GitHub Mon, 26 May 2025 06:30:31 -0700


rzo1 opened a new issue, #1542:
URL: https://github.com/apache/incubator-stormcrawler/issues/1542


   # Description
   
   Many new users report that Apache StormCrawler (SC) is difficult to set up 
and run for the first time. To improve accessibility and lower the entry 
barrier, the documentation should include a beginner-friendly tutorial that 
walks through the basic setup and execution of a simple crawler topology.
   
   # Proposed Solution
   
   Add a new section or page in the documentation that includes:
   
   1. Quickstart Tutorial
   A step-by-step guide that covers:
   
   - Setting up SC using Docker (or Docker Compose)
   - Setting up and configuring a basic topology
   - Submitting and running the topology on a local cluster (and in the Docker 
Compose environment)
   - Verifying that the crawler is working (e.g., viewing fetched URLs/logs)
   
   2. Follow-up Topics
   
   Provide links or notes on how users can:
   
   - Extend the setup with custom configurations
   - Use SC with Playwright or other browser automation tools
   - Handle politeness, filters, and parsing rules
   - Integrate storage (e.g., OpenSearch)
   - Build custom bolts
   
   # Motivation
   
   This will help new users get started quickly and understand the power of SC 
without having to dig through fragmented examples or advanced configuration too 
early. Better documentation will make SC more approachable and could help grow 
the community by reducing the initial learning curve.
   
   # Additional Context
   
   I’ve often heard that while SC is powerful, it's perceived as too 
complicated to get up and running. A quickstart tutorial could go a long way 
toward solving that issue.

