Hi Yongjun,

Sorry for the delay. One resource that would be handy to have (if it doesn't exist already) is documentation on which parameters can be set per domain in the `seeds.txt` file. I've found a few by looking through the source code, and this might be a useful exercise for you. The output would be useful to all (including me :D)!
An example would be `max.depth`, as added here: https://github.com/apache/incubator-stormcrawler/issues/399

On Sun, Jan 26, 2025 at 7:14 AM Yongjun Hong <dev.yongj...@gmail.com> wrote:

> Thank you for your attention.
>
> The email sent from kevin...@ was actually a mistake, as I confused the accounts.
> I will use the dev...@ email going forward.
>
> Have a great day!
>
> > Hi,
> >
> > Yep. I just verified it.
> >
> > I think the issue is that mails from your kevin…@ are put into moderation.
> > So if you are using dev….@ it should just work fine.
> >
> > Regards
> > Richard
> >
> > > On 26.01.2025 at 10:14, 홍용준 <dev.yongj...@gmail.com> wrote:
> > >
> > > It seems that I am already subscribed to the mailing list.
> > > I reached out to dev-subscr...@stormcrawler.apache.org and received a response indicating that my email address is already registered.
> > >
> > > This is my first time participating in the Apache mailing list and writing messages, so I am still very new to this process.
> > > If I am doing something wrong or missing anything important, I would greatly appreciate it if you could let me know.🥹
> > >
> > > On Sun, Jan 26, 2025 at 5:54 PM, Richard Zowalla <r...@apache.org> wrote:
> > >
> > >> Hi,
> > >>
> > >> Maybe it would also be good to subscribe to the dev@ list? :)
> > >> Your mails are going into moderation, and that would be a way to circumvent that.
> > >>
> > >> Regards
> > >> Richard
> > >>
> > >>> On 26.01.2025 at 05:52, 홍용준 <kevin09288...@gmail.com> wrote:
> > >>>
> > >>> Here is the naturally translated email:
> > >>> ------------------------------
> > >>>
> > >>> Hi Julien,
> > >>>
> > >>> I apologize for the delayed response.
> > >>> I’m still getting used to the Apache Mailing list, so I didn’t realize I had received a reply. 😅
> > >>>
> > >>> As suggested, I will start by running StormCrawler.
> > >>> Once I’ve built up some basic knowledge about the project, I will review the discussion you shared and follow up with my thoughts.
> > >>>
> > >>> Thank you!
> > >>>
> > >>> Best regards,
> > >>> Yongjun Hong
> > >>>
> > >>> On 2025/01/22 20:00:51 Julien Nioche wrote:
> > >>>> Hi Yongjun Hong,
> > >>>>
> > >>>> Thanks for your email and your interest in contributing to StormCrawler.
> > >>>> The project is quite mature and there isn't anything that is totally missing from it but there definitely are improvements to be made.
> > >>>> Contributions take many forms and not just code: a good starting point would be to run StormCrawler, look at the existing resources and documentation and see if anything is not clear or missing. Reporting bugs or asking questions is also very valuable.
> > >>>>
> > >>>> One thing I would like to see however is a new spout implementation for OpenSearch; I had written a description for Elasticsearch a while ago <https://github.com/apache/incubator-stormcrawler/discussions/990> which is valid for OpenSearch and could improve the performance significantly.
> > >>>> Probably a bit of an advanced topic though, maybe as a start just run StormCrawler, delve into the details and see what you think?
> > >>>>
> > >>>> Thanks!
> > >>>>
> > >>>> Julien
> > >>>>
> > >>>>
> > >>>> On Tue, 21 Jan 2025 at 23:57, 홍용준 <de...@gmail.com> wrote:
> > >>>>
> > >>>>> Dear StormCrawler Team,
> > >>>>>
> > >>>>> My name is Yongjun Hong, and I am a software developer with a strong interest in contributing to open-source projects.
> > >>>>> I have had the privilege of contributing to JUnit 5 by implementing performance improvements and adding new features, which allowed me to gain a deeper appreciation for collaborative development within the open-source community.
> > >>>>>
> > >>>>> Beyond JUnit, I have also made frequent contributions to other Java-based open-source libraries, including Spring Boot. My work has focused on enhancing core functionality and resolving community-reported issues, aiming to improve both usability and reliability.
> > >>>>>
> > >>>>> I am particularly drawn to StormCrawler because of its robust and scalable web crawling capabilities, and I am eager to contribute consistently❕ To better understand how I can align my efforts with the project’s goals, I would like to inquire about its current status, overall roadmap, and any key milestones in the near future.
> > >>>>>
> > >>>>> If there are areas where the project could use additional support, or if there are specific tasks or issues you think I could assist with, I would greatly appreciate your guidance.
> > >>>>>
> > >>>>> Thank you for your time and for the incredible work you have done on StormCrawler. I am excited about the possibility of contributing and look forward to hearing from you.
> > >>>>>
> > >>>>> Here is my GitHub profile: https://github.com/YongGoose
> > >>>>>
> > >>>>> Best regards
> > >>>>>
> > >>>>> Yongjun Hong