David: I would also like to make sure I explained myself clearly.
I absolutely need to index source code into my personal search engine so that I can run regex queries in Solr. I want to use the regexes to look for vulnerabilities. Could you please provide the steps for configuring Nutch, and eventually Solr, for this?

Best regards.

On Wed, Jan 8, 2025 at 15:25, anon anon <anonimoussech...@gmail.com> wrote:

> Hello David,
>
> I need a git "clone" indexer to index as large a database of repos as
> possible, for cybersecurity research at my job.
>
> Hello Markus,
>
> I am open to any proposal.
>
> I did not find in the docs how to do only a git clone of a repo URL from
> the crawler/indexer regex config. I also see in the source code at
> https://github.com/apache/nutch/tree/master/src/plugin that the supported
> protocols live there. I doubt I could add my own custom protocol in the
> config. I hope I am wrong. If you are sure I could clone a repo directly
> from the Nutch config, could you tell me how, please?
>
> If you really think I need to fork the repo, I can do that as well.
>
> Best regards.
>
> On Tue, Jan 7, 2025 at 16:01, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
>> Hi,
>>
>> Nutch is, just as Solr, highly customizable using all sorts of plugins.
>> Forking it is not recommended. If you happen to come across behaviour in
>> one of its tools that is not configurable, it can be made configurable.
>>
>> Regards,
>> Markus
>>
>> On Tue, Jan 7, 2025 at 16:52, David Smiley <dsmi...@apache.org> wrote:
>>
>> > Forking anything is a burden on you to maintain your fork. You didn't
>> > say *why* you want to fork something instead of simply using something.
>> > You mentioned adding features, but search engine platforms like Solr are
>> > designed to be highly pluggable/extensible without forking. It's a
>> > platform, not a product.
>> >
>> > On Sun, Jan 5, 2025 at 6:36 PM anon <anonimoussech...@gmail.com> wrote:
>> >
>> > > Hello people!!
>> > >
>> > > I was going to fork Sourcegraph because I was looking for a search
>> > > engine specific to source code, such as GitHub and GitLab, with the
>> > > possibility to index decompiled files offline. Then I read this
>> > > license:
>> > >
>> > > https://github.com/sourcegraph/sourcegraph-public-snapshot/blob/main/LICENSE.enterprise
>> > >
>> > > It seems to be *more than* proprietary. Then I just found OpenSearch.
>> > > It seems modular. I might fork it to:
>> > > 1- index only source code from GitHub/GitLab and from local files
>> > >    into my instance
>> > > 2- use regex and CodeQL queries in the search client.
>> > >
>> > > OpenSearch seems good but not modular enough.
>> > >
>> > > I think Solr is the best choice for me. I will complement it with a
>> > > fork of Nutch.
>> > >
>> > > I think a Nutch fork would absolutely complete what I am looking for:
>> > >
>> > > - it is free software
>> > >
>> > > - it is modular across many protocols (not git yet), and Solr
>> > >   compatible
>> > >
>> > > I suggest that I fork Nutch to add a plugin at
>> > > https://github.com/apache/nutch/tree/master/src/plugin under a new
>> > > folder protocol-file, and why not let people fork it.
>> > >
>> > > Is it a good idea?
>> > >
>> > > Best regards.