Hello David, I need a git "clone" indexer that can index as large a database of repositories as possible, for cybersecurity research in my job.
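Below is a minimal Python sketch of the "clone only repository URLs" step. It runs outside of Nutch and assumes that only top-level github.com/gitlab.com repository URLs should be cloned. A hypothetical regex, which is not a Nutch configuration entry, decides whether a crawled URL is a repository root. A shallow `git clone --depth 1` then fetches only the latest commit, which keeps a very large crawl manageable.

```python
from __future__ import annotations

import re
import subprocess

# Hypothetical filter: only top-level repository URLs on github.com or gitlab.com,
# i.e. https://host/owner/repo with an optional ".git" suffix or trailing slash.
# Deeper URLs (issues, tree/..., blobs) and GitLab subgroups are not matched.
REPO_URL = re.compile(
    r"^https://(?P<host>github\.com|gitlab\.com)/"
    r"(?P<owner>[A-Za-z0-9_.-]+)/(?P<repo>[A-Za-z0-9_.-]+?)(?:\.git)?/?$"
)


def clone_command(url: str, dest_root: str = "repos") -> list[str] | None:
    """Return a shallow `git clone` argv for a repository root URL, else None."""
    m = REPO_URL.match(url)
    # Reject "." and ".." so a crafted URL cannot escape dest_root.
    if m is None or m["owner"] in {".", ".."} or m["repo"] in {".", ".."}:
        return None
    remote = f"https://{m['host']}/{m['owner']}/{m['repo']}.git"
    dest = f"{dest_root}/{m['host']}/{m['owner']}/{m['repo']}"
    # --depth 1 fetches only the latest commit, not the full history.
    return ["git", "clone", "--depth", "1", remote, dest]


def clone(url: str) -> bool:
    """Run the clone if the URL qualifies; return True on success."""
    cmd = clone_command(url)
    if cmd is None:
        return False
    return subprocess.run(cmd, check=False).returncode == 0
```

For example, `clone("https://github.com/apache/nutch")` would clone into `repos/github.com/apache/nutch`. A page such as `https://github.com/apache/nutch/tree/master/src/plugin` is skipped. Inside Nutch, the URL filtering would normally live in the URL filter configuration, but the cloning itself would still need a protocol plugin. This sketch only illustrates the logic.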
Hello Markus, I am open to any proposal. I did not find in the documentation how to make the crawler do a git clone of only the repository URLs matched by the regex in its config. I also see in the source code (https://github.com/apache/nutch/tree/master/src/plugin) that the supported protocols are implemented there as plugins, so I doubt I can add my own custom protocol through configuration alone. I hope I am wrong. If you are sure I can clone a repository directly from the Nutch config, could you please tell me how? If you really think I need to fork the repository, I can do that as well.

Best regards.

On Tue, Jan 7, 2025 at 16:01, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hi,
>
> Nutch is, just as Solr, highly customizable using all sorts of plugins.
> Forking it is not recommended. If you happen to come across behaviour in
> one of its tools that is not configurable, it can be made configurable.
>
> Regards,
> Markus
>
> On Tue, Jan 7, 2025 at 16:52, David Smiley <dsmi...@apache.org> wrote:
>
> > Forking anything is a burden on you to maintain your fork. You didn't say
> > *why* you want to fork something instead of simply using something. You
> > mentioned adding features, but search engine platforms like Solr are
> > designed to be highly pluggable/extensible without forking. It's a
> > platform, not a product.
> >
> > On Sun, Jan 5, 2025 at 6:36 PM anon <anonimoussech...@gmail.com> wrote:
> >
> > > Hello people!!
> > >
> > > I was going to fork Sourcegraph because I was looking for a search
> > > engine dedicated to source code, such as code on GitHub and GitLab, with
> > > the possibility of indexing decompiled files offline. Then I read this
> > > license:
> > >
> > > https://github.com/sourcegraph/sourcegraph-public-snapshot/blob/main/LICENSE.enterprise
> > >
> > > It seems to be *more than* proprietary. Then I just found OpenSearch. It
> > > seems modular.
> > > I might fork it to:
> > > 1- index only source code from GitHub/GitLab and local files on my
> > > instance
> > > 2- use regex and CodeQL queries in the search client.
> > >
> > > OpenSearch seems good but not modular enough.
> > >
> > > I think Solr is the best choice for me. I will complement it with a fork
> > > of Nutch.
> > >
> > > I think a Nutch fork would complete exactly what I am looking for:
> > >
> > > - it is free software
> > >
> > > - it is modular across many protocols (not git yet), and Solr-compatible
> > >
> > > I suggest that I fork Nutch to add a plugin in
> > > https://github.com/apache/nutch/tree/master/src/plugin under a new
> > > folder, protocol-file, and why not let people fork it.
> > >
> > > Is it a good idea?
> > >
> > > Best regards.
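Whatever form the proposed plugin takes, it would eventually have to turn a cloned repository into documents for Solr. Here is a rough Python sketch of that step, independent of Nutch's plugin API. It walks a local checkout, which can be a clone or a folder of decompiled sources. It skips `.git` and non-source files and builds one JSON document per file for Solr's JSON update handler. The extension list, the size limit, the field names `id`, `repo`, `path` and `content`, and the core URL `http://localhost:8983/solr/code` are all assumptions. They must match your own Solr schema and setup.

```python
from __future__ import annotations

import json
import os
import urllib.request

# Hypothetical set of extensions treated as source code; adjust for your corpus.
SOURCE_EXTS = {".c", ".h", ".cpp", ".java", ".py", ".go", ".rs", ".js", ".ts"}
MAX_BYTES = 1_000_000  # skip very large files (often generated or minified)


def repo_documents(checkout_dir: str, repo_id: str) -> list[dict]:
    """Walk a local checkout and build one Solr document per source file."""
    docs = []
    for root, dirs, files in os.walk(checkout_dir):
        # Prune git internals and walk subdirectories in a stable order.
        dirs[:] = sorted(d for d in dirs if d != ".git")
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in SOURCE_EXTS:
                continue
            path = os.path.join(root, name)
            if os.path.getsize(path) > MAX_BYTES:
                continue
            rel = os.path.relpath(path, checkout_dir).replace(os.sep, "/")
            # Undecodable bytes are replaced rather than aborting the whole repo.
            with open(path, encoding="utf-8", errors="replace") as f:
                content = f.read()
            docs.append({"id": f"{repo_id}:{rel}", "repo": repo_id,
                         "path": rel, "content": content})
    return docs


def post_to_solr(docs: list[dict],
                 solr_url: str = "http://localhost:8983/solr/code") -> int:
    """POST documents as a JSON array to Solr's /update handler (hypothetical core 'code')."""
    req = urllib.request.Request(
        f"{solr_url}/update?commit=true",  # commit=true makes the docs searchable immediately
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `post_to_solr(repo_documents("repos/github.com/apache/nutch", "github.com/apache/nutch"))` would index one checkout. For a very large corpus, sending documents in batches and committing less often than once per repository would probably scale better.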