Markus: I may have misunderstood your remark. Would it be possible to use a git clone protocol plugin?
On Wed, Jan 8, 2025 at 15:41, anon anon <anonimoussech...@gmail.com> wrote:
> David:
>
> I would also like to make sure I explained myself clearly.
>
> I need to index source code into my personal search engine so that I can
> run regexes against it in Solr. I want to use those regexes to look for
> vulnerabilities.
>
> Could you describe the steps for such a configuration of Nutch and,
> eventually, Solr?
>
> Best regards.
>
> On Wed, Jan 8, 2025 at 15:25, anon anon <anonimoussech...@gmail.com>
> wrote:
>
>> Hello David,
>>
>> I need a git "clone" indexer to index as large a collection of
>> repositories as possible, for cybersecurity research in my job.
>>
>> Hello Markus,
>>
>> I am open to any proposal.
>>
>> I did not find anything in the documentation on how to make the crawler
>> only perform a git clone of a repository URL matched by the regex in its
>> config. I also see in the source code at
>> https://github.com/apache/nutch/tree/master/src/plugin that the supported
>> protocols live there, so I doubt I could add my own custom protocol
>> through configuration alone. I hope I am wrong. If you are sure I can
>> clone a repository directly from the Nutch config, could you tell me how?
>>
>> If you really think I need to fork the repository, I can do that as well.
>>
>> Best regards.
>>
>> On Tue, Jan 7, 2025 at 16:01, Markus Jelsma <markus.jel...@openindex.io>
>> wrote:
>>
>>> Hi,
>>>
>>> Nutch, just like Solr, is highly customizable through all sorts of
>>> plugins. Forking it is not recommended. If you come across behaviour in
>>> one of its tools that is not configurable, it can be made configurable.
>>>
>>> Regards,
>>> Markus
>>>
>>> On Tue, Jan 7, 2025 at 16:52, David Smiley <dsmi...@apache.org> wrote:
>>>
>>>> Forking anything puts the burden of maintaining the fork on you. You
>>>> didn't say *why* you want to fork something instead of simply using it.
>>>> You mentioned adding features, but search engine platforms like Solr
>>>> are designed to be highly pluggable/extensible without forking. It's a
>>>> platform, not a product.
>>>>
>>>> On Sun, Jan 5, 2025 at 6:36 PM anon <anonimoussech...@gmail.com> wrote:
>>>>
>>>>> Hello people!!
>>>>>
>>>>> I was going to fork Sourcegraph because I was looking for a search
>>>>> engine dedicated to source code, such as code hosted on GitHub and
>>>>> GitLab, with the possibility to index decompiled files offline. Then I
>>>>> read this license:
>>>>>
>>>>> https://github.com/sourcegraph/sourcegraph-public-snapshot/blob/main/LICENSE.enterprise
>>>>>
>>>>> It seems to be *more than* proprietary. Then I found OpenSearch. It
>>>>> seems modular. I might fork it to:
>>>>> 1- index only source code from GitHub/GitLab and from local sources into my instance
>>>>> 2- use regex and CodeQL queries in the search client.
>>>>>
>>>>> OpenSearch seems good, but not modular enough.
>>>>>
>>>>> I think Solr is the best choice for me. I will complement it with a
>>>>> fork of Nutch.
>>>>>
>>>>> I think a Nutch fork would complete exactly what I am looking for:
>>>>>
>>>>> - it is free software
>>>>>
>>>>> - it is modular, with plugins for many protocols (not git yet), and it
>>>>> is Solr-compatible
>>>>>
>>>>> I suggest that I fork Nutch to add a plugin under
>>>>> https://github.com/apache/nutch/tree/master/src/plugin in a new
>>>>> folder, protocol-file, and perhaps let other people fork it too.
>>>>>
>>>>> Is it a good idea?
>>>>>
>>>>> Best regards.
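For illustration only, here is a minimal sketch of the clone-then-regex workflow asked about in this thread (shallow-clone a repository, then run a regex over its source files to look for suspicious patterns). It does not use Nutch or Solr at all; it assumes Python 3 and a `git` binary on the PATH, and the function names `shallow_clone` and `scan_tree` are hypothetical. Patterns are matched line by line with Python's `re`, so multi-line patterns will not match.

```python
import os
import re
import subprocess
from typing import Iterator, Tuple


def shallow_clone(repo_url: str, dest: str) -> None:
    """Clone only the latest commit of repo_url into dest (assumes git is installed)."""
    subprocess.run(["git", "clone", "--depth", "1", repo_url, dest], check=True)


def scan_tree(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative_path, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune git metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes (e.g. in binary files) are silently dropped.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            yield os.path.relpath(path, root), lineno, line.rstrip("\n")
            except OSError:
                # Unreadable files (permissions, broken symlinks) are skipped.
                continue
```

A typical call would be `shallow_clone(url, path)` followed by iterating over `scan_tree(path, r"strcpy\s*\(")`. Keeping cloning and scanning separate means the scan can also run over local or decompiled sources that never came from git.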