Hello David,

I need a git "clone" indexer to index as large a database of repositories as
possible, for cybersecurity research at my job.

Hello Markus,

I am open to any suggestion.

I did not find in the docs how to make Nutch only git clone a repo URL
matched by the crawler/indexer config regex. I also see in the source code at
https://github.com/apache/nutch/tree/master/src/plugin that the supported
protocols are implemented there as plugins. I doubt I could add my own custom
protocol through config alone. I hope I am wrong. If you are sure I could clone
a repo directly through Nutch config, could you tell me how, please?

If you really think I need to fork the repo, I can do that as well.

Best regards.

On Tue, 7 Jan 2025 at 16:01, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hi,
>
> Nutch, just like Solr, is highly customizable through all sorts of plugins.
> Forking it is not recommended. If you happen to come across behaviour in
> one of its tools that is not configurable, it can be made configurable.
>
> Regards,
> Markus
>
> On Tue, 7 Jan 2025 at 16:52, David Smiley <dsmi...@apache.org> wrote:
>
> > Forking anything is a burden on you to maintain your fork.  You didn't say
> > *why* you want to fork something instead of simply using something.  You
> > mentioned adding features, but search engine platforms like Solr are
> > designed to be highly pluggable/extensible without forking.  It's a
> > platform, not a product.
> >
> > On Sun, Jan 5, 2025 at 6:36 PM anon <anonimoussech...@gmail.com> wrote:
> >
> > > Hello people!!
> > >
> > > I was going to fork Sourcegraph because I was looking for a search
> > > engine specific to source code, such as GitHub and GitLab, with the
> > > possibility to index decompiled files offline. Then I read this
> > > license:
> > >
> > > https://github.com/sourcegraph/sourcegraph-public-snapshot/blob/main/LICENSE.enterprise
> > >
> > > It seems to be *more than* proprietary. Then I just found OpenSearch. It
> > > seems modular. I might fork it to:
> > > 1- index only source code from GitHub/GitLab and from local sources on my
> > > instance
> > > 2- use regex and CodeQL queries in the search client.
> > >
> > > OpenSearch seems good but not modular enough.
> > >
> > >
> > > I think Solr is the best choice for me. I will complement it with a fork
> > > of Nutch.
> > >
> > > I think a Nutch fork would complete exactly what I am looking for:
> > >
> > > - it is free software
> > >
> > > - it is modular across many protocols (not git yet), and Solr-compatible
> > >
> > > I propose to fork Nutch and add a plugin under
> > > https://github.com/apache/nutch/tree/master/src/plugin in a new
> > > folder, protocol-file, and perhaps let other people fork it too.
> > >
> > > Is this a good idea?
> > >
> > > Best regards.
> > >
> >
>
