David:

I would also like to make sure I explained myself clearly.

I absolutely need to index source code into my personal search engine so
that I can run regex queries in Solr. I want to use those regexes to look
for vulnerabilities.

Could you please provide the steps for configuring Nutch, and possibly
Solr, to do this?
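
To make the goal concrete, here is a minimal sketch of the kind of indexing and search I mean. It is only an illustration, not a Nutch or Solr configuration: it assumes the repository has already been cloned to a local directory (for example with `git clone`), and the field names (`id`, `repo`, `path`, `content`), the extension list, and the functions `build_docs` and `scan` are all hypothetical names chosen for this example.

```python
import os
import re

# Assumption: which file extensions count as "source code" for this sketch.
SOURCE_EXTENSIONS = {".c", ".h", ".py", ".java", ".js", ".php"}


def build_docs(repo_dir, repo_url):
    """Walk an already-cloned repository and build one document per source file.

    The returned dicts use hypothetical field names; a real Solr schema
    would need to define matching fields.
    """
    docs = []
    for root, dirs, files in os.walk(repo_dir):
        # Skip git metadata so only the working tree is indexed.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in sorted(files):
            if os.path.splitext(name)[1] not in SOURCE_EXTENSIONS:
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                content = fh.read()
            rel = os.path.relpath(path, repo_dir).replace(os.sep, "/")
            docs.append({
                "id": f"{repo_url}#{rel}",
                "repo": repo_url,
                "path": rel,
                "content": content,
            })
    return docs


def scan(docs, pattern):
    """Run a regex line by line over the documents and return the matches
    as (path, line number, stripped line) tuples."""
    rx = re.compile(pattern)
    hits = []
    for doc in docs:
        for lineno, line in enumerate(doc["content"].splitlines(), 1):
            if rx.search(line):
                hits.append((doc["path"], lineno, line.strip()))
    return hits
```

For example, `scan(build_docs("/tmp/repo", "https://example.org/repo.git"), r"\bstrcpy\s*\(")` would list every line calling `strcpy` (the path and URL here are placeholders). The same documents could be serialized as JSON and sent to a Solr update handler. What I am asking is how to get Nutch to produce documents like these from a repo URL.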

Best regards.

On Wed, Jan 8, 2025 at 3:25 PM, anon anon <anonimoussech...@gmail.com>
wrote:

> Hello David,
>
> I need a git "clone" indexer to index as large a collection of
> repositories as possible, for the cybersecurity research I do for my job.
>
> Hello Markus,
>
> I am open to any proposition.
>
> I did not find in the docs how to make the crawler do only a git clone of
> a repo URL from the crawler/indexer regex config. I also see in the source
> code at https://github.com/apache/nutch/tree/master/src/plugin that the
> supported protocols are present there. I doubt I could add my own custom
> protocol through config alone. I hope I am wrong. If you are sure I can
> clone a repo directly from the Nutch config, could you tell me how, please?
>
> If you really think I need to fork the repo, I can do that as well.
>
> Best regards.
>
> On Tue, Jan 7, 2025 at 4:01 PM, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
>> Hi,
>>
>> Nutch is, just as Solr, highly customizable using all sorts of plugins.
>> Forking it is not recommended. If you happen to come across behaviour in
>> one of its tools that is not configurable, it can be made configurable.
>>
>> Regards,
>> Markus
>>
>> On Tue, Jan 7, 2025 at 4:52 PM, David Smiley <dsmi...@apache.org> wrote:
>>
>> > Forking anything is a burden on you to maintain your fork.  You didn't
>> > say *why* you want to fork something instead of simply using something.
>> > You mentioned adding features, but search engine platforms like Solr are
>> > designed to be highly pluggable/extensible without forking.  It's a
>> > platform, not a product.
>> >
>> > On Sun, Jan 5, 2025 at 6:36 PM anon <anonimoussech...@gmail.com> wrote:
>> >
>> > > Hello people!!
>> > >
>> > > I was going to fork Sourcegraph because I was looking for a search
>> > > engine specific to source code from hosts such as GitHub and GitLab,
>> > > with the possibility of indexing decompiled files offline. Then I
>> > > read this license:
>> > >
>> > > https://github.com/sourcegraph/sourcegraph-public-snapshot/blob/main/LICENSE.enterprise
>> > >
>> > > It seems to be *more than* proprietary. Then I found OpenSearch. It
>> > > seems modular. I might fork it to:
>> > > 1- index only source code from GitHub/GitLab and from files local to
>> > >    my instance
>> > > 2- use regex and CodeQL queries in the search client.
>> > >
>> > > OpenSearch seems good, but not modular enough.
>> > >
>> > >
>> > > I think Solr is the best choice for me. I will complement it with a
>> > > fork of Nutch.
>> > >
>> > > I think a Nutch fork would cover exactly what I am looking for:
>> > >
>> > > - it is free software
>> > >
>> > > - it is modular across many protocols (not git yet), and Solr-compatible
>> > >
>> > > I propose forking Nutch to add a plugin at
>> > > https://github.com/apache/nutch/tree/master/src/plugin under a new
>> > > folder protocol-file, and why not let other people fork it too.
>> > >
>> > > Is it a good idea?
>> > >
>> > > Best regards.
>> > >
>> >
>>
>
