Markus: I probably misunderstood your remark.

Would it be possible to use a git clone protocol plugin, please?
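Until such a plugin exists (the thread notes that Nutch has no git protocol yet), one workaround is to do the "git clone" step outside Nutch and send the files to Solr directly. This is not a Nutch plugin. The Python sketch rests on a few assumptions: `git` is on the PATH, `shallow_clone`, `iter_source_files` and `to_solr_docs` are hypothetical helper names, and the field names `repo_s`, `path_s` and `content_txt` assume a Solr core built from the `_default` configset, whose `*_s` and `*_txt` dynamic fields would accept them.

```python
import subprocess
from pathlib import Path

# Illustrative extension filter (an assumption); adjust to the languages you audit.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".java", ".py", ".js", ".go"}


def shallow_clone(repo_url: str, dest: Path) -> Path:
    """Clone only the latest commit of repo_url into dest (needs git on PATH)."""
    subprocess.run(["git", "clone", "--depth", "1", repo_url, str(dest)], check=True)
    return dest


def iter_source_files(root: Path, extensions=SOURCE_EXTENSIONS):
    """Yield files under root whose suffix is in extensions, skipping .git/."""
    for path in sorted(root.rglob("*")):
        if ".git" in path.relative_to(root).parts:
            continue  # never index git's internal object store
        if path.is_file() and path.suffix in extensions:
            yield path


def to_solr_docs(repo_url: str, root: Path, extensions=SOURCE_EXTENSIONS) -> list:
    """Build one Solr document (a plain dict) per source file in a checkout."""
    docs = []
    for path in iter_source_files(root, extensions):
        rel = path.relative_to(root).as_posix()
        docs.append({
            "id": f"{repo_url}#{rel}",  # unique key: repository URL + file path
            "repo_s": repo_url,         # hypothetical field names, see lead-in
            "path_s": rel,
            "content_txt": path.read_text(encoding="utf-8", errors="replace"),
        })
    return docs
```

For example, `to_solr_docs(url, shallow_clone(url, Path("/tmp/repo")))` returns a list that could be serialized as a JSON array, posted to the core's `/update` handler with `Content-Type: application/json`, and then committed. The `--depth 1` clone keeps disk usage down when indexing many repositories, at the cost of discarding history.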

On Wed, Jan 8, 2025 at 15:41, anon anon <anonimoussech...@gmail.com>
wrote:

> David:
>
> I would also like to make sure I explained myself clearly.
>
> I really need to index source code into my personal search engine so that I
> can run regexes in Solr. I want to use those regexes to look for
> vulnerabilities.
>
> Could you provide the steps for such a configuration of Nutch and, if
> needed, Solr, please?
>
> Best regards.
>
> On Wed, Jan 8, 2025 at 15:25, anon anon <anonimoussech...@gmail.com>
> wrote:
>
>> Hello David,
>>
>> I need a git "clone" indexer to index as large a database of repositories
>> as possible, for cybersecurity research at my job.
>>
>> Hello Markus,
>>
>> I am open to any proposition.
>>
>> I could not find in the documentation how to make the crawler only
>> git-clone the repository URLs matched by a regex in its crawl/index
>> configuration. I also see in the source code
>> (https://github.com/apache/nutch/tree/master/src/plugin) that the supported
>> protocols are implemented there, so I doubt I can add my own custom
>> protocol through configuration alone. I hope I am wrong. If you are sure I
>> can clone a repository directly from the Nutch configuration, could you
>> tell me how, please?
>>
>> If you really think I need to fork the repository, I can do that as well.
>>
>> Best regards.
>>
>> On Tue, Jan 7, 2025 at 16:01, Markus Jelsma <markus.jel...@openindex.io>
>> wrote:
>>
>>> Hi,
>>>
>>> Nutch, just like Solr, is highly customizable through all sorts of plugins.
>>> Forking it is not recommended. If you happen to come across behaviour in
>>> one of its tools that is not configurable, it can be made configurable.
>>>
>>> Regards,
>>> Markus
>>>
>>> On Tue, Jan 7, 2025 at 16:52, David Smiley <dsmi...@apache.org> wrote:
>>>
>>> > Forking anything puts the burden of maintaining the fork on you. You
>>> > didn't say *why* you want to fork something instead of simply using it.
>>> > You mentioned adding features, but search engine platforms like Solr are
>>> > designed to be highly pluggable/extensible without forking. It's a
>>> > platform, not a product.
>>> >
>>> > On Sun, Jan 5, 2025 at 6:36 PM anon <anonimoussech...@gmail.com> wrote:
>>> >
>>> > > Hello people!!
>>> > >
>>> > > I was going to fork Sourcegraph because I was looking for a search
>>> > > engine dedicated to source code (e.g. from GitHub and GitLab), with the
>>> > > ability to index decompiled files offline. Then I read this license:
>>> > >
>>> > > https://github.com/sourcegraph/sourcegraph-public-snapshot/blob/main/LICENSE.enterprise
>>> > >
>>> > > It seems to be *more than* proprietary. Then I found OpenSearch. It
>>> > > seems modular. I might fork it to:
>>> > > 1- index only source code from GitHub/GitLab and from local files into
>>> > > my instance
>>> > > 2- use regex and CodeQL queries in the search client.
>>> > >
>>> > > OpenSearch seems good, but not modular enough.
>>> > >
>>> > >
>>> > > I think Solr is the best choice for me. I will complement it with a
>>> > > fork of Nutch.
>>> > >
>>> > > I think a Nutch fork would be exactly the complement I am looking for:
>>> > >
>>> > > - it is free software
>>> > >
>>> > > - it is modular across many protocols (not git yet), and Solr-compatible
>>> > >
>>> > > I propose to fork Nutch and add a plugin under
>>> > > https://github.com/apache/nutch/tree/master/src/plugin in a new
>>> > > folder protocol-file, and then let other people fork it too.
>>> > >
>>> > > Is it a good idea?
>>> > >
>>> > > Best regards.
>>> > >
>>> >
>>>
>>
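This note concerns the regex goal raised in the thread (indexing source code and searching it with regexes for vulnerabilities). Solr's standard query parser does accept regular expressions (`field:/pattern/`). However, they are matched against individual indexed terms, not against a file's full text. A pattern such as `strcpy\s*\(` will therefore generally not match a tokenized text field. One hedged alternative is to retrieve the stored source from Solr and run the patterns client-side. The sketch assumes each document is a dict with hypothetical `path_s` and `content_txt` fields. The three rules are illustrative placeholders, not a vetted vulnerability rule set.

```python
import re
from typing import Iterable, NamedTuple

# Illustrative patterns only (assumptions); real vulnerability rules need far more care.
PATTERNS = {
    "unbounded-strcpy": re.compile(r"\bstrcpy\s*\("),
    "gets-call": re.compile(r"\bgets\s*\("),
    "python-eval": re.compile(r"\beval\s*\("),
}


class Finding(NamedTuple):
    path: str
    line: int
    rule: str
    text: str


def scan(docs: Iterable[dict], patterns=PATTERNS) -> list:
    """Run every pattern over every line of each document's stored content."""
    findings = []
    for doc in docs:
        # Hypothetical field names: path_s holds the file path, content_txt its text.
        for lineno, line in enumerate(doc["content_txt"].splitlines(), start=1):
            for rule, rx in patterns.items():
                if rx.search(line):
                    findings.append(Finding(doc["path_s"], lineno, rule, line.strip()))
    return findings
```

Scanning line by line keeps the reported line numbers simple, but it misses constructs that span several lines. CodeQL, which the original post mentions, is better suited to that kind of semantic query.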
