Hi Ashkar,

Yes, you can do all of these things, but not with Solr alone, since it doesn't come with a built-in website crawler. You'll need to look at other projects for that, such as:
Heritrix: http://crawler.archive.org/index.html
Nutch (created by Doug Cutting, who also created Lucene): http://lucene.apache.org/nutch/ (there's a tutorial that includes Solr at https://cwiki.apache.org/confluence/display/nutch/NutchTutorial)
ManifoldCF: https://manifoldcf.apache.org/en_US/index.html


There are a few other options on this (slightly old) page: https://cwiki.apache.org/confluence/display/SOLR/SolrEcosystem. There are probably hundreds more, including writing your own.
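
If you do go down the write-your-own route, the indexing side is the simpler half. Here is a minimal sketch in Python (standard library only) of turning an already-fetched HTML page into a document and posting it to Solr's JSON update handler. The field names id, title and content are assumptions that depend on your schema, and the helper names are made up for illustration. It does not follow links, handle logins or respect robots.txt, which is exactly the work the crawlers above do for you.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def build_solr_doc(url, html):
    """Build one Solr document; the field names are assumed, not required by Solr."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"id": url, "title": parser.title, "content": " ".join(parser.chunks)}


def post_to_solr(docs, update_url):
    """POST a list of documents as JSON to a Solr update handler URL."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a local Solr you would call something like post_to_solr([doc], "http://localhost:8983/solr/mycollection/update?commit=true"), where "mycollection" is a placeholder for your own collection name.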


Best

Charlie

On 04/12/2023 08:28, Ashkar wrote:
Hi Solr Users,

I have a few questions.

 1. Can I crawl OneDrive and index the documents?
 2. Are we able to crawl a website that has a login?
 3. Can we crawl documents from an HTTP/HTTPS-based portal and do the
    indexing?


Regards,

Ashkar
System Analyst
M: +91 9605043094
E: ash...@chimeratechnologies.com
W: www.chimeratechnologies.com


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II
