Re: Crawling Italian language site in Solr

Markus Jelsma Fri, 28 Jul 2023 01:55:17 -0700

Hello Fiz,

This normally happens when a website can serve translations of its
content. Usually the choice is driven by the client's Accept-Language
header; in worse cases it is decided by the client's apparent IP
address.
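
One rough way to check for this outside of Nutch is to request the same
URL with different Accept-Language values and compare what comes back.
The sketch below assumes Python 3 with only the standard library; the
helper names build_request and served_language are made up for this
example, and many sites do not send a Content-Language response header
at all, in which case you would have to look at the returned body
instead.

```python
import urllib.request


def build_request(url, language):
    """Build a GET request that asks the server for one specific language."""
    return urllib.request.Request(url, headers={"Accept-Language": language})


def served_language(url, language, timeout=10):
    """Fetch url and return the Content-Language response header, or None.

    None only means the server did not declare a language; it does not
    prove that the body is in any particular language.
    """
    request = build_request(url, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Language")


# Example use (needs network access, so it is left commented out):
# for lang in ("it", "en"):
#     print(lang, "->", served_language("https://example.com/", lang))
```

If the answer changes with the header, the language the crawler asks
for is the thing to look at. In Nutch that is, as far as I know, the
http.accept.language property in nutch-site.xml.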


In Nutch you can check what gets indexed with the bin/nutch indexchecker
<URL> command. Its output is what is sent to search engines such as Solr.
So if the language in Solr differs from what you expect, the problem lies
in what Nutch receives and sends, that is, in the web crawler domain, not
in Solr.

Regards,
Markus

PS: attached files usually don't come through on the mailing list.

Op vr 28 jul 2023 om 08:08 schreef Fiz N <fiznewy...@gmail.com>:

> Hi SOLR Experts,
>
> On an Azure VM (Linux), we have installed Solr version 8.11.2 and Nutch
> Crawler (apache-nutch-1.19). To crawl an Italian-language site, we
> added the tokenizer. *In the Solr admin screen we can see the document,
> but in English.*
>
> Please see the attached managed-schema changes.
>
>
>
> Regards
>
> Fiz A.
>
>
