Re: Crawling Italian language site in Solr
Hello Fiz,

This normally happens when a website can respond with translations of its content. Usually this is controlled by the client's Accept-Language header; in worse cases, it is decided based on the client's apparent IP address.

In Nutch you can test the output with the bin/nutch indexchecker command. This is the output that is sent to search engines such as Solr. So if the language in Solr is suddenly different from what you expect, the problem lies in what Nutch receives and sends. Hence, your problem lies in the web crawler domain, not in Solr.

Regards,
Markus

P.S. Attached files usually don't work on the mailing list.

On Fri, 28 Jul 2023 at 08:08, Fiz N wrote:

> Hi SOLR Experts,
>
> In Azure VM (Linux), we have installed Solr version 8.11.2 and Nutch
> Crawler (apache-nutch-1.19). Crawling the site for Italian Language we
> added the tokenizer. *In the Solr admin screen we could see the document
> but in English language.*
>
> Please see the below attached managed schema Code Changes.
>
> Regards
> Fiz A.
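
To check whether the site varies its language by request header, one option is to request the same URL with an explicit Accept-Language value and compare the responses. The following is a minimal Python sketch, not Nutch code; the function names and the header value "it-IT,it;q=0.9" are assumptions made for illustration.

```python
from urllib.request import Request, urlopen


def build_request(url: str, languages: str = "it-IT,it;q=0.9") -> Request:
    """Return a GET request that asks the server for the given languages.

    The default value is an illustrative Accept-Language string,
    not one taken from the thread.
    """
    return Request(url, headers={"Accept-Language": languages})


def fetch_text(url: str, languages: str = "it-IT,it;q=0.9") -> str:
    """Fetch the page and decode it (requires network access)."""
    with urlopen(build_request(url, languages), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

If fetch_text(url, "it") and fetch_text(url, "en") return different content, the site negotiates language by header, and the crawler's own Accept-Language setting (in Nutch, the http.accept.language property, as far as I know) is the likely place to look.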
JSON boolean query syntax with edismax as default QueryParser
Hi Solr colleagues,

On Solr 8.4.1, we've noticed that the following types of JSON DSL queries work if our luceneMatchVersion is 7.1 or lower, or if our default query parser is set to lucene:

    {"query":{"bool":{"must":[{"lucene":{"query":"plasticity","df":"title_a_index"}}]}}}

However, if the query parser is set to edismax and the luceneMatchVersion is 7.2 or higher, the parsed query visible with debug=true becomes a complete mess, searching for the terms "bool" and "must" rather than the terms we actually want to search for:

    +(DisjunctionMaxQuery(((author_main_unstem_search:bool)^1000.0 | (local_subject_unstem_search:bool)^15.0 | (author_unstem_search:bool)^40.0 […]

Also with debug=true, we noticed that the JSON DSL queries get converted into a query string with local params: "{!bool must=$_tt1 }". So I suspect these two changes in Solr 7.2 are the reason we can't use Boolean JSON queries with edismax and a recent luceneMatchVersion: https://solr.apache.org/docs/7_2_0/changes/Changes.html#v7.2.0.upgrade_notes. Does that seem correct?

Also, could this be related to the question Benjamin Armintor asked on June 23 (subject: Changes to JSON query API/syntax in Solr 9.x?)? I'm specifically curious about whether a luceneMatchVersion of 7.1 or lower still works in Solr 9.

Thanks for your insights,
-Jane

--
Jane Sandberg (she/her)
Library Software Engineer, Discovery and Access Services
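
Since the query reportedly works when the default parser is lucene, one experiment is to keep edismax as the configured default and pass defType=lucene only on the JSON DSL requests, using the "params" block of the JSON Request API. The sketch below only builds the request body; whether this resolves the behavior with luceneMatchVersion 7.2 or higher is an assumption that is not confirmed anywhere in this thread, and the helper name bool_must_query is hypothetical.

```python
import json


def bool_must_query(term: str, field: str, force_lucene: bool = True) -> dict:
    """Build a JSON DSL body with a single `must` clause.

    When force_lucene is True, add a per-request defType=lucene so the
    top-level parser is lucene for this request only (an experiment,
    not a confirmed fix).
    """
    body = {
        "query": {
            "bool": {
                "must": [{"lucene": {"query": term, "df": field}}]
            }
        }
    }
    if force_lucene:
        body["params"] = {"defType": "lucene"}
    return body


# Serialized body, ready to POST to the collection's query endpoint.
payload = json.dumps(bool_must_query("plasticity", "title_a_index"))
```

Comparing the parsed query under debug=true with and without the "params" block should show whether the edismax default is what turns "bool" and "must" into search terms.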
Re: Slow softCommits under heavy load?
On 7/23/23 05:24, Koen De Groote wrote:

> After having a look at these files: No, I cannot share them.
>
> What I can say is that there's a couple hundred fields, dynamicFields
> and copyFields (each).
>
> The updatehandler uses solr.DirectUpdateHandler2 (the only one I can see
> in the source code extending the regular updateHandler), with a max
> autoCommit time of 6 and a max autoSoftCommit time of 1000

You can cause yourself no end of problems with that super short autoSoftCommit. It can lead to lots of commits happening at the same time.

What I would start with is reducing the autoCommit interval, to 3 or 15000, and greatly increasing the autoSoftCommit interval, to at least 3, maybe even as high as 12.

Stop sending explicit commits. You especially don't want to do a commit after every document ... that has the potential to be even worse than one autoSoftCommit per second.

If possible, you should also be indexing a lot more than one document per indexing request.

Thanks,
Shawn
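
The last two points (no explicit commits, many documents per request) can be sketched on the client side as follows. This is a minimal Python illustration, not code from the thread; the batch size of 500 and the helper names are arbitrary assumptions.

```python
import json
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def update_payloads(docs: Iterable[dict], size: int = 500) -> Iterator[str]:
    """Yield one JSON array per batch, suitable as an /update request body.

    No commit is requested here; commits are left to the autoCommit and
    autoSoftCommit settings in solrconfig.xml.
    """
    for batch in batches(docs, size):
        yield json.dumps(batch)
```

Each payload would be posted to the collection's /update handler with Content-Type application/json and without a commit parameter, so the number of commits no longer depends on the number of documents sent.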
Re: [EXTERNAL] Re: upgrade to 8.6 to 9.2
On 7/21/23 09:03, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:

> One thing that comes to mind is to have this in your start.sh script:
>
> export SOLR_JETTY_HOST="0.0.0.0"

This is a good point. For security reasons, similar to other software like MySQL, Solr 9 only listens on localhost by default. If you want to access Solr from outside the server itself, you have to define the SOLR_JETTY_HOST environment variable as Craig mentions.

Thanks,
Shawn
Re: Add a new Shard to the collection
Hello Hari.

If the new shards are handling queries and updates well, it's OK for the old shard to be inactive. You can issue a DELETESHARD request to reclaim the disk space.

On Mon, Jul 24, 2023 at 6:19 PM HariBabu kuruva wrote:

> Hi All,
>
> I would like to add a new shard to the existing collection to have better
> performance. Currently we have one shard.
>
> Solr - 8.11.1
> Nodes(servers) - 10 (Non prod - 4 nodes)
> Zookeepers-5
>
> I have tried the SPLITSHARD command in one of the non prod environments.
>
> https://solrserver.corp.company.com:8981/solr/admin/collections?action=SPLITSHARD&collection=abcStore&shard=shard1
>
> Now i can see total 3 shards
> Shard1
> Shard1_0
> Shard1_1
>
> But Shard1 is shown as inactive. Please let me know if we need to remove
> this ?
>
> Please help me if this is the correct way of splitting the shard.
> Are there any impacts to the data because of this ?
> What are the measures to be taken while doing this in a PROD environment.
>
> --
>
> Thanks and Regards,
> Hari
> Mobile:9790756568

--
Sincerely yours
Mikhail Khludnev
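
As a rough illustration of the cleanup step, the following Python sketch picks out inactive shards from a parsed CLUSTERSTATUS response and builds the matching DELETESHARD URL. The sample status dictionary is hypothetical (trimmed to the fields used here), and the helper names are assumptions; the base URL and collection name are the ones from Hari's message.

```python
from urllib.parse import urlencode


def inactive_shards(cluster_status: dict, collection: str) -> list:
    """Return the names of shards whose state is 'inactive', sorted."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return sorted(
        name for name, info in shards.items() if info.get("state") == "inactive"
    )


def deleteshard_url(base_url: str, collection: str, shard: str) -> str:
    """Build a Collections API DELETESHARD URL for one shard."""
    query = urlencode(
        {"action": "DELETESHARD", "collection": collection, "shard": shard}
    )
    return f"{base_url}/admin/collections?{query}"


# Hypothetical, trimmed CLUSTERSTATUS response after the split.
sample_status = {
    "cluster": {
        "collections": {
            "abcStore": {
                "shards": {
                    "shard1": {"state": "inactive"},
                    "shard1_0": {"state": "active"},
                    "shard1_1": {"state": "active"},
                }
            }
        }
    }
}

BASE = "https://solrserver.corp.company.com:8981/solr"
urls = [
    deleteshard_url(BASE, "abcStore", shard)
    for shard in inactive_shards(sample_status, "abcStore")
]
```

Checking the shard state first keeps the delete limited to the parent shard that the split has already retired, rather than anything still serving traffic.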