Re: Help Needed: Distributed update Async Exception solr 8.8.2 - Update

2021-08-25 Thread Shawn Heisey

On 8/24/2021 10:08 PM, Reej Nayagam wrote:

  Okay, got your point. But we cannot modify the Java code to stop commits
for now, so my manager suggests we comment out the autoCommit in
solrconfig instead. We are not sure if that is correct. His point is: let
us commit every time we index (that is, through Java, passing the commit &
optimise params) and remove the autoCommit config in solrconfig.xml that
commits every 60000 milliseconds. Will it be the right approach?


Don't remove the autoCommit. Frequent hard commits are vital for good 
operation: each one flushes data to disk and starts a new transaction log. 
Doing it with openSearcher set to false makes it VERY fast.


Solr ships with autoCommit maxTime at 15000 (milliseconds), up to four times 
more frequently than you have it configured ... and it doesn't cause problems 
for users. I like to increase that to 60000 just so things are a little bit 
less busy, but 15000 would work too.
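For reference, the setting Shawn recommends keeping looks roughly like the following in solrconfig.xml. This is a sketch modelled on Solr's stock configuration, which writes the value as `${solr.autoCommit.maxTime:15000}`; the 60000 here is his suggested value, and the other elements in your own file may differ.

```xml
<!-- solrconfig.xml sketch: other children of <updateHandler>, such as <updateLog>, are omitted -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Hard commit no later than 60000 ms after the first uncommitted update; stock config uses 15000 -->
    <maxTime>60000</maxTime>
    <!-- Flush to disk and start a new transaction log without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Because openSearcher is false, these commits make nothing newly visible to queries; visibility still comes from whatever commits do open a searcher, such as the ones sent by the indexing software.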


The commits that were causing problems for you are the ones sent by your 
indexing software, and those commits DO open a new searcher.  Opening a 
new searcher is the expensive part of a commit ... so your autoCommit is 
not a problem.


https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

(the article says SolrCloud ... but it applies just as much when Solr is 
NOT in Cloud mode)


Thanks,
Shawn



ICUFoldingFilter with preserveOriginal option?

2021-08-25 Thread Jan Høydahl
Hi,

I'm looking at using ICUFoldingFilter for a customer, to fold e.g. Genéve to 
Geneve and thus get better recall.
However, for some common Norwegian words, the folding makes them clash with 
super-common words so it becomes impossible to find exactly what you want.
I imagined that if ICUFoldingFilter had a preserveOriginal=true option, it 
could leave the original word in the index at the same position, and an exact 
match for "Genéve" would score better than the normalized one. But this filter 
does not support this.

Has anyone found a workaround for this, other than duplicating all content in 
different fields with different analysis and searching across them with 
different weights?

Jan

Suggester only returning one suggestion

2021-08-25 Thread Kien, Theo
Dear Solr user community,

I have started to use the Solr 8.9 suggester. The definition is as follows:

{
  "searchComponent":{
    "suggest_test":{
      "name":"suggest_test",
      "class":"solr.SuggestComponent",
      "suggester":{
        "name":"combinedSuggester",
        "lookupImpl":"BlendedInfixLookupFactory",
        "indexPath":"/var/solr/sprint2/suggester",
        "dictionaryImpl":"DocumentDictionaryFactory",
        "field":"COMBINED",
        "suggestAnalyzerFieldType":"text_general",
        "buildOnStartup":"false",
        "buildOnCommit":"false"
      }
    }
  },
  "requestHandler":{
    "/suggest":{
      "name":"/suggest",
      "class":"solr.SearchHandler",
      "startup":"lazy",
      "defaults":{
        "suggest":"true",
        "suggest.count":"10",
        "suggest.dictionary":"combinedSuggester"
      },
      "components":[
        "suggest_test"
      ]
    }
  }
}

When I tested this on a relatively small set of documents, it was working as 
expected.
For example, when requesting "suggest?q=south" it would return "south africa" 
and "south korea" as suggestions.
Now, on a different core with more documents, the same query returns only 
"south africa", but "suggest?q=south k" still returns "south korea".

Does anybody have an idea why this is the case and how I can debug the 
behaviour of the suggester?

Thanks in advance.
Best regards
Theo Kien


Disclaimer

This e-mail message and any attachments (“message”) may contain confidential, 
privileged or proprietary information and is intended solely for the use of the 
named recipient(s). If you are not the intended recipient, you may not 
disclose, copy, distribute or retain any part of this message. If you have 
received this message in error, please inform the sender immediately by return 
e-mail and delete this message from your system. The BIS is not liable for any 
error in the content of this message and does not represent that it is 
uncorrupted and/or free of viruses. Views expressed in this message are those 
of the author and may not reflect those of the BIS.

By exchanging e-mails with the BIS it is understood that the BIS may collect, 
store and further use e-mail addresses and other personal information which may 
be provided therein. The BIS will treat such information as confidential.


Re: ICUFoldingFilter with preserveOriginal option?

2021-08-25 Thread André Widhani
Not with ICUFoldingFilter, but with the MappingCharFilter.

There you can supply a mapping file and skip baseletter mappings for the users' 
native language, because in their own language, they know the correct spelling 
... most of the time ... sometimes.

This does really help with multiple languages and you lose the convenience of 
ICUFoldingFilter.
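A field type using André's approach could look roughly like this. The field type name and the mapping file name are hypothetical; the idea would be to start from a mapping file such as `mapping-FoldToASCII.txt` from Solr's sample configsets and delete the entries for letters the users' native language relies on (for Norwegian, presumably æ, ø and å).

```xml
<!-- Hypothetical names; the mapping file lives next to the schema in the configset -->
<fieldType name="text_mapped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Applied to the raw text before tokenizing; each line of the file is a rule like "é" => "e" -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-fold-except-native.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the single analyzer here runs at both index and query time, "Genéve" and "Geneve" would reach the same indexed term, while letters left out of the mapping file keep their exact spelling.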

André

From: Jan Høydahl 
Sent: Wednesday, 25 August 2021 15:43
To: users@solr.apache.org 
Subject: ICUFoldingFilter with preserveOriginal option?



Re: ICUFoldingFilter with preserveOriginal option?

2021-08-25 Thread Markus Jelsma
Hoi Jan,

ICUFoldingFilter and ASCIIFoldingFilter, I think, did not respect the
keyword=true attribute when I last checked. If you use
KeywordRepeatFilter and modify those TokenFilters to respect the
keyword attribute, the problem seems solved.
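Markus's suggestion could be wired up roughly as below. `com.example.KeywordAwareICUFoldingFilterFactory` is a hypothetical class standing in for the modified filter he describes (a copy of the ICU folding filter that leaves tokens marked keyword=true untouched); per his message, the stock ICUFoldingFilterFactory would fold both copies and defeat the purpose.

```xml
<!-- Sketch only: com.example.KeywordAwareICUFoldingFilterFactory is hypothetical -->
<fieldType name="text_folded_preserve" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first, so the preserved original is also case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits every token twice at the same position; the first copy is flagged keyword=true -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- Hypothetical modified filter: folds only tokens that are not flagged as keywords -->
    <filter class="com.example.KeywordAwareICUFoldingFilterFactory"/>
    <!-- Drops the second copy when folding left the token unchanged -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, "Genéve" would be indexed as both "genéve" and "geneve" at the same position. Whether an exact query actually outscores a folded-only match depends on how the query parser combines the stacked tokens, so it is worth checking with debugQuery=true.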

Regards,
Markus
