Hi,

You should have your JIRA account now. You will find useful information on
contributing here:

https://github.com/apache/solr/blob/main/CONTRIBUTING.md 

and in particular

https://github.com/apache/solr/blob/main/dev-docs/how-to-contribute.adoc

Jan

> On 27 Nov 2024, at 02:38, Alex Z. <azagnio...@gmail.com> wrote:
> 
> I have a PR ready with changes (locally). I am just waiting for my JIRA
> account to arrive.
> 
> On Tue, Nov 26, 2024 at 2:37 PM Alex Z. <azagnio...@gmail.com> wrote:
> 
>> Hi Jan,
>> 
>> Thank you. I just applied for the ASF JIRA account. I will raise a ticket
>> once my account is approved. I can attempt a PR as well.
>> 
>> Regards
>> 
>> On Tue, Nov 26, 2024 at 1:43 PM Jan Høydahl <jan....@cominvent.com> wrote:
>> 
>>> Hi,
>>> 
>>> Thanks for finding this. Although I have not checked the code paths you
>>> mention, I think this warrants a JIRA issue and a bug fix. Would you like to
>>> file a JIRA issue for us, and perhaps also attempt a GitHub Pull Request
>>> with a fix? Ideally the PR would add a unit test that fails due to the bug
>>> but passes after the fix. If you're not able to contribute a PR, that's ok
>>> as well.
>>> 
>>> Jan
>>> 
>>>> On 26 Nov 2024, at 21:57, Alex Z. <azagnio...@gmail.com> wrote:
>>>> 
>>>> Hello Solr Community,
>>>> 
>>>> I’m seeking your feedback regarding an issue I’ve encountered when
>>>> configuring the Solr Langid module, specifically when using the
>>>> deprecated langid.whitelist property instead of Solr’s newer
>>>> langid.allowlist property to define allowed language codes.
>>>>
>>>> As you are likely aware, the langid.whitelist property has been
>>>> deprecated since Solr 9.0.0, and the recommended approach is to use
>>>> langid.allowlist instead. I am in fact using the langid.allowlist
>>>> property, but I would like to bring attention to an issue I’ve observed
>>>> with the legacy support for langid.whitelist. I believe there is a bug in
>>>> the backward-compatibility code that could cause unintended behavior when
>>>> the langid.whitelist property is configured.
>>>> 
>>>> To illustrate the problem, I’ll provide a detailed example based on the
>>>> code:
>>>> 
>>>>  1.
>>>> 
>>>>  *The check for legacyAllowList*: In the Solr code, specifically in the
>>>> 
>>> https://github.com/apache/solr/blob/main/solr/modules/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java#L123-L127
>>> ,
>>>>  there is a check for the length of the legacyAllowList string.
>>> However,
>>>>  the legacyAllowList is never actually used after the length check in
>>> the
>>>>  code. Instead, an empty string ("") is used as the default value when
>>>>  fetching the LANG_ALLOWLIST parameter.
>>>>  2.
>>>> 
>>>>  *Resulting issue with the langAllowlist set*: As a result, the
>>> Set<String>
>>>>  langAllowlist is populated with a single element: an empty string
>>> ("").
>>>>  This causes an issue when the code checks if the langAllowlist is
>>> empty
>>>>  in the later part of the code (
>>>> 
>>> https://github.com/apache/solr/blob/main/solr/modules/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java#L385-L405
>>> )
>>>>  , specifically in this section. The check langAllowlist.isEmpty()
>>>>  incorrectly returns false because the set does contain an element -
>>> the
>>>>  empty string.
>>>>  3.
>>>> 
>>>>  *Unexpected fallback behavior*: Consequently, even though the language
>>>>  of the document might be correctly detected (for instance, if the
>>> document
>>>>  is identified as being in German), the flow incorrectly enters the
>>> "else"
>>>>  clause. This results in the log message: *"Detected a language not in
>>>>  allowlist (de), using fallback en"* and the fallback language is set
>>> to
>>>>  English (en), even though the document language was correctly
>>> identified
>>>>  as German.
>>>> 
>>>> I believe this behavior stems from a bug in the backwards compatibility
>>>> handling for the deprecated langid.whitelist property. If the
>>>> legacyAllowList value is not being properly used or passed to the
>>>> langAllowlist set, it leads to incorrect fallback behavior.
>>>> 
>>>> I’d appreciate any insights or thoughts from the community on this issue.
>>>> Thank you in advance for your time!
>>>> 
>>>> Alex
>>> 
>>> 
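
A minimal, self-contained Java sketch of the pattern Alex describes may help when writing the unit test suggested above. It is a simplified re-creation, not the actual Solr source: request parameters are modelled as a plain Map, and the class and method names (AllowlistDemo, parseBuggy, parseFixed, resolve) are hypothetical. It relies only on what the report states (the legacy value feeds the length check, while "" is the default when the allowlist is read) plus the standard Java behaviour that "".split(",") returns a one-element array containing the empty string.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified re-creation of the allowlist handling described in
// the report; NOT the actual Solr source. Params are modelled as a plain Map.
class AllowlistDemo {
  static final String LANG_ALLOWLIST = "langid.allowlist";
  static final String LANG_WHITELIST = "langid.whitelist"; // deprecated since Solr 9.0.0

  // Buggy pattern: the legacy value only feeds the length check, while ""
  // is the default when the allowlist is actually read.
  static Set<String> parseBuggy(Map<String, String> params) {
    Set<String> allowlist = new HashSet<>();
    String legacy = params.getOrDefault(LANG_WHITELIST, "");
    if (params.getOrDefault(LANG_ALLOWLIST, legacy).length() > 0) {
      // With only langid.whitelist set, this splits "", which yields [""].
      for (String lang : params.getOrDefault(LANG_ALLOWLIST, "").split(",")) {
        allowlist.add(lang);
      }
    }
    return allowlist;
  }

  // One possible fix: read the allowlist with the legacy value as its default
  // and skip blank entries so "" can never end up in the set.
  static Set<String> parseFixed(Map<String, String> params) {
    Set<String> allowlist = new HashSet<>();
    String value = params.getOrDefault(LANG_ALLOWLIST, params.getOrDefault(LANG_WHITELIST, ""));
    for (String lang : value.split(",")) {
      String trimmed = lang.trim();
      if (!trimmed.isEmpty()) {
        allowlist.add(trimmed);
      }
    }
    return allowlist;
  }

  // Simplified decision from step 3: keep the detected language if the
  // allowlist is empty or contains it, otherwise use the fallback.
  static String resolve(Set<String> allowlist, String detected, String fallback) {
    return (allowlist.isEmpty() || allowlist.contains(detected)) ? detected : fallback;
  }

  public static void main(String[] args) {
    Map<String, String> legacyOnly = Map.of(LANG_WHITELIST, "de,en");
    Set<String> buggy = parseBuggy(legacyOnly);
    Set<String> fixed = parseFixed(legacyOnly);
    // prints "buggy: size=1, containsEmpty=true"
    System.out.println("buggy: size=" + buggy.size() + ", containsEmpty=" + buggy.contains(""));
    // prints "buggy resolves de -> en"
    System.out.println("buggy resolves de -> " + resolve(buggy, "de", "en"));
    // prints "fixed resolves de -> de"
    System.out.println("fixed resolves de -> " + resolve(fixed, "de", "en"));
  }
}
```

With only langid.whitelist=de,en configured, the buggy variant ends up with a single empty-string entry, so a document detected as "de" is sent to the fallback "en", while the fixed variant keeps "de". Trimming and skipping blank entries is just one possible design; it also guards against values such as "de,,en" or a trailing comma. The actual PR may well fix this differently.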
