Hi, You should have your JIRA account now. You find useful helps on contributing here:
https://github.com/apache/solr/blob/main/CONTRIBUTING.md and in particular https://github.com/apache/solr/blob/main/dev-docs/how-to-contribute.adoc Jan > 27. nov. 2024 kl. 02:38 skrev Alex Z. <azagnio...@gmail.com>: > > I have a PR ready with changes (locally). I am just waiting for my JIRA > account to arrive. > > On Tue, Nov 26, 2024 at 2:37 PM Alex Z. <azagnio...@gmail.com> wrote: > >> Hi Jan, >> >> Thank you. I just applied for the ASF JIRA account. I will raise a ticket >> once my account is approved. I can try attempt a PR as well. >> >> Regards >> >> On Tue, Nov 26, 2024 at 1:43 PM Jan Høydahl <jan....@cominvent.com> wrote: >> >>> Hi, >>> >>> Thanks for finding this. Although I have not checked the code paths you >>> mention, I think this warrants a JIRA issue and a bug fix. Would you lke to >>> file a JIRA issue for us, and perhaps also attempt a GitHub Pull Request >>> with a fix. Ideally the PR would add a unit test that fails due to the bug >>> but passes after the fix. If you're not able to contribute a PR that's ok >>> as well. >>> >>> Jan >>> >>>> 26. nov. 2024 kl. 21:57 skrev Alex Z. <azagnio...@gmail.com>: >>>> >>>> Hello Solr Community, >>>> >>>> I’m seeking your feedback regarding an issue I’ve encountered when >>>> configuring the Solr Langid module, specifically when using the >>> deprecated >>>> langid.whitelist property instead of Solr’s newer langid.allowlist >>> property >>>> to define allowed language codes. >>>> >>>> As you are likely aware, the langid.whitelist property has been >>> deprecated >>>> since Solr 9.0.0, and the recommended approach is to use >>> langid.allowlist >>>> instead. I am indeed using the langid.allowlist property, but I would >>> like >>>> to bring attention to an issue I’ve observed with the legacy support for >>>> langid.whitelist. I believe there is a bug in the backward compatibility >>>> code that could cause unintended behavior when the langid.whitelist >>>> property is configured. >>>> >>>> To illustrate the problem, I’ll provide a detailed example based on the >>>> code: >>>> >>>> 1. >>>> >>>> *The check for legacyAllowList*: In the Solr code, specifically in the >>>> >>> https://github.com/apache/solr/blob/main/solr/modules/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java#L123-L127 >>> , >>>> there is a check for the length of the legacyAllowList string. >>> However, >>>> the legacyAllowList is never actually used after the length check in >>> the >>>> code. Instead, an empty string ("") is used as the default value when >>>> fetching the LANG_ALLOWLIST parameter. >>>> 2. >>>> >>>> *Resulting issue with the langAllowlist set*: As a result, the >>> Set<String> >>>> langAllowlist is populated with a single element: an empty string >>> (""). >>>> This causes an issue when the code checks if the langAllowlist is >>> empty >>>> in the later part of the code ( >>>> >>> https://github.com/apache/solr/blob/main/solr/modules/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java#L385-L405 >>> ) >>>> , specifically in this section. The check langAllowlist.isEmpty() >>>> incorrectly returns false because the set does contain an element - >>> the >>>> empty string. >>>> 3. >>>> >>>> *Unexpected fallback behavior*: Consequently, even though the language >>>> of the document might be correctly detected (for instance, if the >>> document >>>> is identified as being in German), the flow incorrectly enters the >>> "else" >>>> clause. This results in the log message: *"Detected a language not in >>>> allowlist (de), using fallback en"* and the fallback language is set >>> to >>>> English (en), even though the document language was correctly >>> identified >>>> as German. >>>> >>>> I believe this behavior stems from a bug in the backwards compatibility >>>> handling for the deprecated langid.whitelist property. If the >>>> legacyAllowList value is not being properly used or passed to the >>>> langAllowlist set, it leads to incorrect fallback behavior. >>>> >>>> I’d appreciate any insights or thoughts from the community on this >>> issue. >>>> Thank you in advance for your time! >>>> >>>> Alex >>> >>>