[ 
https://issues.apache.org/jira/browse/CODEC-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary D. Gregory resolved CODEC-331.
-----------------------------------
    Fix Version/s: 1.19.0
       Resolution: Fixed

Hello [~ilikecode] 

This is fixed in git master and in the snapshot builds at 
[https://repository.apache.org/content/repositories/snapshots/]

Please verify and close this ticket if your use case is fixed.

Thank you!

> org.apache.commons.codec.language.bm.Rule.parsePhonemeExpr(String) adds 
> duplicate empty phoneme when input ends with |
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CODEC-331
>                 URL: https://issues.apache.org/jira/browse/CODEC-331
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>         Environment: Affected Version: 1.18.1 (I found this version from my 
> pom.xml)
> MacOS
> JDK 8
>            Reporter: IlikeCode
>            Priority: Major
>             Fix For: 1.19.0
>
>         Attachments: Screenshot 2025-05-19 at 8.11.02 am.png
>
>
> Component: org.apache.commons.codec.language.bm.Rule
> Method: private static PhonemeExpr parsePhonemeExpr(String ph)
>  
> h1. Problem
> When the input string is *(()|)*, the method *parsePhonemeExpr(String)* 
> first strips the outer parentheses, producing *body = "()|"*.
> It then executes *body.split("[|]")*.
> By default, Java's String.split discards trailing empty strings, so the 
> empty string after the *|* is dropped and the result is *["()"]*.
> To compensate for this, the following logic is used:
> if (body.startsWith("|") || body.endsWith("|")) {
>     phs.add(new Phoneme("", Languages.ANY_LANGUAGE));
> }
> However, the *"()"* entry already results in a *Phoneme("")* when parsed.
> As a result, the list ends up containing two empty phonemes, which seems 
> unintended.
> h1. Expected Result
> Only one empty phoneme should be added for (()|).
>  
> h1. Actual Result
>  
> Two empty phonemes are returned:
>  - One from parsing "()"
>  - One manually added due to .endsWith("|")
>  
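
The split behavior described in the report can be reproduced in isolation. The sketch below is not the Commons Codec source: the class name SplitTrailingEmptyDemo and the helper parseAlternatives are hypothetical, phonemes are modeled as plain strings, and the mapping of "()" to an empty phoneme is taken from the report's description rather than verified against Rule.parsePhoneme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitTrailingEmptyDemo {

    // Hypothetical stand-in for the logic described in the report.
    // Assumption (from the report): the "()" alternative parses to an empty phoneme.
    static List<String> parseAlternatives(String body) {
        List<String> phonemes = new ArrayList<>();
        for (String part : body.split("[|]")) {
            phonemes.add(part.equals("()") ? "" : part);
        }
        // Compensation for the trailing empty string that split() discards.
        if (body.startsWith("|") || body.endsWith("|")) {
            phonemes.add("");
        }
        return phonemes;
    }

    public static void main(String[] args) {
        String body = "()|";
        // Default split: trailing empty strings are removed.
        System.out.println(Arrays.toString(body.split("[|]")));     // [()]
        // A negative limit keeps trailing empty strings.
        System.out.println(Arrays.toString(body.split("[|]", -1))); // [(), ]
        // The compensation adds a second empty entry.
        System.out.println(parseAlternatives(body).size());         // 2
    }
}
```

Under these assumptions, the helper returns two empty entries for the body "()|", which matches the actual result reported above.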



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
