[ 
https://issues.apache.org/jira/browse/CODEC-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dianshu Liao updated CODEC-330:
-------------------------------
    Affects Version/s: 1.18.0

> org.apache.commons.codec.language.DaitchMokotoffSoundex.cleanup(String) does 
> not remove special characters (e.g., punctuation)
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CODEC-330
>                 URL: https://issues.apache.org/jira/browse/CODEC-330
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>         Environment: JDK 8, MacOS
>            Reporter: Dianshu Liao
>            Priority: Major
>         Attachments: Screenshot 2025-05-19 at 1.01.11 am.png
>
>
> File: org.apache.commons.codec.language.DaitchMokotoffSoundex
> Method: private String cleanup(String input)
> h1. Problem
> The private method "private String cleanup(final String input)" in 
> DaitchMokotoffSoundex is intended to sanitize the input string before the 
> phonetic transformation is applied. However, the implementation does not 
> remove special characters such as !, @, #, or digits. These characters are 
> preserved in the cleaned string, which can lead to incorrect or unexpected 
> phonetic results.
>  
> h1. Test Code
> package org.apache.commons.codec.language;
> import static org.junit.Assert.assertEquals;
> import java.lang.reflect.Method;
> import org.junit.Test;
> public class language_DaitchMokotoffSoundex_cleanup_Test {
>     // "throws Exception" lets reflection failures fail the test instead of
>     // being swallowed by a catch block.
>     @Test(timeout = 4000)
>     public void testCleanup() throws Exception {
>         // Instantiate the class
>         DaitchMokotoffSoundex soundex = new DaitchMokotoffSoundex();
>         // Access the private method using reflection
>         Method cleanupMethod =
>                 DaitchMokotoffSoundex.class.getDeclaredMethod("cleanup", String.class);
>         cleanupMethod.setAccessible(true);
>         // Test input with whitespace
>         String input = "  Hello World  ";
>         String expectedOutput = "helloworld";
>         String actualOutput = (String) cleanupMethod.invoke(soundex, input);
>         assertEquals(expectedOutput, actualOutput);
>         // Test input with special characters: removing '$', '!', '@' and '#'
>         // from "Te$t!@#" leaves "tet" ('$' is dropped, not mapped to 's')
>         input = "Te$t!@#";
>         expectedOutput = "tet";
>         actualOutput = (String) cleanupMethod.invoke(soundex, input);
>         assertEquals(expectedOutput, actualOutput);
>     }
> }
> h1. Expected Result
> All non-letter characters (e.g., !, @, #, digits) should be removed as part 
> of the cleanup process to ensure reliable phonetic encoding.
> h1. Actual Result
>  
> Special characters are preserved. For example, "Te$t!@#" -> "te$t!@#".
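
For illustration, the expected behaviour described in the report can be sketched as a standalone helper. This is a minimal sketch, not the Commons Codec implementation: the class name CleanupSketch and the method cleanupLettersOnly are hypothetical, and it assumes that "cleanup" means dropping every character for which Character.isLetter returns false and lower-casing the rest. It ignores any other normalization the library may apply.

```java
// Hypothetical sketch of the behaviour the report expects.
// Not the actual DaitchMokotoffSoundex.cleanup(String) code.
class CleanupSketch {

    // Keeps only letters (per Character.isLetter) and lower-cases them.
    // Whitespace, punctuation, symbols and digits are all dropped.
    static String cleanupLettersOnly(final String input) {
        final StringBuilder sb = new StringBuilder(input.length());
        for (final char ch : input.toCharArray()) {
            if (Character.isLetter(ch)) {
                sb.append(Character.toLowerCase(ch));
            }
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(cleanupLettersOnly("  Hello World  ")); // helloworld
        System.out.println(cleanupLettersOnly("Te$t!@#"));         // tet
    }
}
```

Under this rule "Te$t!@#" becomes "tet", because "$" is removed rather than mapped to "s".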



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
