Dianshu Liao created CODEC-330:
----------------------------------

             Summary: 
org.apache.commons.codec.language.DaitchMokotoffSoundex.cleanup(String) does 
not remove special characters (e.g., punctuation)
                 Key: CODEC-330
                 URL: https://issues.apache.org/jira/browse/CODEC-330
             Project: Commons Codec
          Issue Type: Bug
    Affects Versions: 1.18.1
         Environment: JDK 8, MacOS
            Reporter: Dianshu Liao
         Attachments: Screenshot 2025-05-19 at 1.01.11 am.png

File: org.apache.commons.codec.language.DaitchMokotoffSoundex
Method: private String cleanup(String input)
h1. Problem

The private method "private String cleanup(final String input)" in 
DaitchMokotoffSoundex is intended to sanitize the input string before the 
actual phonetic transformation is applied. However, the implementation does 
not remove special characters such as !, @, #, or digits. These characters are 
preserved in the cleaned string, which can lead to incorrect or unexpected 
phonetic results.
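
To make the reported behavior concrete, here is a minimal model of it. This is a sketch based only on the inputs and outputs quoted in this report, not the library source. The class name ObservedCleanupModel is hypothetical, and any accent folding the real method may perform is left out.

```java
// Simplified model of the behavior described in this report; NOT the library source.
class ObservedCleanupModel {

    // Drops whitespace and lowercases letters; every other character is kept.
    static String cleanup(String input) {
        StringBuilder sb = new StringBuilder();
        for (char ch : input.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                continue; // whitespace is the only thing removed
            }
            sb.append(Character.toLowerCase(ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanup("  Hello World  ")); // helloworld
        System.out.println(cleanup("Te$t!@#"));         // te$t!@#
    }
}
```

Under this model, punctuation and digits pass through unchanged, which matches the output reported for "Te$t!@#".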

 
h1. Test Code

package org.apache.commons.codec.language;

import static org.junit.Assert.assertEquals;

import java.lang.reflect.Method;

import org.junit.Test;

public class language_DaitchMokotoffSoundex_cleanup_Test {

    // Declaring "throws Exception" lets reflection failures fail the test
    // instead of being swallowed by a catch block.
    @Test(timeout = 4000)
    public void testCleanup() throws Exception {
        // Instantiate the class
        DaitchMokotoffSoundex soundex = new DaitchMokotoffSoundex();

        // Access the private method using reflection
        Method cleanupMethod =
                DaitchMokotoffSoundex.class.getDeclaredMethod("cleanup", String.class);
        cleanupMethod.setAccessible(true);

        // Test input with whitespace
        String input = "  Hello World  ";
        String expectedOutput = "helloworld";
        String actualOutput = (String) cleanupMethod.invoke(soundex, input);
        assertEquals(expectedOutput, actualOutput);

        // Test input with special characters
        input = "Te$t!@#";
        expectedOutput = "test";
        actualOutput = (String) cleanupMethod.invoke(soundex, input);
        assertEquals(expectedOutput, actualOutput);
    }
}

h1. Expected Result


All non-letter characters (e.g., !, @, #, digits) should be removed as part of 
the cleanup process to ensure reliable phonetic encoding.
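
One way the expected behavior could look is sketched below. It is only an illustration of the request, not a proposed patch. The names StrictCleanupSketch and cleanupLettersOnly are hypothetical, and the sketch assumes that "letter" means whatever Character.isLetter accepts, so accented letters are kept and no folding is applied.

```java
// Hypothetical sketch of the requested behavior; not part of Commons Codec.
class StrictCleanupSketch {

    // Keeps only letters (per Character.isLetter) and lowercases them.
    // Whitespace, punctuation, symbols and digits are all dropped.
    static String cleanupLettersOnly(String input) {
        StringBuilder sb = new StringBuilder();
        for (char ch : input.toCharArray()) {
            if (Character.isLetter(ch)) {
                sb.append(Character.toLowerCase(ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanupLettersOnly("  Hello World  ")); // helloworld
        System.out.println(cleanupLettersOnly("Te$t!@#"));         // test
    }
}
```

With this filter, both inputs from the test code produce the outputs the test expects.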
h1. Actual Result

 

Special characters are preserved. For example, "Te$t!@#" -> "te$t!@#".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
