[ https://issues.apache.org/jira/browse/CODEC-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dianshu Liao updated CODEC-330:
-------------------------------
    Affects Version/s: 1.18.0

> org.apache.commons.codec.language.DaitchMokotoffSoundex.cleanup(String) does not remove special characters (e.g., punctuation)
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CODEC-330
>                 URL: https://issues.apache.org/jira/browse/CODEC-330
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>         Environment: JDK 8, MacOS
>            Reporter: Dianshu Liao
>            Priority: Major
>         Attachments: Screenshot 2025-05-19 at 1.01.11 am.png
>
>
> File: org.apache.commons.codec.language.DaitchMokotoffSoundex
> Method: private String cleanup(String input)
>
> h1. Problem
> The private method "private String cleanup(final String input)" in
> DaitchMokotoffSoundex is intended to sanitize the input string before
> applying the actual phonetic transformation. The implementation does not
> remove any special characters such as !, @, #, or numbers. These characters
> are preserved in the cleaned string, which can lead to incorrect or
> unexpected phonetic results.
>
> h1. Test Code
> package org.apache.commons.codec.language;
>
> import org.apache.commons.codec.language.DaitchMokotoffSoundex;
> import org.junit.Test;
> import java.lang.reflect.Method;
> import static org.junit.Assert.assertEquals;
>
> public class language_DaitchMokotoffSoundex_cleanup_Test {
>     @Test(timeout = 4000)
>     public void testCleanup() {
>         try {
>             // Instantiate the class
>             DaitchMokotoffSoundex soundex = new DaitchMokotoffSoundex();
>             // Access the private method using reflection
>             Method cleanupMethod =
>                 DaitchMokotoffSoundex.class.getDeclaredMethod("cleanup", String.class);
>             cleanupMethod.setAccessible(true);
>             // Test input with whitespace
>             String input = " Hello World ";
>             String expectedOutput = "helloworld";
>             String actualOutput = (String) cleanupMethod.invoke(soundex, input);
>             assertEquals(expectedOutput, actualOutput);
>             // Test input with special characters
>             input = "Te$t!@#";
>             expectedOutput = "test";
>             actualOutput = (String) cleanupMethod.invoke(soundex, input);
>             assertEquals(expectedOutput, actualOutput);
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> h1. Expected Result
> All non-letter characters (e.g., !, @, #, digits) should be removed as part
> of the cleanup process to ensure reliable phonetic encoding.
>
> h1. Actual Result
> Special characters are preserved. For example "Te$t!@#" -> "te$t!@#"
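
For illustration only, the behavior described under "Expected Result" could be sketched as below. This is a standalone example, not the Commons Codec implementation: the class name CleanupSketch and the helper stripNonLetters are hypothetical, and the sketch assumes "non-letter" means any character for which Character.isLetter returns false. Under that rule the second input from the test, "Te$t!@#", reduces to "tet" rather than "test", because '$' is dropped and not mapped to 's'.

```java
public class CleanupSketch {

    // Hypothetical helper (not part of Commons Codec): keeps only letters
    // and lower-cases them; whitespace, digits and punctuation are dropped.
    static String stripNonLetters(final String input) {
        final StringBuilder sb = new StringBuilder(input.length());
        for (final char ch : input.toCharArray()) {
            if (Character.isLetter(ch)) {
                sb.append(Character.toLowerCase(ch));
            }
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(stripNonLetters(" Hello World ")); // helloworld
        System.out.println(stripNonLetters("Te$t!@#"));       // tet
    }
}
```

Whether such filtering belongs in cleanup itself or in a later encoding step is a design choice the report leaves open; the sketch only shows what a letters-only filter would return.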