Hi All:

I just found a data set that I would like to integrate with [codec] to test the 
language package:

http://sourceforge.net/projects/familynamephon/

The test data is a 37MB text file containing 837K German names, along with 
their encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.

I have no idea how long it would take to run our language encoders against 
this, so I imagine making it an optional unit test. How do you do THAT in 
Maven?
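
Independent of the Maven side, here is a minimal sketch of what I mean by 
"optional", under several assumptions: the property name codec.largeDataTest 
is made up, the class and method names are hypothetical, the 
"name<TAB>expected code" line layout is a guess (I have not checked the real 
file format), and the upper-casing lambda is only a stand-in for a real 
[codec] encoder. The check runs only when the property is set to true and 
otherwise skips itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class OptionalNameDataCheck {

    // Hypothetical property name; nothing in [codec] defines it.
    static final String ENABLE_PROPERTY = "codec.largeDataTest";

    // True only when the JVM was started with -Dcodec.largeDataTest=true.
    static boolean isEnabled() {
        return Boolean.getBoolean(ENABLE_PROPERTY);
    }

    // Counts lines whose expected code differs from what the encoder produces.
    // Assumes each line is "name<TAB>expectedCode"; lines without a tab are skipped.
    static int countMismatches(List<String> lines, Function<String, String> encoder) {
        int mismatches = 0;
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length < 2) {
                continue; // not in the assumed layout
            }
            if (!fields[1].equals(encoder.apply(fields[0]))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        if (!isEnabled()) {
            System.out.println("Skipped; run with -D" + ENABLE_PROPERTY + "=true to enable.");
            return;
        }
        // Tiny in-memory sample standing in for the 37MB data file.
        List<String> sample = Arrays.asList("Meier\tMEIER", "Schmidt\tSCHMIDT");
        // Placeholder encoder; a real run would plug in a [codec] encoder here.
        Function<String, String> encoder = s -> s.toUpperCase(Locale.ROOT);
        System.out.println("Mismatches: " + countMismatches(sample, encoder));
    }
}
```

A real version would stream the file line by line rather than hold it in 
memory, and would report which names disagree instead of just counting them.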

The data is covered (I think; I do not read German) by this license: 
http://www.opendatacommons.org/licenses/odbl/1.0/

Thoughts?
Gary Gregory
Senior Software Engineer
Rocket Software
3340 Peachtree Road, Suite 820 * Atlanta, GA 30326 * USA
Tel: +1.404.760.1560
Email: ggreg...@seagullsoftware.com
Web: http://www.seagull.rocketsoftware.com/

