> -----Original Message-----
> From: Oliver Heger [mailto:oliver.he...@oliver-heger.de]
> Sent: Tuesday, January 25, 2011 15:19
> To: Commons Developers List
> Subject: Re: [codec] Large test data set!
> 
> On 25.01.2011 21:01, Gary Gregory wrote:
> > Hi All:
> >
> > I just found a data set that I would like to integrate with [codec] to
> > test the language package:
> >
> > http://sourceforge.net/projects/familynamephon/
> >
> > The test data file contains 837K German names (37MB) in a text file and
> > encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.
> >
> > I have no idea how long it would take to run a test for our language
> > encoders on this but I imagine making it an optional unit test. How do you
> > do THAT in Maven?
> >
> > The data is covered (I think, I do not read German) by this license:
> > http://www.opendatacommons.org/licenses/odbl/1.0/
> 
> Being a native German speaker I can confirm that the license is actually
> the Open Database License which can be found at the URL you provided.

Can we include the data file in our tests? And the PDF describing the file?

Thank you,
Gary

> 
> Cham phonetics seems to be a special algorithm for encoding names. [1]
> contains more background information about it (unfortunately also in
> German). According to this page the name stems from a region in Bavaria.
> You can find a PHP implementation of this algorithm in [2].
> 
> HTH
> Oliver
> 
> [1] http://www.genealogie-konzepte.net/chamer-phonetik
> [2] http://www.genealogie-konzepte.net/chamer-phonetik/implementierung
> 
> >
> > Thoughts?
> > Gary Gregory
> > Senior Software Engineer
> > Rocket Software
> > 3340 Peachtree Road, Suite 820 * Atlanta, GA 30326 * USA
> > Tel: +1.404.760.1560
> > Email: ggreg...@seagullsoftware.com<mailto:ggreg...@seagullsoftware.com>
> > Web: seagull.rocketsoftware.com<http://www.seagull.rocketsoftware.com/>
> >
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org


