On 26.01.2011 02:36, Gary Gregory wrote:
-----Original Message-----
From: Oliver Heger [mailto:oliver.he...@oliver-heger.de]
Sent: Tuesday, January 25, 2011 15:19
To: Commons Developers List
Subject: Re: [codec] Large test data set!

On 25.01.2011 21:01, Gary Gregory wrote:
Hi All:

I just found a data set that I would like to integrate with [codec] to
test the language package:

http://sourceforge.net/projects/familynamephon/

The test data file contains 837K German names (37MB) in a text file and
encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.

I have no idea how long it would take to run a test for our language
encoders on this, but I imagine making it an optional unit test. How do you
do THAT in Maven?
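
One possible way to make such a test optional, sketched here under stated
assumptions rather than as how [codec] actually does it: gate the expensive
run behind a system property that is off by default. The property name
codec.largeTests, the class name, and the tab-separated "name<TAB>expected
code" line layout are all hypothetical; the real data file's layout would
need to be checked first. The encoder is passed in as a plain function so
the sketch does not depend on any particular [codec] class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;

public class OptionalLargeDataCheck {

    // Hypothetical property name; not an existing [codec] or Maven convention.
    static final String ENABLE_PROPERTY = "codec.largeTests";

    // True only when the JVM was started with -Dcodec.largeTests=true.
    static boolean isEnabled() {
        return Boolean.getBoolean(ENABLE_PROPERTY);
    }

    // Counts lines whose expected code differs from what the encoder returns.
    // Assumed layout per line: name, a tab, then the expected encoding.
    static int countMismatches(List<String> lines, Function<String, String> encoder) {
        int mismatches = 0;
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            if (!fields[1].equals(encoder.apply(fields[0]))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Returns -1 when the run is skipped, otherwise the number of mismatches.
    static int runIfEnabled(Path dataFile, Function<String, String> encoder) throws IOException {
        if (!isEnabled()) {
            return -1;
        }
        List<String> lines = Files.readAllLines(dataFile, StandardCharsets.UTF_8);
        return countMismatches(lines, encoder);
    }
}
```

In a JUnit 4 test the same isEnabled() check could be handed to
org.junit.Assume.assumeTrue so the test is reported as skipped instead of
failed. On the Maven side, the property could be supplied on the command
line or set from a profile, depending on how Surefire is configured.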

The data is covered (I think, I do not read German) by this license:
http://www.opendatacommons.org/licenses/odbl/1.0/

Being a native German speaker I can confirm that the license is actually
the Open Database License which can be found at the URL you provided.

Can we include the data file in our tests? The PDF describing the file?

Thank you,
Gary

Well, IANAL.

But if I understand the license correctly, according to paragraph 3 we should be allowed to use the data as part of our tests and distribute it. We have to adhere to the usage conditions defined in paragraph 4, so we would have to add a note to our NOTICE.txt.

However, it will probably do no harm to ask at legal@.

Oliver



Cham phonetics seems to be a special algorithm for encoding names. [1]
contains more background information about it (unfortunately also in
German). According to this page the name stems from a region in Bavaria.
You can find a PHP implementation of this algorithm in [2].

HTH
Oliver

[1] http://www.genealogie-konzepte.net/chamer-phonetik
[2] http://www.genealogie-konzepte.net/chamer-phonetik/implementierung


Thoughts?
Gary Gregory
Senior Software Engineer
Rocket Software
3340 Peachtree Road, Suite 820 * Atlanta, GA 30326 * USA
Tel: +1.404.760.1560
Email: ggreg...@seagullsoftware.com<mailto:ggreg...@seagullsoftware.com>
Web: seagull.rocketsoftware.com<http://www.seagull.rocketsoftware.com/>





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


