Oliver Heger wrote:
> On 26.01.2011 02:36, Gary Gregory wrote:
>>> -----Original Message-----
>>> From: Oliver Heger [mailto:oliver.he...@oliver-heger.de]
>>> Sent: Tuesday, January 25, 2011 15:19
>>> To: Commons Developers List
>>> Subject: Re: [codec] Large test data set!
>>>
>>> On 25.01.2011 21:01, Gary Gregory wrote:
>>>> Hi All:
>>>>
>>>> I just found a data set that I would like to integrate with [codec] to test the language package:
>>>>
>>>> http://sourceforge.net/projects/familynamephon/
>>>>
>>>> The test data file contains 837K German names (37MB) in a text file and encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.
>>>>
>>>> I have no idea how long it would take to run a test for our language encoders on this but I imagine making it an optional unit test. How do you do THAT in Maven?
>>>>
>>>> The data is covered (I think, I do not read German) by this license: http://www.opendatacommons.org/licenses/odbl/1.0/
>>>
>>> Being a native German speaker I can confirm that the license is actually the Open Database License which can be found at the URL you provided.
>>
>> Can we include the data file in our tests? The PDF describing the file?
>>
>> Thank you,
>> Gary
>
> Well, IANAL.
>
> But if I understand the license correctly, according to paragraph 3 we should be allowed to use the data as part of our tests and distribute it. We have to adhere to the usage conditions defined in paragraph 4, so we would have to add a note to our NOTICE.txt.
>
> However, it will probably do no harm to ask at legal@.

Do we actually have to distribute it? Maybe we can add it as a zip to the Maven repo and use the dependency plugin to download and extract it on the fly.

- Jörg
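
To make the "optional unit test" idea a bit more concrete, here is a rough sketch of how such a test could decide whether to run at all. Everything named here is an assumption for illustration only: the system property `codec.largeTests`, the property `codec.largeTests.file`, the default path `target/test-data/names.txt`, and the class `LargeDataSetGate` are hypothetical and not part of [codec]. The idea is simply that the expensive test runs only when someone asks for it and the extracted data file is actually present.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of [codec]: decides whether the large data set test should run.
final class LargeDataSetGate {

    // Assumed property name; it could be set with -Dcodec.largeTests=true on the command line.
    static final String FLAG = "codec.largeTests";

    // Run only if the flag is "true" (case-insensitive) and the data file exists as a regular file.
    // Boolean.parseBoolean(null) is false, so an unset property means "skip".
    static boolean shouldRun(String flagValue, Path dataFile) {
        return Boolean.parseBoolean(flagValue) && Files.isRegularFile(dataFile);
    }

    public static void main(String[] args) {
        // Assumed location of the extracted data; the real file name and layout are not known here.
        Path dataFile = Paths.get(
                System.getProperty("codec.largeTests.file", "target/test-data/names.txt"));

        if (!shouldRun(System.getProperty(FLAG), dataFile)) {
            System.out.println("Skipping large data set test");
            return;
        }
        System.out.println("Would run the language encoders against " + dataFile);
    }
}
```

With something like this, a normal build skips the test because the property is unset, and the data would not have to sit in the source tree as long as it is extracted before the tests run.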