Oliver Heger wrote:

> On 26.01.2011 02:36, Gary Gregory wrote:
>>> -----Original Message-----
>>> From: Oliver Heger [mailto:oliver.he...@oliver-heger.de]
>>> Sent: Tuesday, January 25, 2011 15:19
>>> To: Commons Developers List
>>> Subject: Re: [codec] Large test data set!
>>>
>>> On 25.01.2011 21:01, Gary Gregory wrote:
>>>> Hi All:
>>>>
>>>> I just found a data set that I would like to integrate with [codec] to
>>>> test the language package:
>>>>
>>>> http://sourceforge.net/projects/familynamephon/
>>>>
>>>> The test data file contains 837K German names (37MB) in a text file and
>>>> encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and
>>>> Soundex.
>>>>
>>>> I have no idea how long it would take to run a test for our language
>>>> encoders on this but I imagine making it an optional unit test. How do
>>>> you do THAT in Maven?
>>>>
>>>> The data is covered (I think, I do not read German) by this license:
>>>> http://www.opendatacommons.org/licenses/odbl/1.0/
>>>
>>> Being a native German speaker I can confirm that the license is actually
>>> the Open Database License which can be found at the URL you provided.
>>
>> Can we include the data file in our tests? The PDF describing the file?
>>
>> Thank you,
>> Gary
> 
> Well, IANAL.
> 
> But if I understand the license correctly, according to paragraph 3 we
> should be allowed to use the data as part of our tests and distribute
> it. We have to adhere to the usage conditions defined in paragraph 4, so
> we would have to add a note to our NOTICE.txt.
> 
> However, it will probably do no harm to ask at legal@.

Do we actually have to distribute it? Maybe we can add it as a zip to the
Maven repo and use the dependency plugin to download and extract it on the
fly.
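
A minimal sketch of that idea, assuming the data set were published as a
zip artifact. The groupId, artifactId and version of the data artifact
below are hypothetical placeholders, not an existing artifact; only the
dependency plugin's "unpack" goal and its artifactItems configuration are
standard. The plugin version is left out here and should be pinned in a
real build.

```xml
<!-- Sketch only: the test-data coordinates are hypothetical. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>unpack-phonetic-test-data</id>
      <!-- Run before the tests so the extracted file is available to them -->
      <phase>generate-test-resources</phase>
      <goals>
        <goal>unpack</goal>
      </goals>
      <configuration>
        <artifactItems>
          <artifactItem>
            <!-- Hypothetical coordinates for the zipped name list -->
            <groupId>org.example.testdata</groupId>
            <artifactId>german-family-names</artifactId>
            <version>1.0</version>
            <type>zip</type>
            <outputDirectory>${project.build.directory}/test-data</outputDirectory>
          </artifactItem>
        </artifactItems>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The tests would then read the file from target/test-data. Putting this
plugin configuration inside a Maven profile would probably also cover the
"optional" part, since the download would only happen when that profile
is activated.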

- Jörg

