On Jul 2, 2012, at 9:19 PM, Chris Little wrote:

> On 7/2/2012 5:47 AM, Greg Hellings wrote:
>> Is there an available (and proper-name-tagged!) version of the Bible
>> in a sister tongue to Cherokee that we could use as the basis for
>> comparisons? "David" -> "dewi" seems a pretty distant comparison that
>> is far more likely to yield issues than if we have a sister tongue
>> where "dawi" or what have you is already marked as a proper name.
>> Having such a related language would greatly enhance the accuracy of
>> this portion of the work.
> 
> The naive, orthographic edit distance between 'david' and 'dewi' is 3-5. (5 
> if substitutions cost 2, 3 if they cost 1.)
> 
> With metaphone (a modern soundex-type algorithm) that just assumes the 
> Cherokee is English, 'david' and 'dewi' become 'tft' and 'tw' respectively, 
> with an edit distance of 2-3.
> 
> Knowing some things about Cherokee helps us tune the algorithm for Cherokee. 
> For example, it has no final consonants, so maybe we shouldn't penalize extra 
> final consonants in English as much.
> 
> And we could also go straight from Hebrew/Greek instead of English, since it 
> appears the Cherokee is transliterating names from Hebrew/Greek, not English. 
> David from Hebrew would be transliterated 'dawid' and its 
> metaphone-equivalent would be 'twt'. That's got an edit distance of 1 from 
> the Cherokee 'tw'. If we discount the cost of a difference in final 
> consonants, the edit distance would be even less (0.5, for example).
> 
> I imagine we could also examine the English and Hebraicize/Hellenize it, as 
> appropriate, to reconstruct a passable metaphone-equivalent of the 
> Hebrew/Greek from English. In the above, the 'v' in 'david' obviously came 
> from waw, so in Hebrew names, 'v' should probably become 'w' in our 
> modified metaphone algorithm.
> 
> So, all told, we could probably tag names with very high accuracy using names 
> from a text in an unrelated language.
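
The distances quoted above can be reproduced with a small sketch. It is only an
illustration: toy_key() is a hypothetical stand-in that handles just the letters
in these examples and is NOT real metaphone, and the final_del_cost parameter is
an assumed way of discounting extra final consonants on the source side.

```python
VOWELS = set("aeiou")

def toy_key(name):
    """Rough metaphone-like key (hypothetical, NOT real metaphone).

    Drops vowels and maps d->t and v->f, which is enough to reproduce
    the 'tft', 'tw' and 'twt' keys discussed in this thread.
    """
    table = {"d": "t", "v": "f"}
    key = []
    for ch in name.lower():
        if ch in VOWELS:
            continue
        code = table.get(ch, ch)
        # Collapse runs of the same code, as soundex-type schemes usually do.
        if not key or key[-1] != code:
            key.append(code)
    return "".join(key)

def edit_distance(src, dst, sub_cost=1.0, final_del_cost=1.0):
    """Levenshtein distance from src to dst with tunable costs.

    sub_cost:       cost of substituting one character for another.
    final_del_cost: cost of dropping a trailing character of src that dst
                    lacks (assumption: lower it when the target language
                    has no final consonants).
    Insertions and non-final deletions always cost 1.
    """
    m, n = len(src), len(dst)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + 1.0
    for i in range(1, m + 1):
        for j in range(n + 1):
            # Once all of dst is consumed (j == n), any further deletion
            # from src is a trailing deletion.
            del_cost = final_del_cost if j == n else 1.0
            best = d[i - 1][j] + del_cost
            if j > 0:
                best = min(best, d[i][j - 1] + 1.0)
                sub = 0.0 if src[i - 1] == dst[j - 1] else sub_cost
                best = min(best, d[i - 1][j - 1] + sub)
            d[i][j] = best
    return d[m][n]

if __name__ == "__main__":
    print(edit_distance("david", "dewi"), edit_distance("david", "dewi", sub_cost=2))
    print(toy_key("david"), toy_key("dewi"), toy_key("dawid"))
    print(edit_distance(toy_key("dawid"), toy_key("dewi"), final_del_cost=0.5))
```

A real implementation would presumably swap toy_key() for a proper metaphone
variant tuned for Hebrew/Greek sources, as suggested above.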

If the names vary very little from one verse to another, that is, if "dewi" is 
the only spelling of David, then one can probably take the set of verses that 
have David in them and look for the words common to those same verses in 
Cherokee. That would also narrow the set of words that need to be considered, 
possibly down to a single word.
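
A minimal sketch of that idea, assuming the Cherokee text is already split into
words per verse and that we know which verses contain the tagged name in the
source text. The function name, the verse ids and the "tok_*" tokens are made up
for illustration; only "dewi" comes from this thread.

```python
def candidate_words(target_verses, name_verse_ids):
    """Return the words found in every target verse that contains the name.

    target_verses:  dict mapping verse id -> list of words in the target text.
    name_verse_ids: ids of verses where the source text has the tagged name.
    Assumes the name has a single spelling in the target text; an inflected
    or variant spelling in any one verse would empty the intersection.
    """
    word_sets = [set(target_verses[v]) for v in name_verse_ids
                 if v in target_verses]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)

# Hypothetical data: placeholder tokens, not real Cherokee text.
verses = {
    "v1": ["dewi", "tok_a", "tok_b"],
    "v2": ["tok_c", "dewi", "tok_a"],
    "v3": ["tok_d", "dewi"],
    "v4": ["tok_a", "tok_e"],
}

if __name__ == "__main__":
    print(candidate_words(verses, ["v1", "v2"]))
    print(candidate_words(verses, ["v1", "v2", "v3"]))
```

The more verses that contain the name, the smaller the candidate set gets, and
whatever survives could then be ranked by edit distance against the name.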

In Him,
        DM


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
