On 03/07/11 18:43, Greg Hellings wrote:
> http://dl.thehellings.com/count.py
What one really needs, though (and all solutions mentioned so far lack), is a character counter which disregards OSIS tags and attributes. A "c" in the text of a Cyrillic Bible can either be perfectly innocent (as part of e.g. the "chapter" tag) or it might stand in place of a "с" (\u0441), in which case it causes a mess. The same goes for numbers: a common problem in the Arabic script texts we receive is that the references in xrefs are in Western numerals. Again, such numbers are a normal part of OSIS attributes.

I have just now committed a couple of scripts to sword-tools to assist with this:

1) charmap.pl takes an OSIS file (or rather any XML file) and returns a character map similar to those discussed, but solely for text nodes

2) osis_tr.pl does a "tr" job, replacing one set of characters with another, but again only in text nodes

3) numbers.pl fixes the numbers problem above. I wrote this first, before I generalised it into the osis_tr.pl script, but I think it has value, as the problem is so common.

Peter
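For illustration only, here is a rough Python sketch of the text-node-only idea described above. It is not the code of charmap.pl or osis_tr.pl; the function names, the sample document and the digit mapping are all made up for the example, and it assumes a small, namespace-free XML file that fits in memory.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample: the first letter of the verse text is a Latin "c"
# that should have been Cyrillic "с" (U+0441).
SAMPLE = (
    '<osis><chapter osisID="Gen.1">'
    '<verse osisID="Gen.1.1">c\u0432\u0435\u0442 12</verse>'
    '</chapter></osis>'
)


def text_node_charmap(xml_source):
    """Count characters in text nodes only, ignoring tag and attribute content."""
    root = ET.fromstring(xml_source)
    counts = Counter()
    # itertext() yields the text and tail strings of every element under root,
    # so element names and attribute values never reach the counter.
    for chunk in root.itertext():
        counts.update(chunk)
    return counts


def translate_text_nodes(xml_source, table):
    """Apply a str.translate() table to text nodes only and return the XML."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.translate(table)
        if elem.tail:
            elem.tail = elem.tail.translate(table)
    return ET.tostring(root, encoding="unicode")


# Assumed example tables: Latin "c" to Cyrillic "с", and Western digits
# to Arabic-Indic digits (U+0660 to U+0669).
LATIN_C_TO_CYRILLIC = str.maketrans({"c": "\u0441"})
WESTERN_TO_ARABIC_INDIC = str.maketrans(
    "0123456789",
    "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",
)
```

Calling text_node_charmap(SAMPLE) reports the stray Latin "c" once, while the many Latin letters in "osis", "chapter" and "osisID" are not counted. Passing either table to translate_text_nodes() changes the verse text and leaves the osisID attributes alone. Note that ElementTree rewrites namespace prefixes on output, so a real OSIS file would need more care than this sketch takes.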