Can someone offer some pointers as to what I am doing wrong? I am trying to add the ability to osis2mod to optionally ensure that a UTF-8 document is normalized to NFC.
I added -n as a flag to indicate that normalization should occur and set a global boolean variable "normalize" to true iff the flag is present. Rather than reinventing the wheel, I figured Sword's UTF8NFC filter would be the ticket.

First I added the header with:

    #ifdef _ICU_
    #include <utf8nfc.h>
    #endif

And I created a global variable:

    #ifdef _ICU_
    UTF8NFC normalizer;
    #endif

Then right before adding the entry I ran it through the filter:

    #ifdef _ICU_
    if (normalize) {
        normalizer.processText(activeVerseText, (SWKey *)2); // note the hack of 2 to mimic a real key. TODO: remove all hacks
    }
    #endif

Now I ran the KJV.xml at www.crosswire.org/~dmsmith/kjv2006 through osis2mod. Since I thought I had already normalized the text, I expected a diff to show nothing. However, I found corruption in Matthew 3:17 at the end of the raw text in the module (and in many places later). The corruption is always at the end of the line.

Here is the raw text for that verse:

<w lemma="strong:G3588" morph="robinson:T-NSM" src="13"></w><w lemma="strong:G2532" morph="robinson:CONJ" src="1">And</w> <w lemma="strong:G2400" morph="robinson:V-2AAM-2S" src="2">lo</w> <w lemma="strong:G5456" morph="robinson:N-NSF" src="3">a voice</w> <w lemma="strong:G1537" morph="robinson:PREP" src="4">from</w> <w lemma="strong:G3588 strong:G3772" morph="robinson:T-GPM robinson:N-GPM" src="5 6">heaven</w>, <w lemma="strong:G3004" morph="robinson:V-PAP-NSF" src="7">saying</w>, <w lemma="strong:G3778" morph="robinson:D-NSM" src="8">This</w> <w lemma="strong:G2076" morph="robinson:V-PXI-3S" src="9">is</w> <w lemma="strong:G3450" morph="robinson:P-1GS" src="12">my</w> <w lemma="strong:G27" morph="robinson:A-NSM" src="14">beloved</w> <w lemma="strong:G3588 strong:G5207" morph="robinson:T-NSM robinson:N-NSM" src="10 11">Son</w>, <w lemma="strong:G1722" morph="robinson:PREP" src="15">in</w> <w lemma="strong:G3739" morph="robinson:R-DSM" src="16">whom</w> <w lemma="strong:G2106" morph="robinson:V-AAI-1S" src="17">I am well
pleased</w>.<milestone resp="pdy 2003-12-14-08:48" type="x-strongsMarkup"/>="22"꧁

Any help would be appreciated. Thanks!

Working together,
DM Smith

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
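
P.S. One check that might help narrow this down: since the input is supposed to be NFC already, the filter should leave each verse byte-for-byte unchanged, so comparing the text before and after the call would show exactly where the extra bytes begin. Below is a minimal sketch in plain C++ with no Sword or ICU calls. The helper name firstDifference is hypothetical, and it assumes the verse text can be copied into a std::string on both sides of processText (for example through c_str(), if the buffer type offers it).

```cpp
#include <cstddef>
#include <string>

// Hypothetical debugging helper, not part of Sword or ICU.
// Returns the offset of the first byte at which a and b differ.
// If one string is a strict prefix of the other, returns the length of the
// shorter one. Returns std::string::npos when the two are identical.
std::size_t firstDifference(const std::string &a, const std::string &b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return i;  // contents diverge inside the common length
        }
    }
    // Common part matches: either identical, or one has trailing bytes.
    return a.size() == b.size() ? std::string::npos : n;
}
```

If the result equals the length of the "before" copy, the filter left the original bytes alone and only appended something after them, which would point at a length or termination problem rather than at the normalization itself.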