On Jun 25, 2014, at 8:53 AM, David Haslam <dfh...@googlemail.com> wrote:

> My observations arise in connection with Hebrew Unicode text.

Hebrew probably should be NFD. In my experiments, NFC text did not render 
well in either SWORD or JSword frontends, across various fonts with Hebrew 
support.

To keep NFD text as NFD, use the -N flag, which tells osis2mod to skip 
normalization. Otherwise the text will become NFKC.

> 
> I do know why NFC is default, and why it's recommended.
> 
> The Hebrew MapM module is not NFC normalized, so there must have been a
> genuine reason why the -N option was used during its build. Another Hebrew
> module (from IBT) is also not normalized.
> 
> Likewise, an earlier version of the Hebrew WLC module was rebuilt without
> NFC, albeit the current release is normalized. Refer to the file wlc.conf
> for the history.
> 
> This suggests that the -N option can be made to work, but perhaps it has
> only ever been tested under Linux?  As a Windows user, I am curious as to
> why I could not get it to work at all. 

If SWORD is built with ICU then it should work. If it is not then it is the 
responsibility of the user to ensure that the text is properly encoded in UTF-8.
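
As a rough sketch of what "ensure it yourself" could look like before running 
osis2mod, here is a small Python check that a file's bytes decode cleanly as 
UTF-8. This is not part of SWORD or osis2mod; the function name is_valid_utf8 
is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises on any malformed byte sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    good = "שלום".encode("utf-8")
    bad = b"\xff\xfe not utf-8"
    print(is_valid_utf8(good), is_valid_utf8(bad))
```

For a real source file you would read it in binary mode and pass the bytes in. 
Note that this only checks the encoding, not the normalization form.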

> 
> Though I can't go into any details, my OSIS XML source text is already
> UTF-8, and is valid to the OSIS schema.

That your text is UTF-8 is good, but it is not necessarily sufficient. I've 
seen a few texts that are valid UTF-8 but mix multiple representations (NFD, 
NFC, ...) in the same document. It is really frustrating to figure out why 
the same word in two places looks different. Having osis2mod do normalization 
is marginally helpful.
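
To illustrate the "same word, different representation" problem, here is a 
minimal Python sketch using the standard unicodedata module. It uses a Latin 
letter rather than Hebrew purely to keep the example short; the helper names 
code_points and same_after_normalizing are made up for illustration.

```python
import unicodedata


def code_points(text: str) -> str:
    """List the code points of a string, e.g. 'U+0065 U+0301'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_after_normalizing(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    composed = "\u00e9"      # e with acute as one precomposed code point
    decomposed = "e\u0301"   # e followed by a combining acute accent
    print(code_points(composed))
    print(code_points(decomposed))
    print(composed == decomposed)                        # raw comparison
    print(same_after_normalizing(composed, decomposed))  # after normalizing
```

The two strings display the same but compare as different until both are 
normalized to one form, which is also why search and rendering can behave 
differently for the "same" word.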

It'd be better if all frontends used ICU and did the normalization. To my 
knowledge, none do. Some can't/won't. So, osis2mod to the rescue.

If you know your text is UTF-8 and uniformly in one normalization form, then 
use the -N flag. Also use it if you know that your text needs to be in a form 
other than NF(K)C.
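
One way to find out whether a text is uniformly in one form before deciding on 
-N is to test it against each form. Below is a minimal sketch, assuming 
Python 3.8 or later (for unicodedata.is_normalized); the function name 
normalization_report is hypothetical and has nothing to do with osis2mod 
itself.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text: str) -> dict:
    """Map each normalization form to True if the text is already in it."""
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


if __name__ == "__main__":
    sample = "e\u0301"  # decomposed: base letter plus combining accent
    for form, ok in normalization_report(sample).items():
        print(form, ok)
```

If the whole text reports True for the form you want, it is already uniform in 
that form. If it reports False for every form, it is probably a mixture.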

> 
> I am still curious as to why there was a historic reversion of normalization
> for the WLC module.

I think that NFC doesn't work for Hebrew.

> cf. I asked Chris, but he never responded, though I guess he's too busy this
> year.
> 
> Best regards,
> 
> David
> 
> --
> View this message in context: 
> http://sword-dev.350566.n4.nabble.com/Using-the-N-option-in-osis2mod-tp4653983p4654013.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
> 
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
