Dear all,

We've known for quite a few years that there are aspects of Biblical Hebrew 
that mean we should avoid converting the Unicode source text to NFC when we 
build a module.

This prompts me to suggest that we ought to define a new key for .conf files.

Normalization=NFC (this would be the default, and may be omitted for the vast 
majority of modules)
Normalization=Custom (we should include this in certain Biblical Hebrew modules)

This would make it clear to front-end developers and users alike that the 
source text was not converted to NFC during module build.
In other words, osis2mod was run intentionally with the -N switch, in 
accordance with the requirements of the source text provider.
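
To illustrate why this matters, here is a minimal Python sketch of one way NFC can alter pointed Hebrew: canonical reordering of combining marks. The three-code-point sample is my own illustration and is not taken from any module source, and reordering may not be the only concern a source text provider has.

```python
import unicodedata

# Bet + dagesh (U+05BC, combining class 21) + patah (U+05B7, combining class 17).
# Illustrative sample only: the dagesh is deliberately placed before the vowel.
source = "\u05D1\u05BC\u05B7"

# NFC sorts adjacent combining marks by combining class, so the patah
# (class 17) is moved in front of the dagesh (class 21).
nfc = unicodedata.normalize("NFC", source)


def codepoints(text):
    """Return the code points of text as a readable string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(codepoints(source))  # U+05D1 U+05BC U+05B7
print(codepoints(nfc))     # U+05D1 U+05B7 U+05BC
print(nfc == source)       # False
```

The text still renders much the same, but the stored sequence no longer matches what the source text provider supplied.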

The Unicode source text may already be encoded in UTF-8; this memo is only 
about normalization.

In the rare event that a requirement arises for any of the other three 
normalization forms (NFD, NFKC, NFKD) defined by the Unicode Consortium, 
these would also be permitted values for the .conf file key.

A further benefit arises when a module needs to be updated.
If the modules team sees that the .conf file includes the line
Normalization=Custom
they would be forewarned against converting to NFC through inadvertently 
omitting the -N switch during module build.
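
As a rough sketch of how such a check might be automated, the Python below reads the proposed key from the text of a .conf file and reports whether the NFC conversion should be skipped. The function names and the sample module are hypothetical, the key itself is only the proposal above, and treating every value other than NFC as "skip the conversion" is my assumption.

```python
def proposed_normalization(conf_text, default="NFC"):
    """Return the value of the proposed Normalization key, or the default.

    Hypothetical helper: the key is only a proposal and is not read by any
    existing tool.
    """
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "normalization":
            return value.strip()
    return default


def must_skip_nfc(conf_text):
    """True when the build should not convert the source text to NFC.

    Assumption: any value other than NFC means the -N switch is needed.
    """
    return proposed_normalization(conf_text).upper() != "NFC"


# Hypothetical module configuration, for illustration only.
sample_conf = """[ExampleHebrew]
DataPath=./modules/texts/ztext/examplehebrew/
Encoding=UTF-8
Normalization=Custom
"""

if must_skip_nfc(sample_conf):
    print("Reminder: build this module with the osis2mod -N switch.")
```

A module whose .conf file omits the key falls back to NFC, which matches the proposed default.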

Aside: Another language with a need for non-standard normalization is Tibetan. 
We don't yet have a module in that script.

Best regards,

David

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
