Dear all,
We've known for quite a few years that there are aspects of Biblical Hebrew
that mean we should avoid converting the Unicode source text to NFC when we
build a module.
This prompts me to suggest that we ought to define a new key for .conf files:
  Normalization=NFC (this would be the default, and may be omitted for the vast
  majority of modules)
  Normalization=Custom (we should include this in certain Biblical Hebrew modules)
The Custom value would make it clear to front-end developers and users alike
that the source text was not converted to NFC during module build, i.e. that
osis2mod was intentionally run with the -N switch, in accordance with the
requirements of the source text provider.
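
For anyone who has not seen the effect first-hand, here is a small Python
sketch of what NFC does to pointed Hebrew. The three-character sample (bet,
dagesh, patah) is my own illustration and is not taken from any particular
module. NFC sorts combining marks by canonical combining class, so a source
text that deliberately stores the dagesh before the vowel comes out reordered.

```python
import unicodedata

# Illustrative sample only: bet (U+05D1) + dagesh (U+05BC) + patah (U+05B7),
# stored with the dagesh before the vowel point.
source = "\u05D1\u05BC\u05B7"

# NFC applies canonical reordering to the combining marks.
nfc = unicodedata.normalize("NFC", source)

# Canonical combining classes of the source characters: base letter, dagesh, patah.
print([unicodedata.combining(ch) for ch in source])  # [0, 21, 17]

# After NFC the patah (class 17) precedes the dagesh (class 21).
print([f"U+{ord(ch):04X}" for ch in nfc])  # ['U+05D1', 'U+05B7', 'U+05BC']

# The stored order of the source text has not been preserved.
print(nfc == source)  # False
```
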
The Unicode source text may already be encoded in UTF-8; this memo is only
about normalization.
Should a requirement ever arise for any of the other three normalization
forms (NFD, NFKC, NFKD) defined by the Unicode Consortium, these would also
be permitted values for the .conf file key.
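
As a rough sketch of how a front-end or build script might read the proposed
key, something like the following would do. The function name, the sample
module name and the sample DataPath are all hypothetical; nothing like this
exists in the SWORD API today. The only things taken from the proposal are
the key name, the permitted values, and the default of NFC when the key is
absent.

```python
# Permitted values as proposed in this message; NFC is the default.
ALLOWED_VALUES = ("NFC", "Custom", "NFD", "NFKC", "NFKD")
DEFAULT_VALUE = "NFC"


def read_normalization(conf_text: str) -> str:
    """Return the Normalization value from the text of a .conf file.

    Hypothetical helper: falls back to NFC when the key is omitted and
    rejects any value outside the proposed set.
    """
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Normalization":
            value = value.strip()
            if value not in ALLOWED_VALUES:
                raise ValueError(f"unsupported Normalization value: {value!r}")
            return value
    return DEFAULT_VALUE


# Hypothetical .conf content, for illustration only.
sample_conf = """[ExampleHebrew]
DataPath=./modules/texts/ztext/examplehebrew/
Normalization=Custom
"""

print(read_normalization(sample_conf))            # Custom
print(read_normalization("[ExampleModule]\n"))    # NFC
```
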
A further benefit arises when a module needs to be updated.
If the modules team sees that the .conf file includes the line
Normalization=Custom
they would be forewarned against converting to NFC through inadvertently
omitting the -N switch during module build.
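
A build wrapper could even enforce this automatically. The sketch below is
hypothetical (no such wrapper exists as far as I know), and it assumes the
usual osis2mod argument order of output path, then OSIS file, then options.
It only assembles the command line and does not run anything. It also assumes
that any value other than NFC means the source text is to be taken as
supplied.

```python
def osis2mod_args(module_path: str, osis_file: str, normalization: str = "NFC") -> list:
    """Build an osis2mod command line that honours the Normalization key.

    Hypothetical helper. Assumed argument order: output path, OSIS file,
    then options.
    """
    args = ["osis2mod", module_path, osis_file]
    # Assumption: any value other than NFC means the source text must be
    # left exactly as supplied, so the NFC conversion is switched off with -N.
    if normalization != "NFC":
        args.append("-N")
    return args


# Hypothetical paths, for illustration only.
print(osis2mod_args("./modules/texts/ztext/examplehebrew/", "examplehebrew.osis.xml", "Custom"))
print(osis2mod_args("./modules/texts/ztext/example/", "example.osis.xml"))
```
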
Aside: Another language with a need for non-standard normalization is Tibetan.
We don't yet have a module in that script.
Best regards,
David
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page