NB. I have cancelled my earlier email, which had been held in the queue for
moderator approval because the attachment was too large for sword-devel.
The eXperimental module KhmerNTx.zip may now be downloaded from this
[link](https://app.box.com/s/e613wf1qdxbjmvux9gbb6vmes33d2rol) on my box.net
account.
Please see below for the significant details.
Best regards,
David
Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.
------- Forwarded Message -------
From: David Haslam <dfh...@protonmail.com>
Date: On Thursday, May 29th, 2025 at 9:26 AM
Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word
Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
To: sword-devel mailing list <sword-devel@crosswire.org>
CC: steve.anti...@gmail.com <steve.anti...@gmail.com>, Modules Issues
<modu...@crosswire.org>
> Dear SWORD Developers (and our Modules Team),
>
> While watching the [livestream
> funeral](https://www.youtube.com/live/zC4hXOgqBak?si=JZ7JiM7j_fHW-sQl) of the
> late OT scholar Gordon J. Wenham yesterday (St Mary's Church, Charlton
> Kings), I had a bright idea.
>
> I'd been working recently on potential improvements for the KhmerNT module
> relating to marking the Lexical Word Divisions.
> Khmer is one of the languages of SE Asia whose Writing System (aka Script)
> largely has NO SPACE BETWEEN WORDS.
> Others include Lao, Thai and Myanmar (aka Burmese), together with other
> languages in the region that use one of these scripts (e.g. Isaan).
>
> Until now, the KhmerNT module has used ZWSP (U+200B ZERO WIDTH SPACE) to
> mark lexical word boundaries.
> This helps SWORD search for whole words: although the divisions between
> words are invisible to human eyes, they are accessible to computer software.
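A minimal Python sketch of why an invisible divider still helps whole-word search (an illustration only, not SWORD's own search code): splitting on ordinary spaces and on ZWSP recovers the lexical words, so a query matches a complete word rather than a fragment inside one. The ASCII strings are hypothetical stand-ins for Khmer words, and `words` and `has_whole_word` are illustrative names.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: invisible lexical word divider


def words(text: str) -> list[str]:
    """Split on ordinary spaces and on ZWSP to recover the lexical words."""
    return [w for chunk in text.split(" ") for w in chunk.split(ZWSP) if w]


def has_whole_word(text: str, query: str) -> bool:
    """True only if `query` occurs as a complete lexical word."""
    return query in words(text)


# Hypothetical ASCII stand-ins for Khmer words; on screen they would run together.
verse = ZWSP.join(["Peter", "apostle", "of", "Jesus"]) + " " + ZWSP.join(["to", "exiles"])

print("post" in verse)                   # plain substring search: True
print(has_whole_word(verse, "post"))     # whole-word search: False
print(has_whole_word(verse, "apostle"))  # True
```

The substring test finds "post" inside "apostle", while the divider-aware test does not, which is the behaviour the ZWSP markup makes possible.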
>
> Wouldn't it be nice if ... (cue the Beach Boys melody) 🎶
>
> - We could instead use a visible Unicode character
> - That character could be hidden by means of an existing SWORD filter
>
> There is such a character!!!
>
> - U+2019 is one of the codepoints hidden (or changed) by the filter
> UTF8GreekAccents.
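The intended display toggle can be modelled in a few lines. This is a hypothetical Python stand-in (`render` and `show_greek_accents` are illustrative names), not the C++ code of SWORD's UTF8GreekAccents filter; it models only the case where U+2019 is hidden, not the "changed" case and not the filter's handling of Greek diacritics.

```python
RSQUO = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK, repurposed as a word divider


def render(text: str, show_greek_accents: bool) -> str:
    """Accents shown keeps the visible dividers; accents hidden drops them."""
    if show_greek_accents:
        return text
    return text.replace(RSQUO, "")


source = RSQUO.join(["Peter", "apostle", "of", "Jesus"])  # stand-in for a Khmer phrase
print(render(source, show_greek_accents=True))   # Peter’apostle’of’Jesus
print(render(source, show_greek_accents=False))  # PeterapostleofJesus
```

Because the divider survives in the module text either way, search can still rely on it even when the reader has chosen to hide it.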
>
>> U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of
>> the Greek NT as the apostrophe, not as a quotation mark.
>>
>> In NT Greek, it appears in:
>>
>> - Elisions: when a vowel at the end of a word is dropped (e.g., δι’ instead
>> of διά before a vowel).
>> - Other elided forms: e.g., ἐπ’ for ἐπί, καθ’ for κατά (τ becoming θ before
>> a rough breathing).
>>
>> While U+2019 is typographically correct for apostrophes in modern
>> typesetting, some older or simpler digital texts may use U+0027 (straight
>> apostrophe). However, U+2019 is the preferred character in high-quality,
>> properly typeset Greek texts.
>
> I then set about testing my idea by making a further update to the already
> eXperimental version of the module, provisionally named KhmerNTx.
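Converting ZWSP-divided source text to the new convention could look roughly like the sketch below. It is an assumed workflow, not necessarily how KhmerNTx was built; in particular, dropping a divider that touches an ordinary space or either end of the text rests on the assumption that such a divider is redundant. `zwsp_to_rsquo` is a hypothetical name.

```python
import re

ZWSP = "\u200b"   # old, invisible word divider
RSQUO = "\u2019"  # new, visible word divider


def zwsp_to_rsquo(text: str) -> str:
    """Replace ZWSP word dividers with U+2019, tidying redundant ones first."""
    text = re.sub(ZWSP + "* " + ZWSP + "*", " ", text)  # divider next to a space
    text = text.strip(ZWSP)                              # divider at either end
    text = re.sub(ZWSP + "{2,}", ZWSP, text)             # doubled dividers
    return text.replace(ZWSP, RSQUO)


print(zwsp_to_rsquo("\u200bab\u200b\u200bcd\u200b ef\u200b"))  # ab’cd ef
```

Tidying first keeps the visible output clean: a stray ’ beside a space would be noticeable once the filter is switched to show it.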
>
> It "worked like a dream".
>
> With Greek accents hidden, the text looks like this:
>
>> [Khmer text of I Peter 1:1, with no visible divider between words] (I Peter
>> 1:1 [KhmerNTx])
>
> With Greek accents displayed, the text looks like this:
>
>> [Khmer text of I Peter 1:1, with a visible ’ between lexical words] (I Peter
>> 1:1 [KhmerNTx])
>
> I have attached the compressed module for any of you to explore & play with
> further.
>
> Aside: The previous update already used the OSIS XML w element to enclose
> each lexical Khmer word. That remains the case.
> In this way, the module source text is ready to be adapted for further
> enhancements, such as adding Strong's numbers, to make a Study Edition.
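As a rough illustration of that markup, the sketch below wraps each lexical word in an OSIS `w` element, with an optional `lemma` attribute in the `strong:G4074` form used for Strong's numbers. The function name, the ASCII stand-in words, and placing the U+2019 divider between (rather than inside) the `w` elements are all assumptions, not a description of the actual KhmerNTx source.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RSQUO = "\u2019"  # visible divider, assumed to sit between <w> elements


def verse_to_osis(osis_id: str, words: list[tuple[str, Optional[str]]]) -> str:
    """Wrap each lexical word in an OSIS <w> element, optionally with a lemma."""
    verse = ET.Element("verse", osisID=osis_id)
    previous = None
    for text, lemma in words:
        if previous is not None:
            previous.tail = RSQUO  # divider follows the preceding word
        w = ET.SubElement(verse, "w")
        if lemma:
            w.set("lemma", lemma)  # e.g. "strong:G4074" (Petros)
        w.text = text
        previous = w
    return ET.tostring(verse, encoding="unicode")


print(verse_to_osis("1Pet.1.1", [("Peter", "strong:G4074"), ("apostle", None)]))
```

Keeping each word in its own element means Strong's numbers can later be attached word by word without re-segmenting the text. A real verse would also need ordinary spaces between phrases, which this sketch ignores.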
>
> Steve Hyde and the translators in Cambodia are currently preparing to publish
> the complete Khmer Bible.
> He has requested my assistance in improving the actual word divisions for the
> 39 OT books.
> I've already been sent the source text, exported from their database.
>
> Since early May, I have been exploring how the Grok AI engine can make a
> positive contribution to the success of this challenging task.
> More on that subject later.
>
> Best regards,
>
> David
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page