[sword-devel] Fw: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS

David Haslam Thu, 29 May 2025 07:48:03 -0700

NB. I have cancelled the earlier email because the attachment was too large for 
sword-devel.
It had been in the queue for moderator approval.


The eXperimental module KhmerNTx.zip may now be downloaded from this 
[link](https://app.box.com/s/e613wf1qdxbjmvux9gbb6vmes33d2rol) on my box.net 
account.

Please see below for the significant details.

Best regards,

David

Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.

------- Forwarded Message -------
From: David Haslam <dfh...@protonmail.com>
Date: On Thursday, May 29th, 2025 at 9:26 AM
Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word 
Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
To: sword-devel mailing list <sword-devel@crosswire.org>
CC: steve.anti...@gmail.com <steve.anti...@gmail.com>, Modules Issues 
<modu...@crosswire.org>

> Dear SWORD Developers (and our Modules Team),
>
> While watching the [livestream 
> funeral](https://www.youtube.com/live/zC4hXOgqBak?si=JZ7JiM7j_fHW-sQl) of OT 
> Scholar the late Gordon D Wenham yesterday (St Mary's Church, Charlton 
> Kings), I had a bright idea.
>
> I'd been working recently on potential improvements for the KhmerNT module 
> relating to marking the Lexical Word Divisions.
> Khmer is one of the languages of SE Asia whose Writing System (aka Script) 
> largely has NO SPACE BETWEEN WORDS.
> Others include: Lao, Thai, Myanmar (aka Burmese), together with other 
> languages in the region that employ one of these scripts (e.g. Isaan).
>
> Until now, the KhmerNT module has used ZWSP (U+200B ZERO WIDTH SPACE) to 
> mark lexical word boundaries.
> This helps with SWORD search for whole words, because even though the 
> divisions between words are invisible to human eyes, they are accessible to 
> computer software.
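
As a minimal sketch of why the invisible dividers help with whole-word search: this is plain Python, not SWORD's own search code, and the function names `split_words` and `contains_whole_word` as well as the sample string are illustrative assumptions. Splitting on U+200B and on ordinary spaces yields a word list that can be matched exactly:

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, the invisible lexical word divider


def split_words(text):
    """Split on ordinary whitespace and on ZWSP, dropping empty pieces."""
    return [w for chunk in text.split() for w in chunk.split(ZWSP) if w]


def contains_whole_word(text, word):
    """True only if `word` matches a complete lexical word in `text`."""
    return word in split_words(text)


# Hypothetical sample: three Khmer words joined by invisible dividers.
sample = ZWSP.join(["ព្រះ", "យេស៊ូ", "គ្រិស្ដ"])
print(split_words(sample))
```

A plain substring test would also match fragments of a word; the whole-word test above only matches complete entries in the list.
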
>
> Wouldn't it be nice if ... (cue to sing the melody by the Beach Boys) 🎶
>
> - We could instead use a visible Unicode character
> - That character could be hidden by means of an existing SWORD filter
>
> There is such a character!!!
>
> - U+2019 is one of the codepoints hidden (or changed) by the filter 
> UTF8GreekAccents.
>
>> U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of 
>> the NT Greek as the apostrophe, not as a quotation mark.
>>
>> In NT Greek, it appears in:
>>
>> - Elisions: When a vowel at the end of a word is dropped (e.g., δι’ instead 
>> of διά before a vowel).
>> - Contractions or abbreviations: e.g., ἐπ’ for ἐπί, καθ’ for κατά.
>>
>> While U+2019 is typographically correct for apostrophes in modern 
>> typesetting, some older or simpler digital texts may use U+0027 (straight 
>> apostrophe). However, U+2019 is the preferred character in high-quality, 
>> properly typeset Greek texts.
>
> I then set about to test my idea by making a further update to an already 
> eXperimental version of the module, provisionally named KhmerNTx.
>
> It "worked like a dream". 😎
>
> With Greek accents hidden, the text looks like this:
>
>> ខ្ញុំពេត្រុស ជាសាវករបស់ព្រះយេស៊ូគ្រិស្ដ 
>> ជូនចំពោះពួកអ្នកដែលព្រះជាម្ចាស់បានជ្រើសរើស 
>> ហើយដែលបានបែកខ្ញែកគ្នាទៅស្នាក់នៅបណ្ដោះអាសន្ននៅស្រុកប៉ុនតុស ស្រុកកាឡាទី 
>> ស្រុកកាប៉ាដូគា ស្រុកអាស៊ី និងស្រុកប៉ីធូនា (I Peter 1:1 [KhmerNTx])
>
> With Greek accents displayed, the text looks like this:
>
>> ខ្ញុំ’ពេត្រុស ជា’សាវក’របស់’ព្រះ’យេស៊ូ’គ្រិស្ដ 
>> ជូន’ចំពោះ’ពួកអ្នក’ដែល’ព្រះជាម្ចាស់’បាន’ជ្រើសរើស 
>> ហើយ’ដែល’បាន’បែកខ្ញែក’គ្នា’ទៅ’ស្នាក់’នៅ’បណ្ដោះអាសន្ន’នៅ’ស្រុក’ប៉ុនតុស 
>> ស្រុក’កាឡាទី ស្រុក’កាប៉ាដូគា ស្រុក’អាស៊ី និង’ស្រុក’ប៉ីធូនា (I Peter 1:1 
>> [KhmerNTx])
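
The relationship between the two views can be sketched in a few lines of Python. This is only an illustration, not the UTF8GreekAccents filter itself (which belongs to the SWORD engine and, per the message, hides or changes more than this one codepoint). The helper names `zwsp_to_divider` and `hide_divider` are hypothetical.

```python
ZWSP = "\u200b"     # ZERO WIDTH SPACE: the invisible divider used until now
DIVIDER = "\u2019"  # RIGHT SINGLE QUOTATION MARK: the visible divider


def zwsp_to_divider(text):
    """Swap every invisible divider for the visible one."""
    return text.replace(ZWSP, DIVIDER)


def hide_divider(text):
    """Stand-in for the 'accents hidden' view: drop the visible divider."""
    return text.replace(DIVIDER, "")


# The opening words of I Peter 1:1 as quoted above, rebuilt from word lists.
first = ["ខ្ញុំ", "ពេត្រុស"]
second = ["ជា", "សាវក", "របស់", "ព្រះ", "យេស៊ូ", "គ្រិស្ដ"]
shown = " ".join(DIVIDER.join(group) for group in (first, second))

hidden = hide_divider(shown)
words = [w for chunk in shown.split() for w in chunk.split(DIVIDER)]
print(shown)
print(hidden)
print(len(words))
```

The ordinary spaces already present in the text survive both views; only the divider between adjacent words is toggled.
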
>
> I have attached the compressed module for any of you to explore & play with 
> further.
>
> Aside: The previous update already made use of the OSIS XML w element to 
> enclose each lexical Khmer word. That remains the case.
> In this way, the module source text is ready to be adapted for further 
> enhancements such as adding Strong's numbers, etc., to make a Study Edition.
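
To make the aside concrete, here is a rough Python sketch of wrapping each lexical word in a `w` element. The exact layout of the real module source is not shown in this message, so placing U+2019 between adjacent `w` elements, and the helper name `wrap_words`, are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

DIVIDER = "\u2019"  # RIGHT SINGLE QUOTATION MARK used as the visible divider


def wrap_words(words):
    """Wrap each word in <w>...</w> and join the elements with the divider.

    Assumption: the divider sits between elements, not inside them.
    """
    return DIVIDER.join("<w>{}</w>".format(escape(w)) for w in words)


# Hypothetical input: an already segmented run of Khmer words.
markup = wrap_words(["ព្រះ", "យេស៊ូ", "គ្រិស្ដ"])
print(markup)
```

With each word in its own element, per-word attributes (for example the OSIS `lemma` attribute that carries Strong's numbers) could later be attached without re-segmenting the text.
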
>
> Steve Hyde and the translators in Cambodia are currently preparing to publish 
> the complete Khmer Bible.
> He has requested my assistance in improving the actual word divisions for the 
> 39 OT books.
> I've already been sent the source text, exported from their database.
>
> Since early May, I have been exploring how the Grok AI engine can make a 
> positive contribution to the success of this challenging task.
> More on that subject later.
>
> Best regards,
>
> David
>
> Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
