I think this has been discussed well.
- This should be done on a semantic level, not with a kludge or a hack.
- The obvious semantic solution is to frame words in w tags and then use CSS, a trigger and option, or whatever is agreed from there.
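As a rough illustration of the w-tag approach, here is a minimal Python sketch that wraps each pre-segmented word in a bare w element. The function name wrap_words and the Latin placeholder tokens are hypothetical, the divider is assumed to be ZWSP (U+200B), and any attributes a real OSIS source would carry are omitted.

```python
from xml.sax.saxutils import escape

ZWSP = "\u200b"  # ZERO WIDTH SPACE, assumed here to be the existing word divider


def wrap_words(segmented: str, divider: str = ZWSP) -> str:
    """Wrap each divider-separated word in a bare <w> element.

    Hypothetical helper for illustration only; it is not part of SWORD
    or of any OSIS tooling.
    """
    words = [w for w in segmented.split(divider) if w]
    # escape() protects &, < and > so the result stays well-formed XML.
    return "".join(f"<w>{escape(w)}</w>" for w in words)


# Latin placeholders stand in for Khmer words.
print(wrap_words(f"aa{ZWSP}bb{ZWSP}cc"))
```

Once every word sits in its own element, how the boundary is shown (or not shown) becomes a front-end decision rather than something encoded in the text itself.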
From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of David Haslam <dfh...@protonmail.com>
Sent: Thursday, May 29, 2025 3:47 pm
To: sword-devel mailing list <sword-devel@crosswire.org>
Cc: Modules Issues <modu...@crosswire.org>; steve.anti...@gmail.com <steve.anti...@gmail.com>
Subject: [sword-devel] Fw: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
NB. I have cancelled the earlier email because the attachment was too large for sword-devel.
It had been in the queue for moderator approval.
The eXperimental module KhmerNTx.zip may now be downloaded from this link on my box.net account.
Please see below for the significant details.
Best regards,
David
Sent with Proton Mail secure email.
------- Forwarded Message -------
From: David Haslam <dfh...@protonmail.com>
Date: On Thursday, May 29th, 2025 at 9:26 AM
Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
To: sword-devel mailing list <sword-devel@crosswire.org>
CC: steve.anti...@gmail.com <steve.anti...@gmail.com>, Modules Issues <modu...@crosswire.org>
Dear SWORD Developers (and our Modules Team),
While watching the livestream funeral of OT Scholar the late Gordon D Wenham yesterday (St Mary's Church, Charlton Kings), I had a bright idea.
I'd been working recently on potential improvements for the KhmerNT module relating to marking the Lexical Word Divisions.
Khmer is one of the languages of SE Asia whose Writing System (aka Script) largely has NO SPACE BETWEEN WORDS.
Others include: Lao, Thai, Myanmar (aka Burmese), together with other languages in the region that employ one of these scripts (e.g. Isaan).
Until now, the KhmerNT module has used the ZWSP (Zero Width Space, U+200B) to mark lexical word boundaries.
This helps with SWORD search for whole words, because even though the divisions between words are invisible to human eyes, they are accessible to computer software.
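To make that concrete, here is a minimal Python sketch of why an invisible divider still helps whole-word matching. It is not how SWORD search is implemented; the function names are hypothetical, and Latin placeholder tokens stand in for Khmer words.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible on screen, but present in the string


def split_words(text: str) -> list[str]:
    """Split on ordinary whitespace and on ZWSP, dropping empty pieces."""
    words: list[str] = []
    for chunk in text.split():  # str.split() does not treat ZWSP as whitespace
        words.extend(w for w in chunk.split(ZWSP) if w)
    return words


def contains_whole_word(text: str, word: str) -> bool:
    """True only if `word` matches a complete divider-delimited word."""
    return word in split_words(text)


sample = f"aa{ZWSP}bbb{ZWSP}cc dd"
print(split_words(sample))
print(contains_whole_word(sample, "bbb"), contains_whole_word(sample, "bb"))
```

A plain substring test would also match "bb" inside "bbb"; splitting on the divider first is what makes the match a whole-word one.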
Wouldn't it be nice if ... (cue to sing the melody by the Beach Boys) 🎶
- We could instead use a visible Unicode character
- That character could be hidden by means of an existing SWORD filter
There is such a character!!!
- U+2019 is one of the codepoints hidden (or changed) by the filter UTF8GreekAccents.
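The two steps can be sketched in a few lines of Python. This is only an illustration of the effect: hide_dividers is a hypothetical stand-in that strips U+2019, not the actual implementation of the UTF8GreekAccents filter, and the Latin placeholder tokens stand in for Khmer words.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE (the current, invisible divider)
RSQM = "\u2019"  # RIGHT SINGLE QUOTATION MARK (the proposed, visible divider)


def zwsp_to_visible(text: str) -> str:
    """Replace every invisible ZWSP divider with a visible U+2019."""
    return text.replace(ZWSP, RSQM)


def hide_dividers(text: str) -> str:
    """Stand-in for a display filter: drop every U+2019 from the text.

    Assumption for illustration only; the real filter is not reproduced here.
    """
    return text.replace(RSQM, "")


source = f"aa{ZWSP}bbb{ZWSP}cc"
visible = zwsp_to_visible(source)
print(visible)                 # dividers shown between words
print(hide_dividers(visible))  # dividers hidden again
```

Note that a filter of this kind cannot tell a divider from a genuine apostrophe, so any real U+2019 in the text would be hidden along with the dividers.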
U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of the NT Greek as the apostrophe, not as a quotation mark.
In NT Greek, it appears in:
- Elisions: When a vowel at the end of a word is dropped (e.g., δι’ instead of διά before a vowel).
- Contractions or abbreviations: e.g., ἐπ’ for ἐπί, καθ’ for κατά.
While U+2019 is typographically correct for apostrophes in modern typesetting, some older or simpler digital texts may use U+0027 (straight apostrophe). However, U+2019 is the preferred character in high-quality, properly typeset Greek texts.
I then set about to test my idea by making a further update to an already eXperimental version of the module, provisionally named KhmerNTx.
It "worked like a dream".
With Greek accents hidden, the text looks like this:
αααα»ααααααα»α ααΆααΆαααααααααααααααΌαααα·ααα ααΌαα αααααα½αα’ααααααααααααΆααα αΆααααΆαααααΎαααΎα α αΎααααααΆαααααααααααααΆαα ααααΆαααα ααααααα’αΆαααααα αααα»αααα»ααα»α αααα»αααΆα‘αΆααΈ αααα»αααΆαααΆααΌααΆ αααα»αα’αΆαααΈ αα·ααααα»ααααΈααΌααΆ (I Peter 1:1 [KhmerNTx])
With Greek accents displayed, the text looks like this:
αααα»αβαααααα»α ααΆβααΆααβααααβααααβαααααΌβαααα·ααα ααΌαβα ααααβαα½αα’αααβαααβααααααΆααα αΆααβααΆαβααααΎαααΎα α αΎαβαααβααΆαβααααααααβααααΆβαα βααααΆααβαα βααααααα’αΆααααβαα βαααα»αβααα»ααα»α αααα»αβααΆα‘αΆααΈ αααα»αβααΆαααΆααΌααΆ αααα»αβα’αΆαααΈ αα·αβαααα»αβαααΈααΌααΆ (I Peter 1:1 [KhmerNTx])
I have attached the compressed module for any of you to explore & play with further.
Aside: The previous update already made use of the OSIS XML w element to enclose each lexical Khmer word. That remains the case.
In this way, the module source text is ready to be adapted for further enhancements such as adding Strong's numbers, etc, to make a Study Edition.
Steve Hyde and the translators in Cambodia are currently preparing to publish the complete Khmer Bible.
He has requested my assistance in improving the actual word divisions for the 39 OT books.
I've already been sent the source text, exported from their database.
Since early May, I have been exploring how the Grok AI engine can make a positive contribution to the success of this challenging task.
More on that subject later.
Best regards,
David
Sent with Proton Mail secure email.
_______________________________________________ sword-devel mailing list: sword-devel@crosswire.org http://crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page