Thanks DM,

Then we have a serious problem in SWORD that Peter’s initial feedback failed to 
foresee.

Can anything be done about this?

Aside: WWJD = What Would JSword Do?

David

Sent from [Proton Mail](https://proton.me/mail/home) for iOS

On Fri, May 2, 2025 at 15:10, DM Smith <dmsm...@crosswire.org> wrote:

> Fast Search could benefit from this, but SWORD uses plain text as an input to 
> Lucene index creation and searching. The implementation of the Lucene 
> analyzer that SWORD uses for all texts uses ASCII space as a word break. The 
> presence of various zero width characters would not help. Nor would using 
> <w>…</w><w>…</w> without an ASCII space.
>
> Plain text, also called strip text, does double duty: presentation without 
> markup and preparation for indexing. For latinate texts, this works fine. I 
> don’t know whether LocalStripFilter could help with this.
>
> In Him,
> DM
>
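DM's point about the analyzer can be illustrated with a minimal Python sketch. It does not use SWORD or Lucene: `split_on_ascii_space` and `dividers_to_spaces` are hypothetical helpers that only mimic an analyzer breaking words on ASCII space, and the replacement step before indexing is an assumption about one possible workaround, not something SWORD is known to do.

```python
# Hypothetical illustration only: this is not SWORD's or Lucene's actual code.
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def split_on_ascii_space(text: str) -> list[str]:
    """Tokenize the way an analyzer that breaks only on ASCII space would."""
    return [token for token in text.split(" ") if token]


def dividers_to_spaces(text: str, dividers: str = ZWSP) -> str:
    """Replace each invisible divider with an ASCII space (assumed pre-indexing step)."""
    return "".join(" " if ch in dividers else ch for ch in text)


# Three Khmer words separated by ZWSP, written with explicit escapes.
khmer = "នេះ\u200bជា\u200bសុបិន"

print(len(split_on_ascii_space(khmer)))                      # 1
print(len(split_on_ascii_space(dividers_to_spaces(khmer))))  # 3
```

Without the replacement step the whole run is indexed as a single token, so a whole-word search for any one of the three words would presumably not match.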
>> On May 1, 2025, at 7:46 AM, Peter von Kaehne <ref...@gmx.net> wrote:
>>
>> I think this is not difficult per se, but it should be properly encoded.
>>
>> <w> seems correct; using zero width characters does not.
>>
>> Peter
>>
>> Sent from [Outlook for iOS](https://aka.ms/o0ukef)
>> ---------------------------------------------------------------
>>
>> From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of David 
>> Haslam <dfh...@protonmail.com>
>> Sent: Thursday, May 1, 2025 11:30 am
>> To: sword-devel mailing list <sword-devel@crosswire.org>
>> Cc: David Haslam <df.has...@btinternet.com>
>> Subject: [sword-devel] Proposal for a new SWORD filter to display word 
>> dividers
>>
>> I wish to propose that we design in a new SWORD filter.
>>
>> The conf key would be:
>>
>> - GlobalOptionFilter=ShowWordDividers
>>
>> In the writing systems for the various languages of SE Asia (Thai, Khmer, 
>> Lao, Myanmar) there is [generally] no space between words.
>>
>> In this respect, they are like many European languages before the start of 
>> [silent reading](https://www.amazon.com/Space-Between-Words-Origins-Medieval/dp/080474016X).
>> The descriptive term is Scriptura Continua.
>>
>> Some Bible translations for this region are already making use of one of the 
>> ZERO WIDTH characters to invisibly mark the divisions between lexical words.
>> Options include:
>>
>> - U+200B ZERO WIDTH SPACE
>> - U+200C ZERO WIDTH NON-JOINER
>> - U+FEFF ZERO WIDTH NO-BREAK SPACE
>>
>> They exclude:
>>
>> - U+200D ZERO WIDTH JOINER
>>
>> A further possibility, even without requiring a full study Bible with 
>> Strong's, etc, is to simply wrap each lexical word within the OSIS w element.
>> One without any OSIS attributes would suffice for this purpose. Likewise, 
>> for the seg element.
>>
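The attribute-free `w` element idea could be sketched as follows. This is a minimal illustration, assuming the input is a run of words delimited only by ZWSP, with no spaces or punctuation; `wrap_words` is a hypothetical helper, not part of SWORD or any OSIS tool.

```python
from xml.sax.saxutils import escape

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def wrap_words(text: str, divider: str = ZWSP) -> str:
    """Wrap each divider-delimited word in an attribute-free OSIS <w> element."""
    words = [word for word in text.split(divider) if word]
    # escape() protects any &, < or > that might occur in the source text.
    return "".join(f"<w>{escape(word)}</w>" for word in words)


print(wrap_words("នេះ\u200bជា\u200bសុបិន"))
# <w>នេះ</w><w>ជា</w><w>សុបិន</w>
```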
>> My proposal is that we design a feature to show/hide word dividers by 
>> displaying them using a suitable visible but non-intrusive character.
>> My suggestion is to use this Unicode character by default:
>>
>> - U+00B7 MIDDLE DOT
>>
>> We could even allow the actual visible character to be specified in a second 
>> conf key, thus:
>>
>> - VisibleWordDivider=U+00B7
>>
>> Benefits would include:
>>
>> - Helps with language learning to know where lexical words start and end
>> - Helps with front-end search for whole words, exact phrase or all words
>> - Helps with checking the accuracy of Bible translations by clearly 
>> displaying lexical word boundaries at the touch of a single key in the 
>> front-end
>> - Paves the way for Study Bible with the addition of Strong's mark-up, etc.
>>
>> Here's a sample of Khmer verse text with the MIDDLE DOT as the visible word 
>> divider:
>>
>>> Obad.1.1
>>> នេះ·ជា·សុបិន·និមិត្ដ·របស់·លោក·អូបាឌា 
>>> ព្រះអម្ចាស់·ជា·ព្រះ·មាន·បន្ទូល·ពី·ក្រុង·អេដំម ។ 
>>> យើង·បាន·ឮ·ដំណឹង·មក·ពី·ព្រះអម្ចាស់ គឺ·មាន·ទូត·ម្នាក់·បាន·បញ្ជូន·ឲ្យ·ទៅ 
>>> ក្នុង·ចំណោម·ជន·ជាតិ·ទាំង·ឡាយ·ដោយ·ពាក្យ·ថា "ចូរ·ក្រោក·ឡើង ! 
>>> ចូរ·យើង·ក្រោក·ឡើង·ធ្វើ·ចម្បាំង·ទាស់·និង·គេ"
>>
>> Compare that with the same verse using ZWSP as the invisible word divider:
>>
>>> Obad.1.1
>>> នេះ​ជា​សុបិន​និមិត្ដ​របស់​លោក​អូបាឌា 
>>> ព្រះអម្ចាស់​ជា​ព្រះ​មាន​បន្ទូល​ពី​ក្រុង​អេដំម ។ 
>>> យើង​បាន​ឮ​ដំណឹង​មក​ពី​ព្រះអម្ចាស់ គឺ​មាន​ទូត​ម្នាក់​បាន​បញ្ជូន​ឲ្យ​ទៅ 
>>> ក្នុង​ចំណោម​ជន​ជាតិ​ទាំង​ឡាយ​ដោយ​ពាក្យ​ថា "ចូរ​ក្រោក​ឡើង ! 
>>> ចូរ​យើង​ក្រោក​ឡើង​ធ្វើ​ចម្បាំង​ទាស់​និង​គេ"
>>
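The show/hide behaviour proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_code_point` and `show_word_dividers` are hypothetical names, the `U+XXXX` value format is taken from the proposed `VisibleWordDivider` key, and a real SWORD option filter would be written against the SWORD C++ API rather than like this.

```python
import re

# Invisible dividers listed in the proposal. U+200D ZERO WIDTH JOINER is
# deliberately left out, as the proposal excludes it.
INVISIBLE_DIVIDERS = "\u200b\u200c\ufeff"


def parse_code_point(value: str) -> str:
    """Turn a conf-style value such as 'U+00B7' into the character it names."""
    match = re.fullmatch(r"U\+([0-9A-Fa-f]{4,6})", value.strip())
    if match is None:
        raise ValueError(f"not a U+XXXX code point: {value!r}")
    return chr(int(match.group(1), 16))


def show_word_dividers(text: str, visible: str = "\u00b7", enabled: bool = True) -> str:
    """Replace invisible dividers with a visible one when the option is on."""
    if not enabled:
        return text
    return text.translate({ord(ch): visible for ch in INVISIBLE_DIVIDERS})


divider = parse_code_point("U+00B7")  # MIDDLE DOT, the suggested default
print(show_word_dividers("នេះ\u200bជា\u200bសុបិន", divider))  # នេះ·ជា·សុបិន
```

With `enabled=False` the text passes through unchanged, which corresponds to the option being switched off in the front-end.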
>> If SWORD developers agree that my proposal merits consideration, please 
>> would you start on the software development.
>>
>> Best regards,
>>
>> David
>>
>> Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel@crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page