So I said to myself "What a beautiful world!" "Walk before you run, David!"
My walking stage was to replace all the ZWSP by <milestone marker=""
type="x-lexical-word-divider" subType="x-ZWSP"/>
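
A minimal sketch of that walking stage in Python, for illustration only. It assumes the input is plain verse text rather than full OSIS XML, so a ZWSP never occurs inside a tag or attribute value; the names `zwsp_to_milestone` and `MILESTONE` are hypothetical, not part of any SWORD tool.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# The milestone described above; its marker attribute carries the ZWSP itself.
# (OSIS spells the attribute subType, with a capital T.)
MILESTONE = (
    f'<milestone marker="{ZWSP}" '
    'type="x-lexical-word-divider" subType="x-ZWSP"/>'
)


def zwsp_to_milestone(text: str) -> str:
    """Replace every ZWSP in plain verse text with the word-divider milestone."""
    return text.replace(ZWSP, MILESTONE)


# Three Khmer words separated by two ZWSP characters.
sample = "នេះ\u200bជា\u200bសុបិន"
converted = zwsp_to_milestone(sample)
print(converted.count("<milestone"))
```

Each ZWSP survives only inside the marker attribute, so the original divider can still be recovered from the markup.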
My running stage would've been to replace all the ZWSP by this [sans
bullets/EOLs]: (where the first marker is a ZWSP)
- <seg type="x-variant" subType="x-1"><milestone marker=""
type="x-lexical-word-divider" subType="x-ZWSP"/></seg>
- <seg type="x-variant" subType="x-2"><milestone marker="·"
type="x-lexical-word-divider" subType="x-MDOT"/></seg>
This complex kludge was to emulate the proposed new filter by using
GlobalOptionFilter=OSISVariants
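
The running stage could be sketched the same way. This is an illustration under assumptions, not a tested conversion: it takes plain verse text as input, emits the two variant readings back to back with no line breaks, and uses hypothetical helper names (`variant_seg`, `zwsp_to_variants`). Whether the variants filter would render these segs as hoped is not verified here.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE
MDOT = "\u00b7"  # U+00B7 MIDDLE DOT


def variant_seg(number: int, marker: str, name: str) -> str:
    """Build one variant reading that wraps a word-divider milestone."""
    return (
        f'<seg type="x-variant" subType="x-{number}">'
        f'<milestone marker="{marker}" '
        f'type="x-lexical-word-divider" subType="x-{name}"/>'
        "</seg>"
    )


# Variant 1 keeps the invisible ZWSP; variant 2 shows a visible middle dot.
DIVIDER_PAIR = variant_seg(1, ZWSP, "ZWSP") + variant_seg(2, MDOT, "MDOT")


def zwsp_to_variants(text: str) -> str:
    """Replace every ZWSP with the pair of variant readings."""
    return text.replace(ZWSP, DIVIDER_PAIR)


result = zwsp_to_variants("នេះ\u200bជា")
print(result)
```

With this encoding, switching between the two readings would amount to switching between the invisible and the visible divider.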
I suppose the concept might be simplified as follows, but this would be less
self-documenting:
- <seg type="x-variant" subType="x-1" marker=""/><seg type="x-variant"
subType="x-2" marker="·"/>
This assumes that the marker attribute is valid for use in the seg element. Is
that so?
Best regards,
David
On Friday, May 2nd, 2025 at 4:25 PM, Peter von Kaehne <ref...@gmx.net> wrote:
> Then, if a module offers the option, striptext may need to introduce a space
> prior to indexing.
>
> ---------------------------------------------------------------
>
> From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of David
> Haslam <dfh...@protonmail.com>
> Sent: Friday, May 2, 2025 4:21 pm
> To: sword-devel mailing list <sword-devel@crosswire.org>
> Cc: David Haslam <df.has...@btinternet.com>
> Subject: Re: [sword-devel] Proposal for a new SWORD filter to display word
> dividers
>
> Thanks DM,
>
> Then we have a serious problem in SWORD that Peter’s initial feedback failed
> to foresee.
>
> Is anything to be done about this ?
>
> Aside: WWJD = What Would JSword Do ?
>
> David
>
> On Fri, May 2, 2025 at 15:10, DM Smith <dmsm...@crosswire.org> wrote:
>
>> Fast Search could benefit from this, but SWORD uses plain text as an input
>> to Lucene index creation and searching. The implementation of the Lucene
>> analyzer that SWORD uses for all texts uses ASCII space as a word break. The
>> presence of various zero width characters would not help. Nor would using
>> <w>…</w><w>…</w> without an ASCII space.
>>
>> Plain text, also called strip text, does double duty. Presentation without
>> markup and preparation for indexing. For latinate texts, this works fine. I
>> don’t know if LocalStripFilter could help in this.
>>
>> In Him,
>> DM
>>
>>> On May 1, 2025, at 7:46 AM, Peter von Kaehne <ref...@gmx.net> wrote:
>>>
>>> I think this is not difficult per se, but it should be properly encoded.
>>>
>>> <w> seems correct, using zero width characters seems not correct.
>>>
>>> Peter
>>>
>>> ---------------------------------------------------------------
>>>
>>> From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of David
>>> Haslam <dfh...@protonmail.com>
>>> Sent: Thursday, May 1, 2025 11:30 am
>>> To: sword-devel mailing list <sword-devel@crosswire.org>
>>> Cc: David Haslam <df.has...@btinternet.com>
>>> Subject: [sword-devel] Proposal for a new SWORD filter to display word
>>> dividers
>>>
>>> I wish to propose that we design in a new SWORD filter.
>>>
>>> The conf key would be:
>>>
>>> - GlobalOptionFilter=ShowWordDividers
>>>
>>> In the writing systems for the various languages of SE Asia (Thai, Khmer,
>>> Lao, Myanmar) there is [generally] no space between words.
>>>
>>> In this respect, they are like many European languages before the start of
>>> [silent
>>> reading](https://www.amazon.com/Space-Between-Words-Origins-Medieval/dp/080474016X).
>>> The descriptive term is Scriptura Continua.
>>>
>>> Some Bible translations for this region are already making use of one of
>>> the ZERO WIDTH characters to invisibly mark the divisions between lexical
>>> words.
>>> Options include:
>>>
>>> - U+200B ZERO WIDTH SPACE
>>> - U+200C ZERO WIDTH NON-JOINER
>>> - U+FEFF ZERO WIDTH NO-BREAK SPACE
>>>
>>> They exclude:
>>>
>>> - U+200D ZERO WIDTH JOINER
>>>
>>> A further possibility, even without requiring a full study Bible with
>>> Strong's, etc, is to simply wrap each lexical word within the OSIS w
>>> element.
>>> One without any OSIS attributes would suffice for this purpose. Likewise,
>>> for the seg element.
>>>
>>> My proposal is that we design a feature to show/hide word dividers by
>>> displaying them using a suitable visible but non-intrusive character.
>>> My suggestion is to use this Unicode character by default:
>>>
>>> - U+00B7 MIDDLE DOT
>>>
>>> We could even allow the actual visible character to be specified in a
>>> second conf key, thus:
>>>
>>> - VisibleWordDivider=U+00B7
>>>
>>> Benefits would include:
>>>
>>> - Helps with language learning to know where lexical words start and end
>>> - Helps with front-end search for whole words, exact phrase or all words
>>> - Helps with checking the accuracy of Bible translations by clearly
>>> displaying lexical word boundaries at the touch of a single key in the
>>> front-end
>>> - Paves the way for Study Bible with the addition of Strong's mark-up, etc.
>>>
>>> Here's a sample of Khmer verse text with the MIDDLE DOT as the visible word
>>> divider:
>>>
>>>> Obad.1.1
>>>> នេះ·ជា·សុបិន·និមិត្ដ·របស់·លោក·អូបាឌា
>>>> ព្រះអម្ចាស់·ជា·ព្រះ·មាន·បន្ទូល·ពី·ក្រុង·អេដំម ។
>>>> យើង·បាន·ឮ·ដំណឹង·មក·ពី·ព្រះអម្ចាស់ គឺ·មាន·ទូត·ម្នាក់·បាន·បញ្ជូន·ឲ្យ·ទៅ
>>>> ក្នុង·ចំណោម·ជន·ជាតិ·ទាំង·ឡាយ·ដោយ·ពាក្យ·ថា "ចូរ·ក្រោក·ឡើង !
>>>> ចូរ·យើង·ក្រោក·ឡើង·ធ្វើ·ចម្បាំង·ទាស់·និង·គេ"
>>>
>>> cf. Here's what it looks like with the ZWSP as the invisible word divider:
>>>
>>>> Obad.1.1
>>>> នេះជាសុបិននិមិត្ដរបស់លោកអូបាឌា
>>>> ព្រះអម្ចាស់ជាព្រះមានបន្ទូលពីក្រុងអេដំម ។
>>>> យើងបានឮដំណឹងមកពីព្រះអម្ចាស់ គឺមានទូតម្នាក់បានបញ្ជូនឲ្យទៅ
>>>> ក្នុងចំណោមជនជាតិទាំងឡាយដោយពាក្យថា "ចូរក្រោកឡើង !
>>>> ចូរយើងក្រោកឡើងធ្វើចម្បាំងទាស់និងគេ"
>>>
>>> If SWORD developers agree that my proposal merits consideration, please
>>> would you start on the software development.
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel@crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page