You mean in the JSword API?

If so, that's a start. Thanks, DM. :)

Does that mean you now support the proposed new config key being accepted and 
documented?

Best regards,

David

Sent from ProtonMail Mobile

On Sat, Jan 6, 2018 at 23:43, DM Smith <dmsm...@crosswire.org> wrote:

> I added -N. To make search work.
>
> — DM Smith
> From my phone. Brief. Weird autocorrections.
>
> On Jan 6, 2018, at 4:41 PM, David Haslam <dfh...@protonmail.com> wrote:
>
>> Thanks DM.
>>
>> Interesting observations.
>>
>> It prompts the question whether either engine includes the capability to 
>> normalize the search index (assuming that it does normalize the search key).
>> And does it do this by default?
>> Or does indexing assume that all modules were made without using the -N 
>> option and are therefore already in NFC?
>> Yet it also remains the case that some front-ends also provide for 
>> non-indexed search options.
>>
>> Moreover, it raises questions as to how the front-end actually displays the 
>> set of search results when all or part of the underlying module is not NFC.
>>
>> It must be the case that the developers of osis2mod had a valid reason to 
>> provide the -N option.
>> Are those involved back then still with CrossWire?
>>
>> Best regards,
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>> On Sat, Jan 6, 2018 at 21:20, DM Smith <dmsm...@crosswire.org> wrote:
>>
>>> The purpose of normalization was for the sake of search. Only when the 
>>> search index and the search request are normalized to the same form can a 
>>> result be found.
>>>
>>> It doesn’t matter if the normalized form is not readable. If SWORD (or 
>>> JSword) normalizes both the same, then it doesn’t matter what Unicode 
>>> Normalization or lack of it is used for displaying the text.
>>>
>>> Assuming that SWORD (or JSword) handles search properly, the only advantage 
>>> of composed (NFC) over decomposed (NFD) in the module itself is space.
>>>
>>> In Him,
>>> DM
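
Editor's illustration of the point above, that a match is only found when the index and the search request are normalized to the same form. This is a minimal sketch using Python's standard unicodedata module; it is not how SWORD or JSword implement search, and the names build_index and search are hypothetical.

```python
import unicodedata

def normalize_for_search(text, form="NFC"):
    """Bring text to one agreed normalization form before comparing."""
    return unicodedata.normalize(form, text)

def build_index(verses):
    """Hypothetical index: reference -> normalized text."""
    return {ref: normalize_for_search(text) for ref, text in verses.items()}

def search(index, query):
    """Normalize the query the same way as the index, then match substrings."""
    key = normalize_for_search(query)
    return [ref for ref, text in index.items() if key in text]

if __name__ == "__main__":
    # The stored text is decomposed (e + combining acute); the query is composed.
    verses = {"v1": "cafe\u0301 au lait", "v2": "tea"}
    print("caf\u00e9" in verses["v1"])            # False: the raw forms differ
    print(search(build_index(verses), "caf\u00e9"))
```

Because both sides pass through the same function, the form chosen for searching is independent of the form stored in the module and shown to the reader.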
>>>
>>>> On Jan 6, 2018, at 2:26 PM, David Haslam <dfh...@protonmail.com> wrote:
>>>>
>>>> Good question, Tom.
>>>>
>>>> Assuming that the Latin script part of the source text actually required 
>>>> normalization to NFC,
>>>> and that at least some of the Biblical Hebrew should not be converted to 
>>>> NFC,
>>>> you'd build the module using the -N switch of osis2mod, after first 
>>>> applying a script
>>>> to the source text to ensure that both the requirements were implemented.
>>>>
>>>> It would be a very simple task for a bespoke TextPipe filter with a 
>>>> restrict filter
>>>> designed to limit the Convert to NFC subfilter to the text that was not 
>>>> Hebrew.
>>>>
>>>> Ignoring alphabetical presentation forms, all the Hebrew characters are in 
>>>> one Unicode block.
>>>> A PCRE to exclude the Hebrew would be very simple.
>>>> I could almost do it in my sleep after 17 years using TextPipe.
>>>> No doubt other programmers could do likewise with Perl or Python, etc.
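
Editor's illustration of the pre-processing step described above: convert everything to NFC except runs of Hebrew, so the module can then be built with the -N switch. This is a sketch in Python rather than TextPipe. It assumes, as the message does, that only the main Hebrew block (U+0590 to U+05FF) needs protecting and ignores the alphabetic presentation forms; the function name nfc_except_hebrew is hypothetical.

```python
import re
import unicodedata

# Runs of characters from the Hebrew block, U+0590..U+05FF.
# The capturing group makes re.split keep the Hebrew runs in its result.
HEBREW_RUN = re.compile(r"([\u0590-\u05FF]+)")

def nfc_except_hebrew(text):
    """Normalize non-Hebrew runs to NFC and leave Hebrew runs untouched."""
    parts = HEBREW_RUN.split(text)
    # With one capturing group, odd-indexed parts are the Hebrew runs.
    return "".join(
        part if i % 2 else unicodedata.normalize("NFC", part)
        for i, part in enumerate(parts)
    )

if __name__ == "__main__":
    # Decomposed Latin text followed by bet + dagesh + sheva.
    sample = "cafe\u0301 \u05D1\u05BC\u05B0"
    result = nfc_except_hebrew(sample)
    print([f"U+{ord(ch):04X}" for ch in result])
```

Applying NFC to the whole string would reorder the Hebrew marks by combining class, which is the change the source text provider wants to avoid. A real script would also need to decide how to treat spaces and punctuation that sit between Hebrew words.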
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> Sent from ProtonMail Mobile
>>>>
>>>> On Sat, Jan 6, 2018 at 19:14, Tom Sullivan <i...@beforgiven.info> wrote:
>>>>
>>>>> Y'all:
>>>>>
>>>>> For text, such as in a commentary, which includes both Hebrew and
>>>>> English (or another modern Latin script using language), what do you put
>>>>> for the normalization?
>>>>>
>>>>> Tom
>>>>>
>>>>> Tom Sullivan
>>>>> i...@beforgiven.info
>>>>> FAX: 815-301-2835
>>>>> ---------------------
>>>>> Great News!
>>>>> God created you, owns you and gave you commands to obey. You have
>>>>> disobeyed God - as your conscience very well attests to you. God's
>>>>> holiness and justice compel Him to punish you in Hell. Jesus Christ
>>>>> became Man, was crucified, buried and rose from the dead as a substitute
>>>>> for all who trust in Him, redeeming them from Hell. If you repent (turn
>>>>> from your sin) and believe (trust) in Jesus Christ, you will go to
>>>>> Heaven. Otherwise you will go to Hell. Warning! Good works are a result,
>>>>> not cause, of saving trust. More info is at www.esig.beforgiven.info
>>>>> Do you believe this? Copy this signature into your email program and use
>>>>> the Internet to spread the Great News every time you email.
>>>>>
>>>>> On 01/06/2018 12:32 PM, David Haslam wrote:
>>>>> > Hi Greg,
>>>>> >
>>>>> > One area where it might turn out to be useful is for the search features
>>>>> > of front-end apps.
>>>>> > It could be important to know that the underlying module text is _not_
>>>>> > *NFC*.
>>>>> >
>>>>> > That's not to lay down a requirement as to how search features should be
>>>>> > designed,
>>>>> > but at least to provide the information in case it does matter for some
>>>>> > types of search option.
>>>>> >
>>>>> > Like other things in .conf files, a key can also be _educational_.
>>>>> > It may prompt developers and users to ask, /*Why did they do this?*/
>>>>> >
>>>>> > cf. It was _almost by accident_ that in 2014, I first came across this
>>>>> > aspect of using Unicode for Biblical Hebrew.
>>>>> > /It applies only to texts with _both_ vowel accents and cantillation./
>>>>> >
>>>>> > Even though it's mentioned in our developers' wiki, it's all too easily
>>>>> > missed by other CrossWire volunteers.
>>>>> >
>>>>> > Best regards,
>>>>> >
>>>>> > David
>>>>> >
>>>>> > Sent with ProtonMail Secure Email.
>>>>> >
>>>>> >> -------- Original Message --------
>>>>> >> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>>>>> >> Local Time: 6 January 2018 5:19 PM
>>>>> >> UTC Time: 6 January 2018 17:19
>>>>> >> From: greg.helli...@gmail.com
>>>>> >> To: David Haslam , SWORD Developers'
>>>>> >> Collaboration Forum
>>>>> >>
>>>>> >> Why would the front end or engine need to know this information? Would
>>>>> >> it help the front end developers or users to know it? What do we gain
>>>>> >> by adding this? (I'm not implying it wouldn't be beneficial. But the
>>>>> >> only thing I know about Unicode is how the different UTF encodings
>>>>> >> work, so I have no idea what use this information could be. I also
>>>>> >> think changes to formats and information standards should be
>>>>> >> conservative instead of liberal)
>>>>> >>
>>>>> >> --Greg
>>>>> >>
>>>>> >> On Jan 6, 2018 11:01, "David Haslam" wrote:
>>>>> >>
>>>>> >> Dear all,
>>>>> >>
>>>>> >> We've known for quite a few years that there are aspects of
>>>>> >> *Biblical Hebrew* that mean we should _avoid_ converting the
>>>>> >> Unicode source text to *NFC* when we build a module.
>>>>> >>
>>>>> >> This prompts me to suggest that we ought to define a new *key* for
>>>>> >> .conf files.
>>>>> >>
>>>>> >> *Normalization=NFC* (this would be the default, and may be
>>>>> >> _omitted_ for the vast majority of modules)
>>>>> >> *Normalization=Custom* (we should include this in certain Biblical
>>>>> >> Hebrew modules)
>>>>> >>
>>>>> >> This would make it clear to front-end developers and users alike
>>>>> >> that the source text was _not_ converted to NFC during module build.
>>>>> >> i.e. *osis2mod* was used intentionally with the *-N* switch, in
>>>>> >> _accordance with the requirements of the source text provider_.
>>>>> >>
>>>>> >> The Unicode source text may already be encoded in *UTF-8*; this
>>>>> >> memo is /only/ about normalization.
>>>>> >>
>>>>> >> In the rare eventuality that there could arise a requirement for
>>>>> >> any of the other three normalization forms (*NFD*, *NFKC*, *NFKD*)
>>>>> >> defined by the Unicode Consortium,
>>>>> >> these would also be permitted values for the conf file key.
>>>>> >>
>>>>> >> A further benefit arises when a module needs to be updated.
>>>>> >> If the modules team sees that the .conf file includes the line
>>>>> >> *Normalization=Custom*
>>>>> >> they would be forewarned against converting to NFC through
>>>>> >> /inadvertently/ omitting the *-N* switch during module build.
>>>>> >>
>>>>> >> _Aside_: Another language with a need for non-standard
>>>>> >> normalization is *Tibetan*. We don't yet have a module in that script.
>>>>> >>
>>>>> >> Best regards,
>>>>> >>
>>>>> >> David
>>>>> >>
>>>>> >> Sent with ProtonMail Secure Email.
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> sword-devel mailing list: sword-devel@crosswire.org
>>>>> >> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>> >> Instructions to unsubscribe/change your settings at above page
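
Editor's illustration of how a build script or front-end might read the key proposed in the message quoted above. The Normalization key is only a proposal in this thread, not an accepted or documented .conf entry; the module name, the sample .conf content, and the function names are hypothetical. This is a minimal sketch, not part of SWORD or JSword.

```python
# Values named in the proposal: the four Unicode forms plus "Custom".
ALLOWED_VALUES = {"NFC", "NFD", "NFKC", "NFKD", "Custom"}

def read_normalization(conf_text):
    """Return the proposed Normalization value, defaulting to NFC when absent."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Normalization":
            value = value.strip()
            if value not in ALLOWED_VALUES:
                raise ValueError(f"unknown Normalization value: {value!r}")
            return value
    return "NFC"

def must_skip_nfc_conversion(conf_text):
    """True when the module should be rebuilt without converting to NFC,
    i.e. when osis2mod would be run with the -N switch."""
    return read_normalization(conf_text) != "NFC"

if __name__ == "__main__":
    # Hypothetical .conf content for illustration only.
    sample_conf = "[ExampleHebrew]\nLang=he\nNormalization=Custom\n"
    print(read_normalization(sample_conf))
    print(must_skip_nfc_conversion(sample_conf))
```

Treating a missing key as NFC follows the proposal's statement that NFC would be the default and may be omitted for most modules.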