I added -N. To make search work. 

— DM Smith
From my phone. Brief. Weird autocorrections. 

> On Jan 6, 2018, at 4:41 PM, David Haslam <dfh...@protonmail.com> wrote:
> 
> Thanks DM.
> 
> Interesting observations.
> 
> That prompts the question of whether either engine can normalize the 
> search index (assuming that it does normalize the search key), 
> and whether it does so by default.
> Or does indexing assume that all modules were made without using the -N 
> option and are therefore already in NFC?
> Yet it also remains the case that some front-ends provide 
> non-indexed search options.
> 
> Moreover, it raises questions as to how the front-end actually displays the 
> set of search results when all or part of the underlying module is not NFC.
> 
> It must be the case that the developers of osis2mod had a valid reason to 
> provide the -N option.
> Are those involved back then still with CrossWire?
> 
> Best regards,
> 
> David
> 
> 
> Sent from ProtonMail Mobile
> 
> 
>> On Sat, Jan 6, 2018 at 21:20, DM Smith <dmsm...@crosswire.org> wrote:
>> The purpose of normalization was for the sake of search. Only when the 
>> search index and the search request are normalized to the same form can a 
>> result be found.
>> 
>> It doesn’t matter if the normalized form is not readable. If SWORD (or 
>> JSword) normalizes both the same, then it doesn’t matter what Unicode 
>> Normalization or lack of it is used for displaying the text. 
>> 
>> Assuming that SWORD (or JSword) handles search properly, the only advantage 
>> of canonical over decomposed in the module itself is space.
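
A minimal sketch of the point above, using only Python's standard unicodedata module. It is not SWORD or JSword code, and the helper name search_key is hypothetical. It shows that a precomposed string and its decomposed equivalent only match once both sides are normalized to the same form, and that the choice of form does not matter for matching.

```python
import unicodedata


def search_key(text: str, form: str = "NFC") -> str:
    """Normalize text so index entries and queries can be compared."""
    return unicodedata.normalize(form, text)


indexed = "caf\u00e9"   # precomposed: e-acute as one code point
query = "cafe\u0301"    # decomposed: e followed by a combining acute

# Compared as stored, the two strings differ code point by code point.
raw_match = indexed == query

# Normalizing both sides to the same form makes them comparable.
nfc_match = search_key(indexed) == search_key(query)
nfd_match = search_key(indexed, "NFD") == search_key(query, "NFD")
```

Here raw_match is False while nfc_match and nfd_match are both True, which is why the form stored in the module need not be the form used for search.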
>> 
>> In Him,
>>      DM
>> 
>>> On Jan 6, 2018, at 2:26 PM, David Haslam <dfh...@protonmail.com> wrote:
>>> 
>>> Good question, Tom.
>>> 
>>> Assuming that the Latin script part of the source text actually required 
>>> normalization to NFC,
>>> and that at least some of the Biblical Hebrew should not be converted to 
>>> NFC,
>>> you'd build the module using the -N switch of osis2mod, after first 
>>> applying a script 
>>> to the source text to ensure that both the requirements were implemented.
>>> 
>>> It would be a very simple task for a bespoke TextPipe filter with a 
>>> restrict filter 
>>> designed to limit the Convert to NFC subfilter to the text that was not 
>>> Hebrew.
>>> 
>>> Ignoring alphabetical presentation forms, all the Hebrew characters are in 
>>> one Unicode block.
>>> A PCRE to exclude the Hebrew would be very simple.
>>> I could almost do it in my sleep after 17 years using TextPipe.
>>> No doubt other programmers could do likewise with Perl or Python, etc.
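
As a rough Python equivalent of the TextPipe approach described above (a sketch, not a tested module-building tool): split the text on runs of characters from the main Hebrew block, U+0590 to U+05FF, convert everything else to NFC, and leave the Hebrew runs untouched. The function name nfc_except_hebrew is hypothetical, and alphabetic presentation forms are ignored, as in the description.

```python
import re
import unicodedata

# Runs of characters from the main Hebrew block (letters, points, accents).
# The capturing group makes re.split keep the Hebrew runs in its output.
HEBREW_RUN = re.compile(r"([\u0590-\u05FF]+)")


def nfc_except_hebrew(text: str) -> str:
    """Convert non-Hebrew text to NFC; pass Hebrew runs through unchanged."""
    parts = HEBREW_RUN.split(text)
    # With one capturing group, Hebrew runs land at the odd indices.
    return "".join(
        part if i % 2 else unicodedata.normalize("NFC", part)
        for i, part in enumerate(parts)
    )


# Decomposed Latin text followed by shin + shin dot + qamats.
sample = "cafe\u0301 \u05E9\u05C1\u05B8"
selective = nfc_except_hebrew(sample)
blanket = unicodedata.normalize("NFC", sample)
```

With this input, selective composes the Latin e-acute and keeps the Hebrew marks in their original order, whereas blanket NFC also moves the qamats in front of the shin dot, because canonical ordering sorts combining marks by combining class. The output would then be fed to osis2mod with the -N switch so that it is not normalized again.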
>>> 
>>> Best regards,
>>> 
>>> David
>>> 
>>> Sent from ProtonMail Mobile
>>> 
>>> 
>>>> On Sat, Jan 6, 2018 at 19:14, Tom Sullivan <i...@beforgiven.info> wrote:
>>>> Y'all:
>>>> 
>>>> For text, such as in a commentary, which includes both Hebrew and 
>>>> English (or another modern Latin script using language), what do you put 
>>>> for the normalization?
>>>> 
>>>> Tom
>>>> 
>>>> Tom Sullivan
>>>> i...@beforgiven.info
>>>> FAX: 815-301-2835
>>>> ---------------------
>>>> Great News! God created you, owns you and gave you commands to obey. 
>>>> You have disobeyed God - as your conscience very well attests to you. 
>>>> God's holiness and justice compel Him to punish you in Hell. Jesus 
>>>> Christ became Man, was crucified, buried and rose from the dead as a 
>>>> substitute for all who trust in Him, redeeming them from Hell. If you 
>>>> repent (turn from your sin) and believe (trust) in Jesus Christ, you 
>>>> will go to Heaven. Otherwise you will go to Hell. Warning! Good works 
>>>> are a result, not cause, of saving trust. More info is at 
>>>> www.esig.beforgiven.info Do you believe this? Copy this signature into 
>>>> your email program and use the Internet to spread the Great News every 
>>>> time you email.
>>>> 
>>>> On 01/06/2018 12:32 PM, David Haslam wrote:
>>>> > Hi Greg,
>>>> >
>>>> > One area where it might turn out to be useful is for the search features
>>>> > of front-end apps.
>>>> > It could be important to know that the underlying module text is _not_
>>>> > *NFC*.
>>>> >
>>>> > That's not to lay down a requirement as to how search features should be
>>>> > designed,
>>>> > but at least to provide the information in case it does matter for some
>>>> > types of search option.
>>>> >
>>>> > Like other things in .conf files, a key can also be _educational_.
>>>> > It may prompt developers and users to ask, /*Why did they do this?*/
>>>> >
>>>> > cf. It was _almost by accident_ that in 2014, I first came across this
>>>> > aspect of using Unicode for Biblical Hebrew.
>>>> > /It applies only to texts with _both_ vowel accents and cantillation./
>>>> >
>>>> > Even though it's mentioned in our developers' wiki, it's all too easily
>>>> > missed by other CrossWire volunteers.
>>>> >
>>>> > Best regards,
>>>> >
>>>> > David
>>>> >
>>>> > Sent with ProtonMail Secure Email.
>>>> >
>>>> >> -------- Original Message --------
>>>> >> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>>>> >> Local Time: 6 January 2018 5:19 PM
>>>> >> UTC Time: 6 January 2018 17:19
>>>> >> From: greg.helli...@gmail.com
>>>> >> To: David Haslam, SWORD Developers' Collaboration Forum
>>>> >>
>>>> >> Why would the front end or engine need to know this information? Would
>>>> >> it help the front end developers or users to know it? What do we gain
>>>> >> by adding this? (I'm not implying it wouldn't be beneficial. But the
>>>> >> only thing I know about Unicode is how the different UTF encodings
>>>> >> work, so I have no idea what use this information could be. I also
>>>> >> think changes to formats and information standards should be
>>>> >> conservative instead of liberal)
>>>> >>
>>>> >> --Greg
>>>> >>
>>>> >> On Jan 6, 2018 11:01, "David Haslam" wrote:
>>>> >>
>>>> >> Dear all,
>>>> >>
>>>> >> We've known for quite a few years that there are aspects of
>>>> >> *Biblical Hebrew* that mean we should _avoid_ converting the
>>>> >> Unicode source text to *NFC* when we build a module.
>>>> >>
>>>> >> This prompts me to suggest that we ought to define a new *key* for
>>>> >> .conf files.
>>>> >>
>>>> >> *Normalization=NFC* (this would be the default, and may be
>>>> >> _omitted_ for the vast majority of modules)
>>>> >> *Normalization=Custom* (we should include this in certain Biblical
>>>> >> Hebrew modules)
>>>> >>
>>>> >> This would make it clear to front-end developers and users alike
>>>> >> that the source text was _not_ converted to NFC during module build.
>>>> >> i.e. *osis2mod* was used intentionally with the *-N* switch, in
>>>> >> _accordance with the requirements of the source text provider_.
>>>> >>
>>>> >> The Unicode source text may already be encoded in *UTF-8*; this
>>>> >> memo is /only/ about normalization.
>>>> >>
>>>> >> In the rare eventuality that there could arise a requirement for
>>>> >> any of the other three normalization forms (*NFD*, *NFKC*, *NFKD*)
>>>> >> defined by the Unicode Consortium,
>>>> >> these would also be permitted values for the conf file key.
>>>> >>
>>>> >> A further benefit arises when a module needs to be updated.
>>>> >> If the modules team sees that the .conf file includes the line
>>>> >> *Normalization=Custom*
>>>> >> they would be forewarned against converting to NFC through
>>>> >> /inadvertently/ omitting the *-N* switch during module build.
>>>> >>
>>>> >> _Aside_: Another language with a need for non-standard
>>>> >> normalization is *Tibetan*. We don't yet have a module in that script.
>>>> >>
>>>> >> Best regards,
>>>> >>
>>>> >> David
>>>> >>
>>>> >> Sent with ProtonMail Secure Email.
>>>> >>
>>>> >> _______________________________________________
>>>> >> sword-devel mailing list: sword-devel@crosswire.org
>>>> >> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> >> Instructions to unsubscribe/change your settings at above page
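
To illustrate the Normalization key proposed in the original message quoted above, here is a hedged sketch of how a build helper might read it. The key is only a proposal in this thread, not an existing SWORD .conf key. The module name ExampleHebrew, the sample contents and both function names are made up for illustration, and Python's configparser is assumed to be good enough for a simple .conf file (real ones may use features it does not handle).

```python
import configparser

# Hypothetical .conf contents; only the Normalization line matters here.
SAMPLE_CONF = """\
[ExampleHebrew]
Encoding=UTF-8
Normalization=Custom
"""


def declared_normalization(conf_text: str) -> str:
    """Return the proposed Normalization value, defaulting to NFC."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    section = parser.sections()[0]
    # An omitted key means NFC, as suggested in the proposal.
    return parser.get(section, "Normalization", fallback="NFC")


def must_skip_nfc(conf_text: str) -> bool:
    """True when the module should be built with osis2mod's -N switch."""
    return declared_normalization(conf_text) != "NFC"
```

A modules team script could call must_skip_nfc before rebuilding a module and refuse to run osis2mod without -N when it returns True, which is the forewarning the proposal describes.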
