I added -N. To make search work. — DM Smith From my phone. Brief. Weird autocorrections.
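A rough sketch of the principle stated in the quoted message below: when the index terms and the search request are both folded to the same normalization form, a match is found whatever form the module text itself is stored in. This is only an illustration using Python's standard unicodedata module, not the SWORD or JSword indexing code; the helper names (normalize_key, build_index, search) and the sample texts are made up for the example.

```python
import unicodedata
from typing import Dict, Set


def normalize_key(text: str, form: str = "NFC") -> str:
    """Fold text to one normalization form, for matching purposes only."""
    return unicodedata.normalize(form, text)


def build_index(verses: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each normalized word to the set of references containing it."""
    index: Dict[str, Set[str]] = {}
    for ref, text in verses.items():
        for word in normalize_key(text).split():
            index.setdefault(word, set()).add(ref)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Normalize the query the same way as the index before the lookup."""
    return index.get(normalize_key(query), set())


# Hypothetical module text: the same word stored in two different forms.
verses = {
    "ref1": "cafe\u0301 au lait",  # decomposed: e + U+0301 combining acute
    "ref2": "caf\u00e9 noir",      # precomposed: U+00E9
}
index = build_index(verses)

# Either spelling of the query finds both references.
hits_decomposed = search(index, "cafe\u0301")
hits_composed = search(index, "caf\u00e9")
```

The stored text in `verses` is never altered; only the index keys and the query are normalized, which is why the form used for display does not affect the result.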
> On Jan 6, 2018, at 4:41 PM, David Haslam <dfh...@protonmail.com> wrote:
>
> Thanks DM.
>
> Interesting observations.
>
> It prompts the question whether either engine includes the capability to
> normalize the search index (assuming that it does normalize the search key).
> And does it do this by default?
> Or does indexing assume that all modules were made without using the -N
> option and are therefore already in NFC?
> Yet it also remains the case that some front-ends also provide for
> non-indexed search options.
>
> Moreover, it raises questions as to how the front-end actually displays the
> set of search results when all or part of the underlying module is not NFC.
>
> It must be the case that the developers of osis2mod had a valid reason to
> provide the -N option.
> Are those involved back then still with CrossWire?
>
> Best regards,
>
> David
>
> Sent from ProtonMail Mobile
>
>> On Sat, Jan 6, 2018 at 21:20, DM Smith <dmsm...@crosswire.org> wrote:
>> The purpose of normalization was for the sake of search. Only when the
>> search index and the search request are normalized to the same form can a
>> result be found.
>>
>> It doesn’t matter if the normalized form is not readable. If SWORD (or
>> JSword) normalizes both the same, then it doesn’t matter what Unicode
>> Normalization or lack of it is used for displaying the text.
>>
>> Assuming that SWORD (or JSword) handles search properly, the only advantage
>> of composed (NFC) over decomposed (NFD) text in the module itself is space.
>>
>> In Him,
>> DM
>>
>>> On Jan 6, 2018, at 2:26 PM, David Haslam <dfh...@protonmail.com> wrote:
>>>
>>> Good question, Tom.
>>>
>>> Assuming that the Latin script part of the source text actually required
>>> normalization to NFC,
>>> and that at least some of the Biblical Hebrew should not be converted to
>>> NFC,
>>> you'd build the module using the -N switch of osis2mod, after first
>>> applying a script
>>> to the source text to ensure that both the requirements were implemented.
>>>
>>> It would be a very simple task for a bespoke TextPipe filter with a
>>> restrict filter
>>> designed to limit the Convert to NFC subfilter to the text that was not
>>> Hebrew.
>>>
>>> Ignoring alphabetical presentation forms, all the Hebrew characters are in
>>> one Unicode block.
>>> A PCRE to exclude the Hebrew would be very simple.
>>> I could almost do it in my sleep after 17 years using TextPipe.
>>> No doubt other programmers could do likewise with Perl or Python, etc.
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> Sent from ProtonMail Mobile
>>>
>>>> On Sat, Jan 6, 2018 at 19:14, Tom Sullivan <i...@beforgiven.info> wrote:
>>>> Y'all:
>>>>
>>>> For text, such as in a commentary, which includes both Hebrew and
>>>> English (or another modern language written in Latin script), what do
>>>> you put for the normalization?
>>>>
>>>> Tom
>>>>
>>>> Tom Sullivan
>>>> i...@beforgiven.info
>>>> FAX: 815-301-2835
>>>> ---------------------
>>>> Great News! God created you, owns you and gave you commands to obey.
>>>> You have disobeyed God - as your conscience very well attests to you.
>>>> God's holiness and justice compel Him to punish you in Hell. Jesus
>>>> Christ became Man, was crucified, buried and rose from the dead as a
>>>> substitute for all who trust in Him, redeeming them from Hell. If you
>>>> repent (turn from your sin) and believe (trust) in Jesus Christ, you
>>>> will go to Heaven. Otherwise you will go to Hell. Warning! Good works
>>>> are a result, not cause, of saving trust. More info is at
>>>> www.esig.beforgiven.info Do you believe this? Copy this signature into
>>>> your email program and use the Internet to spread the Great News every
>>>> time you email.
>>>>
>>>> On 01/06/2018 12:32 PM, David Haslam wrote:
>>>> > Hi Greg,
>>>> >
>>>> > One area where it might turn out to be useful is for the search
>>>> > features of front-end apps.
>>>> > It could be important to know that the underlying module text is
>>>> > _not_ *NFC*.
>>>> >
>>>> > That's not to lay down a requirement as to how search features should
>>>> > be designed,
>>>> > but at least to provide the information in case it does matter for
>>>> > some types of search option.
>>>> >
>>>> > Like other things in .conf files, a key can also be _educational_.
>>>> > It may prompt developers and users to ask, /*Why did they do this?*/
>>>> >
>>>> > cf. It was _almost by accident_ that in 2014, I first came across
>>>> > this aspect of using Unicode for Biblical Hebrew.
>>>> > /It applies only to texts with _both_ vowel accents and cantillation./
>>>> >
>>>> > Even though it's mentioned in our developers' wiki, it's all too
>>>> > easily missed by other CrossWire volunteers.
>>>> >
>>>> > Best regards,
>>>> >
>>>> > David
>>>> >
>>>> > Sent with ProtonMail Secure Email.
>>>> >
>>>> >> -------- Original Message --------
>>>> >> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
>>>> >> Local Time: 6 January 2018 5:19 PM
>>>> >> UTC Time: 6 January 2018 17:19
>>>> >> From: greg.helli...@gmail.com
>>>> >> To: David Haslam, SWORD Developers' Collaboration Forum
>>>> >>
>>>> >> Why would the front end or engine need to know this information?
>>>> >> Would it help the front end developers or users to know it? What do
>>>> >> we gain by adding this? (I'm not implying it wouldn't be beneficial.
>>>> >> But the only thing I know about Unicode is how the different UTF
>>>> >> encodings work, so I have no idea what use this information could
>>>> >> be. I also think changes to formats and information standards should
>>>> >> be conservative instead of liberal.)
>>>> >>
>>>> >> --Greg
>>>> >>
>>>> >> On Jan 6, 2018 11:01, "David Haslam" wrote:
>>>> >>
>>>> >> Dear all,
>>>> >>
>>>> >> We've known for quite a few years that there are aspects of
>>>> >> *Biblical Hebrew* that mean we should _avoid_ converting the
>>>> >> Unicode source text to *NFC* when we build a module.
>>>> >>
>>>> >> This prompts me to suggest that we ought to define a new *key* for
>>>> >> .conf files.
>>>> >>
>>>> >> *Normalization=NFC* (this would be the default, and may be
>>>> >> _omitted_ for the vast majority of modules)
>>>> >> *Normalization=Custom* (we should include this in certain Biblical
>>>> >> Hebrew modules)
>>>> >>
>>>> >> This would make it clear to front-end developers and users alike
>>>> >> that the source text was _not_ converted to NFC during module build.
>>>> >> i.e. *osis2mod* was used intentionally with the *-N* switch, in
>>>> >> _accordance with the requirements of the source text provider_.
>>>> >>
>>>> >> The Unicode source text may already be encoded in *UTF-8*; this
>>>> >> memo is /only/ about normalization.
>>>> >>
>>>> >> In the rare eventuality that there could arise a requirement for
>>>> >> any of the other three normalization forms (*NFD*, *NFKC*, *NFKD*)
>>>> >> defined by the Unicode Consortium,
>>>> >> these would also be permitted values for the conf file key.
>>>> >>
>>>> >> A further benefit arises when a module needs to be updated.
>>>> >> If the modules team sees that the .conf file includes the line
>>>> >> *Normalization=Custom*
>>>> >> they would be forewarned against converting to NFC through
>>>> >> /inadvertently/ omitting the *-N* switch during module build.
>>>> >>
>>>> >> _Aside_: Another language with a need for non-standard
>>>> >> normalization is *Tibetan*. We don't yet have a module in that
>>>> >> script.
>>>> >>
>>>> >> Best regards,
>>>> >>
>>>> >> David
>>>> >>
>>>> >> Sent with ProtonMail Secure Email.
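The pre-processing step described earlier in the thread (convert everything to NFC except the Hebrew, then build with osis2mod -N so that nothing is re-normalized) could look roughly like the sketch below. It is written in Python rather than as a TextPipe filter, and it makes the same simplifying assumption as the thread: Hebrew means the single block U+0590 to U+05FF, with the alphabetic presentation forms ignored. The function name nfc_except_hebrew and the sample string are made up for the example, and a real source text may need more care, for instance where a non-Hebrew combining mark directly follows a Hebrew letter.

```python
import re
import unicodedata

# Assumption: "Hebrew" is the single Unicode block U+0590..U+05FF.
# The alphabetic presentation forms (U+FB1D..U+FB4F) are deliberately ignored.
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+")


def nfc_except_hebrew(text: str) -> str:
    """Convert text to NFC, leaving every run of Hebrew characters untouched."""
    parts = []
    pos = 0
    for match in HEBREW_RUN.finditer(text):
        # Normalize the non-Hebrew stretch before this Hebrew run.
        parts.append(unicodedata.normalize("NFC", text[pos:match.start()]))
        # Copy the Hebrew run through exactly as the source provided it.
        parts.append(match.group())
        pos = match.end()
    # Normalize whatever follows the last Hebrew run.
    parts.append(unicodedata.normalize("NFC", text[pos:]))
    return "".join(parts)


# Latin word in decomposed form, then bet + dagesh (U+05BC) + sheva (U+05B0).
sample = "Cafe\u0301 \u05D1\u05BC\u05B0"

selective = nfc_except_hebrew(sample)

# For comparison: plain NFC reorders the two Hebrew marks by combining class.
plain_nfc = unicodedata.normalize("NFC", sample)
```

In this sample the Latin word is composed in both results, but only `selective` keeps the dagesh before the sheva as entered; `plain_nfc` swaps them, which is the kind of change a module marked *Normalization=Custom* would be trying to avoid.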
_______________________________________________ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page