Thanks DM,

Very informative!

Statistical Restoration Greek New Testament (StatResGNT) is an exception for a 
unique reason. Unlike any other Greek NT module, the project makes use of 
quotation marks for speech!

These were never in any original Greek MSS, so it's a form of reverse 
engineering. The same is true of the added pilcrow symbols (¶) that mark 
paragraphs.

When a user chooses to hide Greek accents in the front-end (e.g. Xiphos), the 
closing marks of level 2 quotations disappear from the displayed text. This is 
what led me to revisit the topic.

Aside: As a Windows user with 
[BabelPad](https://www.babelstone.co.uk/Software/BabelPad.html) installed, I 
can enter any Unicode character into what I'm writing by looking it up in the 
Character Map tool, so I'd have no difficulty in inserting U+2019 into a search 
string. There are similar useful apps even for mobile platforms nowadays, such 
as UnicodePad Pro for iOS/iPadOS.

Best regards,

David

Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.

On Friday, March 21st, 2025 at 2:53 AM, DM Smith <dmsm...@crosswire.org> wrote:

> I think I’ve finished my impact analysis on SWORD search code. I ran out of 
> time to do more. SWORD has several different search mechanisms. Some are 
> affected by the proposed change. Some are not.
>
> In all searches, the application normalizes the search request: it calls 
> UTF8GreekAccents (and a few other filters) to remove accents, and it calls 
> stripText to remove OSIS, GBF, ThML, TEI, … markup, leaving only simple markup 
> for added words, overlines and xref notes. If the search is case-insensitive, 
> the request is also converted to uppercase. The same is done to each piece of 
> text that is pulled out of the module.
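A minimal sketch of that normalization step, in Python. The function `normalize` is hypothetical: it stands in for UTF8GreekAccents plus the uppercase step by decomposing to Unicode NFD and dropping combining marks. That technique is an assumption, not SWORD's actual implementation. The `strip_u2019` flag models the change under discussion.

```python
import unicodedata


def normalize(text: str, strip_u2019: bool = True, case_insensitive: bool = True) -> str:
    """Hypothetical stand-in for accent stripping plus optional uppercasing.

    Assumption: accents, breathings and iota subscripts are removed by
    decomposing to NFD and dropping combining marks (category Mn).
    """
    kept = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            continue  # combining accent, breathing or iota subscript
        if strip_u2019 and ch == "\u2019":
            continue  # current behavior: the elision mark is removed too
        kept.append(ch)
    result = unicodedata.normalize("NFC", "".join(kept))
    return result.upper() if case_insensitive else result


print(normalize("ἀλλ\u2019 οἱ"))                     # current behavior
print(normalize("ἀλλ\u2019 οἱ", strip_u2019=False))  # proposed behavior
```

Because the same function is applied to both the request and the module text, the question is only whether the user's typed request contains the same code point as the text.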
>
> SEARCHTYPE_EXTERNAL - Lucene or Xapian
>
>> Both Lucene and Xapian apply additional normalizations to their search 
>> requests, in the same fashion as to the text. I didn’t examine the Xapian 
>> source, so I’m not sure about Xapian. Lucene definitely does not have a 
>> problem: the results are the same with or without U+2019, as it is not a 
>> token character. I expect Xapian is similar.
>
> SEARCHTYPE_ENTRYATTR - entryAttrib (e.g. Word//Lemma./G1234/) (Lemma with a 
> dot means also check the components Lemma.[1-9])
>
>> I didn’t examine this. I’m not familiar with it. I’ll guess that it is no 
>> worse than the next three and might not have an issue.
>
> SEARCHTYPE_PHRASE
>
>> This is a simple substring search.
>
> SEARCHTYPE_REGEX
>
>> The search request is compiled into a regular expression (there are 3 
>> variants, depending on which regex library is used to build SWORD lib).
>> There is special code to allow a RE to match across 2 verses.
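If U+2019 were no longer stripped, a regex user could cover the look-alike marks with a character class. A sketch in Python's `re` syntax on accent-stripped, lowercase text; the set of marks is an assumption taken from the Mark 2:17 samples quoted in this thread, and SWORD's regex backends may not support the `(?!\w)` lookahead used here.

```python
import re

# Assumed set of look-alike marks, taken from the Mark 2:17 samples:
# U+0027, U+2019, U+02BC, U+1FBD, U+1FBF, U+0384.
MARKS = "'\u2019\u02bc\u1fbd\u1fbf\u0384"

# αλλ, then an optional space and an optional mark, and no further letter,
# so the full word αλλα is not matched.
ELIDED_ALLA = re.compile("αλλ ?[" + re.escape(MARKS) + r"]?(?!\w)")

for verse in ["ιατρου αλλ οι", "ιατρου αλλ\u2019 οι", "ιατρου αλλ \u1fbf οι", "αλλα αμαρτωλους"]:
    print(repr(verse), ELIDED_ALLA.findall(verse))
```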
>
> SEARCHTYPE_MULTIWORD - multiword
>
>> The search request is split on spaces (‘ ’, not general whitespace) into 
>> needles (as Troy calls them), and each needle is matched by simple 
>> substring search in the haystack.
>> This is an AND search. All needles need to be found for there to be a hit.
>
>> There is special code to allow a request to match across 2 verses.
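The phrase and multiword searches described above can be sketched as follows. The function names are hypothetical; the sketch assumes request and verse are already normalized and does not model the special handling for matches across two verses.

```python
def phrase_match(request: str, haystack: str) -> bool:
    """Phrase search as described: the whole request is one substring."""
    return request in haystack


def multiword_match(request: str, haystack: str) -> bool:
    """Multiword search as described: AND over space-separated needles.

    Splits on ' ' only, not on other whitespace. Empty needles from repeated
    spaces are dropped (they would match trivially anyway).
    """
    needles = [needle for needle in request.split(" ") if needle]
    return all(needle in haystack for needle in needles)


verse = "ου χρειαν εχουσιν οι ισχυοντες ιατρου αλλ οι κακως εχοντες"
print(phrase_match("κακως αλλ", verse))            # not contiguous in this order
print(multiword_match("κακως αλλ", verse))         # both needles occur somewhere
print(multiword_match("κακως αμαρτωλους", verse))  # second needle is missing
```

Note that a substring needle such as αλλ also matches inside αλλα.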
>
> For these last 3, changing UTF8GreekAccents so that it no longer removes 
> U+2019 will change the user experience for search:
>
> - Users will have to include it if it is in the text. This is no different 
> from punctuation, which is not stripped from the text, nor from U+0027, which 
> might be used in the text for the same purpose.
> - They will have to either copy/paste the character or figure out how to 
> enter it. This is a non-obvious task; I outlined it in an earlier post.
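To illustrate the first point: if the filter kept U+2019, only a request containing that exact code point would match as a phrase. A minimal sketch, assuming an already accent-stripped, lowercase verse fragment and a case-sensitive substring test.

```python
# Verse fragment as it would look if the filter kept U+2019
# (accents already stripped, lowercase; an assumption for illustration).
verse = "ιατρου αλλ\u2019 οι κακως εχοντες"

queries = {
    "no mark": "αλλ οι",
    "keyboard apostrophe U+0027": "αλλ' οι",
    "copied from the text, U+2019": "αλλ\u2019 οι",
}

# Phrase (substring) search: the code points must match exactly.
results = {label: query in verse for label, query in queries.items()}
for label, found in results.items():
    print(f"{label:30} {found}")
```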
>
> When it comes to accented/unaccented texts, I thought I’d look at how 
> unaccented texts handle the elision. Mark 2:17 is a good example of αλλα 
> being shortened to αλλ followed by οι.
> I looked at all the Greek (grc) modules at CrossWire that have Mark 2:17.
>
> Here are the unaccented modules:
> Antoniades Patriarchal Edition (1904/1912)
>
>> και ακουσας ο ιησους λεγει αυτοις ου χρειαν εχουσιν οι ισχυοντες ιατρου αλλ 
>> οι κακως εχοντες ουκ ηλθον καλεσαι δικαιους αλλα αμαρτωλους εις μετανοιαν
>
> The New Testament in the Original Greek: Byzantine Textform 2013
>
>> και ακουσας ο ιησους λεγει αυτοις ου χρειαν εχουσιν οι ισχυοντες ιατρου αλλ 
>> οι κακως εχοντες ουκ ηλθον καλεσαι δικαιους αλλα αμαρτωλους εις μετανοιαν
>
> Elzevir Textus Receptus (1624)
>
>> και ακουσας ο ιησους λεγει αυτοις ου χρειαν εχουσιν οι ισχυοντες ιατρου αλλ 
>> οι κακως εχοντες ουκ ηλθον καλεσαι δικαιους αλλα αμαρτωλους εις μετανοιαν
>
> Textus Receptus (1550/1894)
>
>> και ακουσας ο ιησους λεγει αυτοις ου χρειαν εχουσιν οι ισχυοντες ιατρου αλλ 
>> οι κακως εχοντες ουκ ηλθον καλεσαι δικαιους αλλα αμαρτωλους εις μετανοιαν
>
> Westcott and Hort with NA27/UBS4 variants
>
>> και ακουσας ο ιησους λεγει αυτοις [οτι] ου χρειαν εχουσιν οι ισχυοντες 
>> ιατρου αλλ οι κακως εχοντες ουκ ηλθον καλεσαι δικαιους αλλα αμαρτωλους
>
> Family 35
>
>> και ακουσας ο ιησους λεγει αυτοις ου χρειαν εχουσιν οι ισχυοντες ιατρου αλλ 
>> οι κακως εχοντες ουκ ηλθον καλεσαι δικαιους αλλα αμαρτωλους εις μετανοιαν
>
> Here are the accented modules:
> Apostolic Bible Polyglot Greek Text
>
>> και ακούσας ο Ιησούς λέγει αυτοίς ου χρείαν έχουσιν οι ισχύοντες ιατρού αλλ΄ 
>> οι κακώς έχοντες ουκ ήλθον καλέσαι δικαίους αλλά αμαρτωλούς εις μετάνοιαν
>
> Morphologically Parsed Greek New Testament based on the SBLGNT
>
>> καὶ ἀκούσας ὁ Ἰησοῦς λέγει αὐτοῖς ⸀ὅτι Οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες ἰατροῦ 
>> ἀλλ’ οἱ κακῶς ἔχοντες· οὐκ ἦλθον καλέσαι δικαίους ἀλλὰ ⸀ἁμαρτωλούς.
>
> Nestle GNT 1904
>
>> καὶ ἀκούσας ὁ Ἰησοῦς λέγει αὐτοῖς Οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες ἰατροῦ ἀλλ’ 
>> οἱ κακῶς ἔχοντες· οὐκ ἦλθον καλέσαι δικαίους ἀλλὰ ἁμαρτωλούς.
>
> Nestle-Aland, Novum Testamentum Graece, 28th Revised Edition
>
>> καὶ ἀκούσας ὁ Ἰησοῦς λέγει °αὐτοῖς °¹[ὅτι] οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες 
>> ἰατροῦ ἀλλ ᾿ οἱ κακῶς ἔχοντες· οὐκ ἦλθον καλέσαι δικαίους ἀλλ ᾿ ἁμαρτωλούς.
>
> The Greek New Testament: SBL Edition
>
>> καὶ ἀκούσας ὁ Ἰησοῦς λέγει αὐτοῖς ⸀ὅτι Οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες ἰατροῦ 
>> ἀλλʼ οἱ κακῶς ἔχοντες· οὐκ ἦλθον καλέσαι δικαίους ἀλλὰ ⸀ἁμαρτωλούς.
>
> Statistical Restoration Greek New Testament
>
>> Καὶ ἀκούσας, ὁ ˚Ἰησοῦς λέγει αὐτοῖς, “Οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες ἰατροῦ, 
>> ἀλλʼ οἱ κακῶς ἔχοντες. Οὐκ ἦλθον καλέσαι δικαίους, ἀλλὰ ἁμαρτωλούς.”
>
> Tregelles' Greek New Testament
>
>> καὶ ἀκούσας ὁ Ἰησοῦς λέγει αὐτοῖς, Οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες ἰατροῦ, 
>> ἀλλ᾽ οἱ κακῶς ἔχοντες. οὐκ ἦλθον καλέσαι δικαίους, ἀλλὰ ἁμαρτωλούς.
>
> Tischendorf's 8th edition GNT
>
>> καὶ ἀκούσας ὁ Ἰησοῦς λέγει αὐτοῖς· οὐ χρείαν ἔχουσιν οἱ ἰσχύοντες ἰατροῦ ἀλλ’ 
>> οἱ κακῶς ἔχοντες· οὐκ ἦλθον καλέσαι δικαίους ἀλλὰ ἁμαρτωλούς.
>
> Note that none of the unaccented Greek texts has an elision mark: they all 
> read αλλ οι, with no mark after αλλ.
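The mark after ἀλλ in the accented samples above is not one character but several. This sketch prints the Unicode name and general category of each; the code points were read off the quoted text as it appears here and are an assumption, since mail clients and copy/paste can substitute characters.

```python
import unicodedata

# Mark after αλλ in the accented samples, as read from the text quoted in
# this thread (assumed; mail clients may have altered characters).
marks = {
    "keyboard apostrophe (for comparison)": "'",
    "Apostolic Bible Polyglot": "\u0384",
    "SBLGNT-based parsed GNT, Nestle 1904, Tischendorf": "\u2019",
    "SBLGNT, StatResGNT": "\u02bc",
    "Tregelles": "\u1fbd",
    "NA28 (preceded by a space)": "\u1fbf",
}

for source, mark in marks.items():
    print(f"{source:50} U+{ord(mark):04X} {unicodedata.name(mark)} "
          f"[{unicodedata.category(mark)}]")
```

In many fonts these look alike, but they are distinct code points, so an exact-match search must use whichever one the module actually contains.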
>
> My recommendation is that we make no changes:
> 1) If we take the CrossWire unaccented Koine Greek modules as representative 
> of the proper way to handle U+2019, then it should be stripped along with 
> accents.
> 2) The change would alter current behavior that has not previously been an 
> issue.
> 3) It is difficult for an end user to enter U+2019 and to know that it is one 
> particular character among several glyphs that look the same.
> 4) Ancient texts, e.g. from the 4th-5th century, had neither accents nor the 
> elision mark.
>
> In Him,
> DM
>
>> On Mar 19, 2025, at 7:16 PM, DM Smith <dmsm...@crosswire.org> wrote:
>>
>> David,
>>
>> I’ve examined the use of Lucene in SWORD and it treats ’ U+2019 as a 
>> non-token character. SWORD uses the standard analyzer which is predicated on 
>> modern European languages with a nod to CJK. The ASCII apostrophe is looked 
>> at and in some circumstances it is a token character, but most often is 
>> ignored.
>>
>> So from a Lucene perspective, the change does not affect it and would be 
>> safe.
>>
>> While in the code, I found that Xapian is the preferred external search 
>> mechanism for indexed searches. From a quick look at Xapian and SWORD’s use 
>> of it, it is a far better search engine and has language awareness. I 
>> haven’t checked to see how it handles this elision in its tokenizer.
>>
>> There are several other search mechanisms in SWORD that I’m checking out.
>>
>> Aside: on my Mac, it is easy to input U+2019 with the US keyboard layout, 
>> but it requires knowing how to do it and what it is. When I turned on 
>> Polytonic Greek as my keyboard input, it wasn’t possible. Apparently elision 
>> is rarely used in modern Greek. I had thought that anyone who knew Koine 
>> (ancient) Greek and could readily type it would know how to input it. I’m 
>> not so sure.
>>
>> Since you like your chat AI agents, ask it/them what the first and second 
>> level quote characters are for Greek. It surprised me. It’s not U+2019 or 
>> U+0027 for either.
>>
>> DM
>>
>>> On Mar 18, 2025, at 12:55 PM, DM Smith <dmsm...@crosswire.org> wrote:
>>>
>>> Apparently we are talking past each other.
>>>
>>> I understand the argument. The elision mark is not an accent. It is a 
>>> letter character when used between letters, but a punctuation character 
>>> (so a word-break character) at the beginning or end of words. The Unicode 
>>> Consortium has been asked to make it a letter character when it follows a 
>>> Greek letter, but this has not been done. The goal of the request is that 
>>> double-clicking a Greek word ending with U+2019 would also select the 
>>> apostrophe, in the same way that it does when the apostrophe is in the 
>>> middle of a character sequence.
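A simplified sketch of the word-boundary behavior described above: U+2019 stays inside a word only when it sits between two letters, and otherwise ends the word. This loosely follows how Unicode's word-boundary rules (UAX #29) treat U+2019; the `words` function is hypothetical and is not the full algorithm.

```python
from __future__ import annotations

import unicodedata


def words(text: str) -> list[str]:
    """Hypothetical word splitter: U+2019 is kept only between two letters.

    Simplified illustration of a UAX #29-style rule, not the full algorithm.
    """
    result: list[str] = []
    current: list[str] = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category.startswith("L") or category == "Mn":
            current.append(ch)  # letters and combining marks extend the word
        elif ch == "\u2019" and current and text[i + 1 : i + 2].isalpha():
            current.append(ch)  # mid-word apostrophe: part of the word
        else:
            if current:
                result.append("".join(current))
            current = []  # anything else ends the current word
    if current:
        result.append("".join(current))
    return result


print(words("ἀλλ\u2019 οἱ"))     # elision at the end of a word
print(words("don\u2019t stop"))  # apostrophe between letters
```

Under a rule like this, selecting the word ἀλλ’ by double-clicking would stop before the mark, which is the behavior the Unicode request aims to change.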
>>>
>>> I’m saying that if the filter does double duty, for both presentation and 
>>> normalization of search requests, then it has to work well for both.
>>>
>>> U+2019 is visually similar to U+0027, so an end user will use the 
>>> keyboard to type U+0027 when U+2019 is required. This will not work, as the 
>>> exact code point has to match.
>>>
>>> Over the years, I’ve found that simple code changes often break other parts 
>>> of the code.
>>>
>>> I’ve requested to examine the search code in Xiphos to see if another 
>>> filter is applied that strips U+2019. If so, then your request probably 
>>> will work. However, if Xiphos needs to be changed in addition to SWORD lib 
>>> to accommodate the change, then probably every frontend would need to be 
>>> changed.
>>>
>>> Also, I need to dig into the Lucene index creation and Lucene search to 
>>> make sure it doesn’t require indexes to be rebuilt.
>>>
>>> DM
>>>
>>>> On Mar 18, 2025, at 3:21 AM, David Haslam <dfh...@protonmail.com> wrote:
>>>>
>>>> We don't hide U+2019 in any other context.
>>>> Its use in the KJV for possessives does not hinder search!
>>>>
>>>> Why the obsession with search, when it has no bearing on my semantic 
>>>> argument?
>>>> An elision mark is simply not an accent!!! Why is this so hard to 
>>>> understand?
>>>>
>>>> We wouldn't hide U+2019 in French if we saw it used frequently as an 
>>>> elision mark!!!
>>>> Albeit most French modules to date simply use U+0027 (unlike our KJV).
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> On Tuesday, March 18th, 2025 at 1:40 AM, DM Smith <dmsm...@crosswire.org> 
>>>> wrote:
>>>>
>>>>>> On Mar 17, 2025, at 5:24 PM, David Haslam <dfh...@protonmail.com> wrote:
>>>>>>
>>>>>> My argument is simple & straightforward.
>>>>>
>>>>> Your argument is that display and search should have nothing to do with 
>>>>> each other.
>>>>>
>>>>>> When you hide diacritics, you ought not to be hiding punctuation marks.
>>>>>>
>>>>>> Why is this so contentious?
>>>>>
>>>>> Software changes can have unintended consequences. It needs to be 
>>>>> carefully considered.
>>>>>
>>>>>> In what world does how a module displays require that punctuation be 
>>>>>> hidden?
>>>>>
>>>>> In a world where the filter is also used for search.
>>>>>
>>>>>> A quotation mark is not an accent!
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Mon, Mar 17, 2025 at 21:06, DM Smith <dmsm...@crosswire.org> wrote:
>>>>>>
>>>>>>>> On Mar 17, 2025, at 4:01 PM, Karl Kleinpaste <k...@kleinpaste.org> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> That is, if search success depends on the specificity of whether 
>>>>>>>> accents or points are enabled, you're probably doing something wrong.
>>>>>>>
>>>>>>> Absolutely.
>>>>>>>
>>>>>>> What we are discussing is whether an apostrophe should be stripped out 
>>>>>>> or not. The apostrophe on the keyboard is U+0027. The apostrophe in the 
>>>>>>> Greek is recommended to be U+2019. They look nearly the same. There are 
>>>>>>> a few other Unicode apostrophes that have the same appearance.
>>>>>>>
>>>>>>> If the apostrophe isn’t stripped out, then the user input and the text 
>>>>>>> have to agree on which of the several it is. There is no way for the 
>>>>>>> end user to know which. I end up copying from the text to make sure I 
>>>>>>> have the right one.
>>>>>>>
>>>>>>> I believe the same filter that turns on and off the display of accents 
>>>>>>> is used for searching. David is suggesting that the filter is taking 
>>>>>>> out legitimate level 2 quotation marks from the display when it 
>>>>>>> shouldn’t. I’m suggesting that it needs to remove them for the sake of 
>>>>>>> the search.
>>>>>>>
>>>>>>> DM
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list: sword-devel@crosswire.org
>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page