Further clarification and observations about the SWORD filter for
UTF8GreekAccents...
My reply of 7th June was sent before I was informed about the source code for
UTF8GreekAccents.
In fact, this does make use of the mapping table that I provided in March 2017.
Thanks, Troy!
You can view the latest version in SVN trunk here:
https://crosswire.org/svn/sword/trunk/src/modules/filters/utf8greekaccents.cpp
Please note that it was patched during the weekend to add the lines to process
GREEK KORONIS & COMBINING GREEK KORONIS,
as well as to remove a residual (unused) declaration left over from the original
version. Thanks, Troy.
We may have been wondering why the filter still includes a line to remove the
RIGHT SINGLE QUOTATION MARK:
converters[0x2019] = ""; // RIGHT SINGLE QUOTATION MARK
This is because the source text in some older accented Greek modules used this
Unicode character.
It is usually found at the end of a word, with typically 1218 occurrences
in a module.
More recent editions of the Greek NT use the GREEK KORONIS 0x1FBD in all these
same locations.
Modules with 0x2019 include MorphGNT, TischMorph and 2TGreek.
Modules with 0x1FBD include SBLG_THE.
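To illustrate what such table entries do, here is a minimal sketch of a table-driven filter in the spirit of the converters line above. It is not the SWORD source: the names nextCodePoint and stripMarks, the use of std::map, and the three-entry table are assumptions made for illustration only, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: decode the UTF-8 sequence starting at s[i], advance i
// past it, and return the code point. No validation is attempted.
static std::uint32_t nextCodePoint(const std::string &s, std::size_t &i) {
    unsigned char c = static_cast<unsigned char>(s[i++]);
    std::uint32_t cp;
    int extra;
    if (c < 0x80)      { cp = c;        extra = 0; }
    else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
    else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
    else               { cp = c & 0x07; extra = 3; }
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Illustrative stand-in for the filter: every code point found in the table
// is replaced by its mapped string (here always empty); all other code
// points are copied through unchanged.
std::string stripMarks(const std::string &in) {
    static const std::map<std::uint32_t, std::string> converters = {
        {0x2019, ""}, // RIGHT SINGLE QUOTATION MARK
        {0x1FBD, ""}, // GREEK KORONIS
        {0x0343, ""}, // COMBINING GREEK KORONIS
    };
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t start = i;
        std::uint32_t cp = nextCodePoint(in, i);
        auto it = converters.find(cp);
        if (it != converters.end())
            out += it->second;               // mapped: emit the replacement
        else
            out.append(in, start, i - start); // unmapped: keep original bytes
    }
    return out;
}
```

With this sketch, stripMarks reduces δ’ (bytes CE B4 E2 80 99) to δ, but it also turns d’Israël into dIsraël, because the lookup takes no account of the surrounding script.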
FIO. The only Greek letters ever followed by this character (0x2019) are shown
in the following analysis (extracted from MorphGNT).
Count Pattern
0034 δ’
0107 θ’
0233 τ’
0292 π’
0213 λ’
0132 φ’
0061 ρ’
0149 ι’
The counts vary slightly for different modules.
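A tally of this kind could be reproduced along the following lines. This is only a sketch and not the tool used for the MorphGNT figures: countElisions and isGreek are hypothetical names, the Greek test is a crude range check on the Greek and Coptic (U+0370..U+03FF) and Greek Extended (U+1F00..U+1FFF) blocks, and the input is assumed to be well-formed UTF-8 plain text already extracted from the module.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: decode the UTF-8 sequence starting at s[i], advance i
// past it, and return the code point. No validation is attempted.
static std::uint32_t nextCodePoint(const std::string &s, std::size_t &i) {
    unsigned char c = static_cast<unsigned char>(s[i++]);
    std::uint32_t cp;
    int extra;
    if (c < 0x80)      { cp = c;        extra = 0; }
    else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
    else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
    else               { cp = c & 0x07; extra = 3; }
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Crude script test by block range; it also matches non-letters that live
// in those blocks.
static bool isGreek(std::uint32_t cp) {
    return (cp >= 0x0370 && cp <= 0x03FF) || (cp >= 0x1F00 && cp <= 0x1FFF);
}

// Count each "Greek code point followed by U+2019" pair, keyed by the
// UTF-8 bytes of the pair (e.g. the five bytes of δ’).
std::map<std::string, int> countElisions(const std::string &text) {
    std::map<std::string, int> counts;
    std::uint32_t prev = 0;
    std::string prevBytes;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t start = i;
        std::uint32_t cp = nextCodePoint(text, i);
        std::string bytes = text.substr(start, i - start);
        if (cp == 0x2019 && isGreek(prev))
            ++counts[prevBytes + bytes];
        prev = cp;
        prevBytes = bytes;
    }
    return counts;
}
```

Iterating over the returned map and printing each count beside its key would give a two-column listing like the one above. A U+2019 that follows a Latin letter, as in French text, is not counted.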
We should consider the conjecture that the first ever digitisation of (e.g.)
the Tischendorf NT simply used the wrong character,
i.e. 0x2019 was keyed everywhere one would nowadays expect to use a GREEK
KORONIS.
Maybe the task was performed between Unicode 1.0 (October 1991) and Unicode 1.1
(June 1993)?
Aside: It's very likely that digitisation took place before Unicode even
existed, and that the text was subsequently converted to Unicode.
Some of you may remember Claremont-Michigan encodings for Hebrew, Aramaic and
Greek.
So, rather than being a bug in SWORD, in retrospect it looks more like an
accommodation to a systematic transcription error in some NT Greek text sources.
What we should do about it remains an open question.
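One option, mentioned in the 7 June message quoted below, is to apply the removal only in the context of Greek letters. The following is a minimal sketch of that idea and not a proposed patch: stripAfterGreek and isGreek are hypothetical names, "Greek" is approximated by a range check on the Greek and Coptic and Greek Extended blocks, only U+2019 and U+1FBD are handled, and well-formed UTF-8 input is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the UTF-8 sequence starting at s[i], advance i
// past it, and return the code point. No validation is attempted.
static std::uint32_t nextCodePoint(const std::string &s, std::size_t &i) {
    unsigned char c = static_cast<unsigned char>(s[i++]);
    std::uint32_t cp;
    int extra;
    if (c < 0x80)      { cp = c;        extra = 0; }
    else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
    else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
    else               { cp = c & 0x07; extra = 3; }
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Crude script test by block range; it also matches non-letters that live
// in those blocks.
static bool isGreek(std::uint32_t cp) {
    return (cp >= 0x0370 && cp <= 0x03FF) || (cp >= 0x1F00 && cp <= 0x1FFF);
}

// Drop U+2019 or U+1FBD only when the immediately preceding code point is
// in a Greek block; everything else is copied through unchanged.
std::string stripAfterGreek(const std::string &in) {
    std::string out;
    std::uint32_t prev = 0;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t start = i;
        std::uint32_t cp = nextCodePoint(in, i);
        bool elisionMark = (cp == 0x2019 || cp == 0x1FBD);
        if (!(elisionMark && isGreek(prev)))
            out.append(in, start, i - start);
        prev = cp;
    }
    return out;
}
```

Under these assumptions δ’ still becomes δ, while the apostrophe in d’Israël is left alone. Whether looking at the preceding character is enough context for real modules is part of the open question.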
One new question arises from the changes to the SWORD filter (2017 & 2018).
Has anything similar been done for the equivalent JSword filter?
Best regards.
David
Sent with [ProtonMail](https://protonmail.com) Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On 7 June 2018 8:06 PM, David Haslam <dfh...@protonmail.com> wrote:
> This ongoing problem affects far too many module releases.
> The immediate cause is a wrong assumption implemented in the confmaker script.
>
> The UTF8GreekAccents filter does not restrict its filtering to accents joined
> or adjacent to letters in the Greek alphabet.
> And by "accents" please remember that some of these are actually Unicode
> punctuation marks.
> It applies the filter "willy-nilly" no matter what the context in terms of
> language, script or alphabet.
> It's a one-way valve that should never be used "backwards" to determine
> whether or not it should be present in the .conf file.
>
> Aside: The other UTF8 filters are not like this, so it's OK for confmaker to
> use them for testing to see if they are required.
>
> The set of Unicode characters filtered by UTF8GreekAccents is not unique to
> the Koine Greek language.
> Some of them are found in many other languages.
>
> It's theoretically feasible to redesign the filter such that it applies only
> in the context of Greek letters.
> So yes, this is a matter for SWORD developers to consider too.
> I documented a suitable mapping table in my GitHub repo in March 2017. See
> https://github.com/DavidHaslam/UTF8-Greek-Accents
>
> It was discussed in this mailing list at the time.
> Troy was unwilling to replace the existing filter on the grounds that it does
> what it was designed for on accented Greek modules.
> The point is this. It was never designed to be used in general to test
> whether it is needed by a module.
> When used for this unintended "backwards" purpose, it generally gives the
> wrong answer.
>
> This concept is not difficult to understand.
>
> Unless and until the filter itself is redesigned, we need a compromise
> workaround for the confmaker script.
> My suggestion is to restrict applying this "backwards" test to only the
> modules in which this line is present.
>
> Lang=grc
>
> This would largely prevent the ongoing spurious addition of this filter due
> to the automation of module publishing.
> One can imagine there may be corner cases, such as where (e.g.) a French
> Bible module had study notes which included some accented Greek words.
> But the impact would be minimal by not having the filter in the conf file in
> such rare cases.
>
> Best regards,
>
> David
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On 7 June 2018 7:25 PM, DM Smith <dmsm...@crosswire.org> wrote:
>
>> I think it is a bug in the SWORD engine if single right quotation mark is
>> seen as a Greek diacritic.
>>
>> Will look later to verify.
>>
>> If it is then the module should not have the option.
>>
>> — DM Smith
>>
>> On Jun 7, 2018, at 8:54 AM, "ref...@gmx.net" <ref...@gmx.net> wrote:
>>
>>> If a Greek accent is in use, the filter will be there. If this is a bug,
>>> i.e. there should not be a Greek accent, please highlight this at source. I
>>> guess this is the right approach here too. Then the next iteration will not
>>> have a spurious filter.
>>>
>>> Sent from my mobile. Please forgive shortness, typos and weird autocorrects.
>>>
>>> -------- Original Message --------
>>> Subject: Re: [sword-devel] Module upload: FreLXX
>>> From: David Haslam
>>> To: SWORD Developers' Collaboration Forum
>>> CC:
>>>
>>>> This line in frelxx.conf is superfluous:
>>>>
>>>> GlobalOptionFilter=UTF8GreekAccents
>>>>
>>>> I think it's triggered in confmaker script by the presence of these
>>>> characters.
>>>> U+2019 ’ 656 RIGHT SINGLE QUOTATION MARK
>>>>
>>>> NB. The source text is inconsistent in which character is used for the
>>>> typographical apostrophe. cf.
>>>> U+0027 ' 39,200 APOSTROPHE
>>>>
>>>> Example:
>>>> Exodus 3:13 contains "les fils d'Israël" (character U+0027 used)
>>>> Exodus 3:15 contains "aux fils d’Israël" (character U+2019 used)
>>>>
>>>> When the Greek Accents filter is disabled (in Xiphos) the latter becomes
>>>> "aux fils dIsraël" (without the apostrophe).
>>>>
>>>> There are no Greek letters in the module, so the GreekAccents filter
>>>> should not be included.
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>
>>>> On 4 June 2018 7:38 AM, wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> This is to announce that we have just now uploaded FreLXX.
>>>>>
>>>>> This is an updated version of FreLXX.
>>>>>
>>>>> Many thanks to update for the hard work.
>>>>>
>>>>> yours
>>>>>
>>>>> The Module Team
>>>>>
>>>>> P.S.: This email is sent automatically on upload of a new/updated module
>>>>>
>>>>> sword-devel mailing list: sword-devel@crosswire.org
>>>>>
>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>
>>>>> Instructions to unsubscribe/change your settings at above page