On Tue Dec 19, 2023 at 2:17 AM CET, Timothy Allen wrote:
> I tried running it over my BSB module, and I hit problems fairly
> quickly, some of which are more easily solved than others.
>
> 1. No support for language “en”
>
> This was easy enough to handle, there's a configuration variable near
> the top of the file that lets you configure which quotes are used for
> which languages.

A patch sent to my email would be welcome.

> 2. Apostrophes
>
> In English, the apostrophe used for possession (“the boy’s train”) and
> omission (“don’t let’s start") is traditionally set with the same
> character used as the closing single quote, so in any non-trivial
> document there will almost certainly be more "closing single quotes"
> than opening single quotes, it's not worth reporting on.

Yes, I am aware of it, and I feel very blessed that I don’t have this
problem in Czech. I have no idea what to do with this without proper
syntactic analysis, which is out of the question. Perhaps running
`re.sub(r'’s\b', '@#s', whole_text)` before the check and reversing the
substitution afterwards would help, but it seems like a recipe for
disaster.

> 3. Nested quotations
>
> In Genesis 20:11-13, Abraham tells Abimelech that he told Sarah to tell
> other people that she was Abraham’s brother. In the BSB (and NIV, and
> ESV, and NASB) this results in a triple-nested quotation. In English
> typesetting conventions the outermost quotation gets double-quotes, the
> second level gets single-quotes, and the third level gets double quotes
> again. This causes the script to report an error:
>
> I couldn't immediately think of a way to get around this.

Me neither. We should probably make an effort at error recovery, so that
the script continues even after reporting a problem, but I am not sure
how to do that either.

> Another quirk that occurs to me is that in English typesetting, if one
> person speaks multiple paragraphs (for example, the Sermon on the Mount)
> then each paragraph gets an opening double-quote, but no closing
> double-quote. That's going to play havoc with this kind of
> quote-checking tool, too.

Yes, we don’t do this in Czech, but it is typographically possible to
just use paragraph indentation instead of quoting, and of course we
don’t have anything like indentation in the pure XML.

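For what it is worth, here is a minimal sketch of how the apostrophe
masking and the error recovery discussed above could fit together. It is
not the code of the actual script. The names `mask_apostrophes`,
`unmask_apostrophes`, `check_quotes`, and the placeholder character are
all hypothetical, and the `PAIRS` table assumes English curly quotes
only. The masking is a heuristic: it catches apostrophes between two
word characters (“boy’s”, “don’t”) and nothing else, so plural
possessives such as “boys’ train” would still be miscounted, and the
multi-paragraph convention is not handled at all.

```python
import re

# Hypothetical one-character placeholder; one character keeps offsets stable.
APOSTROPHE_MARK = "\x00"

# A right single quote with a word character on both sides is treated as an
# apostrophe. This is a heuristic, not syntactic analysis.
_INWORD_APOSTROPHE = re.compile(r"(?<=\w)’(?=\w)")

# Assumed English conventions: opening quote -> matching closing quote.
PAIRS = {"“": "”", "‘": "’"}
CLOSERS = set(PAIRS.values())


def mask_apostrophes(text):
    """Replace in-word apostrophes with the placeholder."""
    return _INWORD_APOSTROPHE.sub(APOSTROPHE_MARK, text)


def unmask_apostrophes(text):
    """Reverse mask_apostrophes()."""
    return text.replace(APOSTROPHE_MARK, "’")


def check_quotes(text):
    """Return a list of (offset, message) problems instead of stopping early."""
    problems = []
    stack = []  # (opening character, offset) of quotations still open
    for pos, ch in enumerate(mask_apostrophes(text)):
        if ch in PAIRS:
            if stack and stack[-1][0] == ch:
                # English alternates double and single quotes between levels.
                problems.append((pos, f"{ch} nested directly inside {ch}"))
            stack.append((ch, pos))
        elif ch in CLOSERS:
            if stack and PAIRS[stack[-1][0]] == ch:
                stack.pop()
            else:
                # Recovery: record the problem, leave the stack alone, go on.
                problems.append((pos, f"unexpected {ch}"))
    for ch, pos in stack:
        problems.append((pos, f"unclosed {ch}"))
    return problems
```

Because the stack only pairs each closing quote with the most recent
opening quote, a double/single/double nesting like the one in Genesis
20:11-13 is accepted at any depth, and a stray quote is reported without
aborting the rest of the run.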
I have just added quotes in the appropriate places and plan to send the
patch to the Czech Biblical Society (after David reviews my fixes in
https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2) together
with some other clear bugs I have found.

> Perhaps this kind of tool just isn't suited to checking English text...
> but I'm sure there's other languages with more sensible conventions that
> it could help with. Good luck with it!

With
https://gitlab.com/crosswire-bible-society/CzeCEP/-/merge_requests/4/diffs
I have managed to make CzeCEP behave. Now I will try other Czech
modules.

Blessings,

Matěj
-- 
http://matej.ceplovi.cz/blog/, @mcepl@floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8

Power tends to corrupt and absolute power corrupts absolutely.
Great men are almost always bad men, […]
    -- Lord Acton
       (including the more important part of the often misquoted
       statement)
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page