OK, I can't blame Peter anymore since Greg also found the '2's. :)
So, I hunted this one down. Odd problem...
We 'fixed' a bug which was incorrectly checking for file open errors.
We were already sending the error code to logError when there was an error.
Well, we now detect file open errors
If you have a list of words with valid hyphenation points, it will be very
valuable to someone someday if that list is documented as a spelling
dictionary, even if it is incomplete and known to be so. Finding valid
hyphenation points is the biggest chunk of time in preparing for
publication. and in many Afric
Since my 9:26pm reply, I've been a busy bee, and generated a counted list of
the Lingala words that contain a soft hyphen.
i.e. after I removed the multiple and "useless" occurrences.
There are 4584 such words, though one escapee has just "ambushed" me.
001 Israel
This one begins with a so
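For anyone wanting to reproduce such a counted list, here is a minimal Python sketch. It assumes the text is plain Unicode with words separated by whitespace; the sample string is made-up placeholder data, not taken from the LinVB source, and the three-digit count column is only a guess at the format shown above.

```python
from collections import Counter

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def count_soft_hyphen_words(text):
    """Count whitespace-separated tokens that contain at least one soft hyphen."""
    return Counter(tok for tok in text.split() if SOFT_HYPHEN in tok)


# Placeholder sample (hypothetical data, for illustration only).
sample = "mo\u00adto na mo\u00adto azali mo\u00adkonzi"
counts = count_soft_hyphen_words(sample)

# Print a counted list, showing each soft hyphen as "|" so it is visible.
for word, n in counts.most_common():
    print(f"{n:03d} {word.replace(SOFT_HYPHEN, '|')}")
```

Punctuation attached to a token is not stripped here, so "word," and "word" would be counted separately.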
I didn't ignore it, but you may have missed my reply when you started to
compose yours.
The ZWNJ is indeed the proper character to use.
This is a semantic matter, nothing to do with hyphenated word-wrap at line
end, which is solely presentational.
David
--
Sent from: http://sword-dev.350566.n
Hi David, I think Michael has made a point which you ignored in your response - Indic and other scripts. The correct character in most of these places though is likely a zero width non joiner space character, at least it would be in Arabic derived scripts. I think the correct solution is that if we
What would be of interest as a practical benefit for future typesetters is to
prepare a comprehensive replace list for all the longer words in the LinVB
source text.
The search column would contain the word without a soft hyphen.
The replace column would contain the same word with a soft hyphen at
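A replace list of this kind could be derived from any word list that already carries soft hyphens. The following Python sketch is only an illustration: the function name and the `min_length` cut-off for "longer words" are assumptions, not anything defined in this thread.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def build_replace_list(hyphenated_words, min_length=8):
    """Return sorted (search, replace) pairs.

    search  = the word with all soft hyphens removed
    replace = the same word with its soft hyphens kept
    min_length is a hypothetical threshold for what counts as a longer word.
    """
    rows = {}
    for word in hyphenated_words:
        plain = word.replace(SOFT_HYPHEN, "")
        if SOFT_HYPHEN in word and len(plain) >= min_length:
            # Keep the first hyphenation seen for each plain form.
            rows.setdefault(plain, word)
    return sorted(rows.items())
```

If the same word occurs with different hyphenation points, this keeps only the first; a real list would need a human decision on which is valid.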
Regexp `([ [:punct:]]\xAD|\xAD[ [:punct:]])` is a reasonable definition for a
"useless soft hyphen",
unless in the language there is a punctuation mark that is used as part of a
word.
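As a rough illustration of that definition in Python: the standard `re` module does not support POSIX classes such as `[:punct:]`, so this sketch substitutes `string.punctuation`, which is ASCII only. That substitution is an assumption and will miss non-ASCII punctuation; the caveat about punctuation marks that belong to a word applies here too.

```python
import re
import string

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN

# Space or ASCII punctuation; a stand-in for "[ [:punct:]]".
_BOUNDARY = "[ " + re.escape(string.punctuation) + "]"
USELESS_SHY = re.compile(f"{_BOUNDARY}{SOFT_HYPHEN}|{SOFT_HYPHEN}{_BOUNDARY}")


def find_useless_soft_hyphens(text):
    """Return the index of each soft hyphen that touches a space or punctuation."""
    return [m.start() + m.group().index(SOFT_HYPHEN)
            for m in USELESS_SHY.finditer(text)]
```

A soft hyphen between two letters is left alone; only those before or after a space or punctuation mark are reported.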
The inventors of some alphabets chose more wisely than others by allocating
for the glottal stop the character cal
I see your point. For them to be useful, every word should have soft hyphens
between syllables (or intra-word semantic breaks). Not just some. It is just as
likely in a dynamic word wrap of a browser (or other etext viewer) whose width
can change that any word but the first few on a line will
They should use the ZWNJ rather than the soft hyphen.
ZWNJ = Zero Width Non Joiner U+200C.
The caution should not have been necessary.
David
--
Sent from: http://sword-dev.350566.n4.nabble.com/
___
sword-devel mailing list: sword-devel@crosswire.or
Having soft hyphens to improve readability on small hand-held devices is fine
in theory, but not in practice.
The more I've thought about soft hyphens, the more I've understood that
their use was a kludge for a particular typesetting task at one time for
publishing a printed Bible from Quark
On 02/11/2017 at 15:28, DM Smith wrote:
> I don’t think they should be removed upstream except to fix errors. David
> classified these as multiple and useless. Regarding useless, I’m not sure
> that “punctuation” is such a universal language construct that it can be
> included in such a dete
I don’t think they should be removed upstream except to fix errors. David
classified these as multiple and useless. Regarding useless, I’m not sure that
“punctuation” is such a universal language construct that it can be included in
such a determination. E.g. An apostrophe is often used as a glo
Correction, it's the "installmgr -ri CrossWire KJV" command that generates
a wall of "2" output.
--Greg
On Thu, Nov 2, 2017 at 9:19 AM, Greg Hellings
wrote:
> I should note that this is in SVN HEAD in my case, not the RC.
>
> --Greg
>
> On Thu, Nov 2, 2017 at 9:19 AM, Greg Hellings
> wrote:
>
>
I should note that this is in SVN HEAD in my case, not the RC.
--Greg
On Thu, Nov 2, 2017 at 9:19 AM, Greg Hellings
wrote:
>
>
> On Thu, Nov 2, 2017 at 7:15 AM, Peter Von Kaehne wrote:
>
>> I noticed this first on svn head in my normal source directory where I
>> also work - so I suspected that
On Thu, Nov 2, 2017 at 7:15 AM, Peter Von Kaehne wrote:
> I noticed this first on svn head in my normal source directory where I
> also work - so I suspected that it was my fault from something I did not
> remember I had done at some point somewhere. So I downloaded your RC tar
> ball and compile
The nonjoiner (U+200C) is probably the best candidate for a proper
replacement, but doing something like that really needs native eyes to
confirm it still renders the right way.
And the nonjoiner character is likely going to have all the same search
functionality that the soft hyphen will. Only whe
CAUTION:
The soft hyphen is sometimes used in Indian and East Asian language scripts
to prevent two adjacent characters from becoming a combined ligature. This
is more common in minor languages. It is commonly used when the font in use
while being typed is designed for another language using the s
I just read your proposal on GitLab, which matches mine. So we agree on
doing the work on the OSIS. Adding the option to the conversion script would
be great.
On 02/11/2017 at 13:35, Cyrille wrote:
>
> On 02/11/2017 at 13:25, David Haslam wrote:
>> It is a much simpler task to remove ALL soft hyphens rather
On 02/11/2017 at 13:25, David Haslam wrote:
> It is a much simpler task to remove ALL soft hyphens rather than removing
> only the delinquent ones!
My proposal is to remove them in the OSIS file, perhaps during the
conversion from USFM to OSIS with o2u.py. Maybe Ryan would agree to
add this in
On 02/11/2017 at 10:36, ref...@gmx.net wrote:
> Leaving aside the module you are working on, how many other modules
> have the same problem?
konnym is affected by this problem.
> If it is a few only, we might as well reissue them and worry about
> engine enhancement later.
>
> Peter
>
> Peter
It is a much simpler task to remove ALL soft hyphens rather than removing
only the delinquent ones!
- multiple soft hyphens at the same position in a word
- useless soft hyphens (before or after a space or punctuation mark)
Delinquent ones were quite a common occurrence in the Lingala source text
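To show why removing everything is the simpler task, here is a hedged Python sketch of both approaches. The function names are made up for illustration, and "punctuation" is approximated by ASCII `string.punctuation`, which is an assumption that will not hold for every language.

```python
import re
import string

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN
_BOUNDARY = "[ " + re.escape(string.punctuation) + "]"  # ASCII-only assumption


def remove_all_soft_hyphens(text):
    """The simple option: drop every soft hyphen."""
    return text.replace(SOFT_HYPHEN, "")


def remove_delinquent_soft_hyphens(text):
    """The selective option: fix only the two kinds of delinquent soft hyphen."""
    # Collapse multiple soft hyphens at the same position into one.
    text = re.sub(f"{SOFT_HYPHEN}{{2,}}", SOFT_HYPHEN, text)
    # Drop soft hyphens directly after or before a space or punctuation mark.
    return re.sub(
        f"(?<={_BOUNDARY}){SOFT_HYPHEN}|{SOFT_HYPHEN}(?={_BOUNDARY})", "", text
    )
```

The selective version needs a definition of punctuation and two passes, and it does not handle a soft hyphen at the very start or end of the text; the remove-all version needs neither.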
I noticed this first on svn head in my normal source directory where I also
work - so I suspected that it was my fault from something I did not remember I
had done at some point somewhere. So I downloaded your RC tar ball and compiled
that in a separate location + run it in place (see the paths
I will check hopefully later today.
Peter
> Sent: Thursday, 02 November 2017 at 12:11
> From: "David Haslam"
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] Text preparation for searching in SWORD [was: Soft
> hyphens]
>
> Thanks Troy,
>
> I probably won't have chance t
Thanks Troy,
I probably won't have chance to test - I'm a text & module developer,
not a code developer that builds SWORD from source together with a suitable
front-end.
I was already aware of the concept of strip filters, and these are to some
extent mentioned in our wiki.
https://crosswire.org
SWORD has a number of filtering stages which occur at different places
and events.
Specifically interesting for this discussion are "strip filters". These
are called immediately before searching and should be called on the
search string before passing it to search:
ListKey results = module.searc
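The idea of stripping both sides before comparing can be shown outside SWORD with a small Python sketch. This is not the SWORD API: `strip_for_search` and `search` are hypothetical stand-ins, and the verse keys and words are placeholder data.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_for_search(text):
    """Hypothetical stand-in for a strip filter: drop soft hyphens only."""
    return text.replace(SOFT_HYPHEN, "")


def search(verses, query):
    """Return the keys whose stripped text contains the stripped query."""
    needle = strip_for_search(query)
    return [ref for ref, body in verses.items()
            if needle in strip_for_search(body)]


# Placeholder data: the same word stored with and without a soft hyphen.
verses = {"v1": "mo\u00adkonzi", "v2": "mokonzi", "v3": "moto"}
hits_plain = search(verses, "mokonzi")
hits_hyphenated = search(verses, "mo\u00adkonzi")
```

Because the same stripping is applied to the stored text and to the search string, both queries find the same entries regardless of where soft hyphens occur.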
I am recommending the complete removal of soft hyphens because their use is a
typographical kludge, not a semantic construction.
See https://crosswire.org/wiki/Converting_SFM_Bibles_to_OSIS#Soft_hyphens
Being a kludge, there could never be any possibility that any particular
word would always have t
Leaving aside the module you are working on, how many other modules have the
same problem? If it is a few only, we might as well reissue them and worry
about engine enhancement later.
Peter
Peter
Sent from my mobile. Please forgive shortness, typos and weird autocorrects.
Original Message ---
Update: Research results of SWORD search for soft hyphens:
In Xiphos there is a problem with the exact search.
If the same word occurs in the text both with and without a soft hyphen,
- A search for the word with a soft hyphen will find only those instances
- A search for the word without a soft
28 matches