John McCabe-Dansted wrote:
> In English anything separated by a space, punctuation etc.  Finding
> word breaks in Japanese is a bit harder.

This is not even true for English. Cf. "ice cream", for instance. In German, 
you have, amongst others, separable verbs such as "Ich kaufe ein", where "kaufe 
ein" is one word, namely the verb.

[...]
> > Usually, Acronyms are language-specific as well (cf. IPA [International
> > Phonetic Association] vs. API [Association Phonétique Internationale]).
>
> This still wouldn't help me, because I don't have any French language
> documents. I'd be pasting from a French language website, if I was
> very lucky I might be pasting from an English language document where
> I had explicitly set that text to French; in which case copying over
> the tags might make sense.

But that is difficult, since the web page does not supply any language data.

> Point being that for a substantial class of users, the LaTeX model
> where copying text only copies tags in that text is more useful for
> language except in a few very rare cases. As argued elsewhere it may
> also be useful for copying e.g. a word from a section title as well.
>
> >> 2) Copy text from English US to English UK -- IMHO An English UK word
> >> in an English US document is actually just a poorly spelt US English
> >> word. Actually it is likely to even be a correctly spelt English US
> >> word.
> >
> > Why? IMHO it is correct to keep it marked as English (UK). The author
> > has to decide if he leaves it in UK spelling or if he adapts the
> > spelling to US convention (and resets the language).
>
> The spellchecker will pick up the misspellings, unless LyX keeps it
> marked as English (UK).

Sure. But unless it is corrected, it is UK spelling (and thus UK English).

> Admittedly, Leaving it as UK might be useful if it was a direct quote,
> and had proper quotation marks around it. (And yes I realize that
> detecting quotations in every possible language could be difficult,
> even for languages that actually have quotation marks. However I use
> citations rather than quotes, so again this isn't something that makes
> a difference)

I don't see why quotes make a difference, in general (although I see people 
usually mark quotes more often). In any case, trying to fiddle with this is 
bound to fail.

> >> In principle I may be submitting a document to an organization that
> >> requires that all text be in Language X (and only language X), in
> >> which case any LyX document I submit that contains language markup is
> >> wrong, just as if I had included a Chapter in an article.
> >
> > But not if you use different languages (if this organization is not
> > completely crazy, that is).
>
> If I use a Chapter in a journal article, then it clearly should be
> formatted as a Chapter, rather than e.g. standard?

I'm talking about language markup, not chapter markup.

> If I did accidentally insert a Chapter into an article, it would at
> least be a lot easier to find in proof-reading than multiple
> "languages" in a document that is actually monolingual. (Being able to
> add Chapters into e.g. amsart could actually be useful; often I've
> wanted to do "something like style X, but with a feature from Y")
>
> Even when using a foreign word it is common to convert it to a local
> word. Japanese would commonly write English words in Katakana, and I
> don't use Mitsubishi as a Japanese word, and I certainly don't write
> it as 三菱 in an English document.

I'm not talking about loans. I'm talking about foreign words. Removing 
language markup does not make a foreign word a loan, usually. The assimilation 
process of loans is somewhat more complex than that. In the case of English 
loans in German, for instance, at least the spelling (Capitalization) changes, 
more often the morphology.

> >> More likely, the receiving institution really wouldn't care whether I
> >> use British or American English, so long as I am consistent. So in
> >> this case hard-coding either British or American English would be
> >> fine, but allowing both is again in some sense incorrect.
> >
> > But then, again, hyphenation might be just wrong.
>
> Well I am not advocating hardcoding; nevertheless it would
> presumably be correct according to one of the two acceptable definitions
> of correct (at least if you spell words according to the hard-coded
> language).

As I have argued, I still think it would _never_ be correct, except for cases 
where a word has the same word form in two languages or (more often) in two 
varieties of a language. Neither case is predictable (and thus must be 
decided by explicit user action).

[...]
> > excuse me, it strikes me rather monolingual-English centric, while LyX
> > aims at being truly multilingual.
>
> Presumably, not all monolingual users speak English; I would imagine
> that French users may have similar issues with regard to French/French
> Canadian.

Dunno about French. However, when writing German, I very much welcome that the 
information about text being in old German spelling, new German spelling or 
Swiss German is not lost during copying.

> In any case I don't see why this is an objection to this feature (I
> can certainly see why you wouldn't want to work on it yourself).
> I don't see any reason for LyX to not have features that only help
> monolingual users, or rather users that only prepare monolingual
> documents in LyX. I've studied other languages, and I could write a a

Monolingual users do not need such a feature. If they are hit by language 
markup, they are in fact not monolingual users anymore (in a broad sense of 
multilingualism, including 'inner multilingualism', i.e. language varieties).

[...]
> The main advantage that the lack of this feature has for such users is
> that the aggravation it causes them might induce them to better
> understand the difference between UK and US English. However if you
> fix all the bugs wrt. ERT etc., then most of them will probably
> think the blue overline is just a form of screen corruption (it doesn't
> appear in the PDF after all). If nothing else, the
> monomodal/multilingual dialog might clue them into what the blue line
> actually means.

If they ignore the blue underlining, at least the output will be correct. If 
we remove the markup and people ignore that, the output will in many cases not 
be correct.

> Also if this dialog pops up immediately when you attempt to paste
> the text in, it would be obvious to you which source document had the
> wrong (sub) language.

Such a dialog would be much more annoying than the language markup.

Jürgen
