I would like to push back on Andrew's point 3.  Oh, and thank you Andrew
for the clear analysis, which is very helpful!

The issue you raise in point 3 is only a problem if strings in different
languages in a multilingual text are concatenated without language
tagging.  If a Polish author (Polish has ś) is writing about peace in
Sanskrit (śānti) then the point 3 problem might arise.  But it doesn't
arise in the real world because of readers' implicit knowledge that these
are different languages.  When encoding, we should make the implicit
explicit and say <polish>śląc</polish> <sanskrit>śānti</sanskrit>.  We
should also do this in order to get the right hyphenation and other
language features (spell-checking, and whatnot).  The language-tagging also
does the encoding-switching: ś in Polish is one thing, ś in Sanskrit is
another. In a tool like Aksharamukha, for example, the ś in śānti may
become श but the one in śląc may not.
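
To make that concrete, here is a minimal Python sketch of language tags driving the conversion. Everything in it is an assumption for illustration: the <polish>/<sanskrit> tag names are just the ones used above, the function names are hypothetical, and the two-entry mapping table is a toy that covers only the word śānti. It is not Aksharamukha's API or a real IAST scheme.

```python
import re
import unicodedata

# Toy table covering only the example word "śānti" (an assumption for
# illustration, not a real IAST-to-Devanagari scheme).
TOY_IAST_TO_DEVANAGARI = {"śā": "शा", "nti": "न्ति"}

# Matches <lang>text</lang>, where the closing tag repeats the opening name.
TAGGED_SPAN = re.compile(r"<(?P<lang>\w+)>(?P<text>.*?)</(?P=lang)>")


def toy_transliterate(text):
    """Greedy longest-match replacement using the toy table above."""
    keys = sorted(TOY_IAST_TO_DEVANAGARI, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(TOY_IAST_TO_DEVANAGARI[key])
                i += len(key)
                break
        else:
            # No table entry matches here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_tagged(markup):
    """Transliterate only spans tagged as Sanskrit; pass other spans through."""
    # Normalise so precomposed and decomposed forms of ś compare equal.
    markup = unicodedata.normalize("NFC", markup)

    def replace(match):
        if match.group("lang") == "sanskrit":
            return toy_transliterate(match.group("text"))
        return match.group("text")

    return TAGGED_SPAN.sub(replace, markup)


result = convert_tagged("<polish>śląc</polish> <sanskrit>śānti</sanskrit>")
print(result)
```

The same ś is converted in the Sanskrit span and left alone in the Polish one, purely because of the tag. A real workflow would hand the Sanskrit spans to a proper transliteration tool in place of the toy table.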

Best,
Dominik