I think the idea of tagging complete strings with "language" is not terribly useful. If it's to be of much use at all, then it should be generalized to a metaproperty system for applying any property to any range of characters within a string, such that the properties float along with the characters they modify. The whole point of doing such properties is to be able to ignore them most of the time, and then later, after you've constructed your entire XML document, you can say, "Oh, by the way, does this character have the 'toetsch' property?" There's no point in tagging text with language if 99% of it gets turned into "Dunno", or "English, but not really."
I tend to agree, and BTW that's exactly what an NSAttributedString does on Mac OS X. To quote the docs:
An attributed string identifies attributes by name, storing a value under
the name in an NSDictionary. You can assign any attribute name/value pair
you wish to a range of characters, in addition to the standard attributes
described in the "Constants" section....
See: <http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSAttributedString.html>
(Of course, an NSDictionary is the Cocoa version of a hash.)
This is the basis of styled text handling on Mac OS X, but you can toetsch-ify XML documents as well.
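To make the "properties float along with the characters" idea concrete, here is a rough sketch in Python. It is not how NSAttributedString is implemented and it is not a proposal for any real API: the class name AttributedString and the methods set_attribute, attribute, and substring are all made up for illustration, and keeping one dictionary per character is just the simplest thing that shows the behaviour.

```python
class AttributedString:
    """Hypothetical sketch: a string whose characters each carry a property hash."""

    def __init__(self, text, props=None):
        self.text = text
        # One dict per character (an assumption made for simplicity),
        # copied so that derived strings never share state.
        if props is None:
            self.props = [{} for _ in text]
        else:
            self.props = [dict(p) for p in props]

    def set_attribute(self, name, value, start, end):
        """Attach name=value to the characters in the half-open range [start, end)."""
        for p in self.props[start:end]:
            p[name] = value

    def attribute(self, name, index):
        """Return the value of `name` at `index`, or None if it is not set."""
        return self.props[index].get(name)

    def __add__(self, other):
        # Concatenation keeps each character's properties attached to it.
        return AttributedString(self.text + other.text, self.props + other.props)

    def substring(self, start, end):
        # Slicing carries the properties along as well.
        return AttributedString(self.text[start:end], self.props[start:end])


# Tag one fragment, then build a larger document around it.
title = AttributedString("Götterdämmerung")
title.set_attribute("language", "de", 0, len(title.text))

doc = AttributedString("<title>") + title + AttributedString("</title>")

# Much later: "by the way, does this character have the property?"
inside = doc.attribute("language", 7)   # the "G" that came from `title`
outside = doc.attribute("language", 0)  # the "<" of the markup
```

Nothing that builds the document has to know about the "language" property; it only matters to whoever asks about it afterwards, which is the point of being able to ignore such properties most of the time.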
JEff