RE: Purpose of plain text (WAS: Re: combining: half, double, triple et cetera ad infinitum)

CE Whitehead Tue, 15 Nov 2011 12:05:26 -0800
Hi, once more.
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]; [email protected]
> Subject: RE: Purpose of plain text (WAS: Re: combining: half, double, triple 
> et cetera ad infinitum)
> Date: Mon, 14 Nov 2011 15:30:00 -0700
> 
> Naena Guru <naenaguru at gmail dot com> wrote:
> 
> > If it came out as Unicode has its only goal as money making, that is not 
> > what I meant to say. Nothing can be such. You sell something for the 
> > buyer's benefit, right?
> 
> Unicode doesn't sell anything, except (I suppose) printed copies of the
> standard and admission to conferences.
> 
> > I apologize if you feel hurt over it.
> 
> I don't feel hurt.  I do feel annoyed about the continued
> misinformation.
> 
> > However, it is probably the main objective. Who works for nothing except 
> > odd crazies like me?
> 
> You'd be surprised how many people have volunteered their time and
> expertise to help improve Unicode.
> 
> > When years back I asked why ligatures formed inside Notepad and not inside 
> > Word, I had the clear reply that it is owing to a business decision.
> 
> That doesn't mean Unicode is broken.  It means that some applications
> have support for certain text processes that other applications don't
> have.  Have you ever seen two graphics editors, one of which has more
> capabilities than the other?  Does that mean the underlying graphics
> format is broken?
> 
> > Let me try to clearly say what I want to say:
> > 1. Unicode came up with the idea of one codepoint for one letter of any 
> > language.
> 
> Sort of.
> 
> > 2. The justification was that on one text stream you could show all the 
> > different languages. At least that is what I understood.
> 
> Not just "show."  You can "perform text operations on" all the different
> characters.  Not every Unicode-aware application is required to have
> fonts and rendering technology for every character or script.  Otherwise
> nobody would have adopted it.
> 
> > 3. The above 2 is not practical and does not work even now after so many 
> > years
> 
> There was never a requirement that all applications can display all
> scripts perfectly.  There has been continuous improvement over the past
> 20 years toward making this happen.  It does not all happen at once.
> 
> > 4. Why Indic code pages do not work so well for text processing is not the 
> > fault of Unicode but that of the user groups
> 
> I assume you mean 8-bit "code pages."  Unicode doesn't have "code
> pages."
> 
> > 5. However, technology arrived at those countries too late for actual 
> > users, not bureaucrats, to understand the mistakes
> 
> Can you explain what you feel is wrong with Unicode handling of Indic
> text, WITHOUT repeating that not all applications can display everything
> perfectly?
> 
> > 6. Therefore, I say that there was an undue push by Unicode to complete the 
> > standard, by issuing ultimatums for registering ligatures etc.
> 
> This is a misrepresentation, and makes no sense.
> 
> > Having said all that, all is not so bad. I say transliterate to Latin and 
> > make smartfonts. It is a proven success.
> 
> How can I search a group of documents, one written in Devanagari and
> another in Sinhala and another in Tamil and another in Oriya, for a
> given string if they all use the same encoding, and the only way to tell
> which is which is to see them rendered in a particular font?  That has
> been tried before.  It is a proven failure.
Agreed here. Also, don't some uses involve input text strings in multiple
languages and scripts?
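To make the search point concrete, here is a minimal Python sketch of my own. It is an illustration only: the document contents and the "transliterate everything to Latin" scheme are hypothetical. The sketch relies only on the standard unicodedata module. Each script's "ka" has its own code point, so a search for the Tamil letter finds only the Tamil text. Once everything is stored as Latin "ka", every document matches and only a font could tell them apart.

```python
import unicodedata

# The letter "ka" in four Indic scripts: each has its own Unicode code point.
KA = {
    "Devanagari": "\u0915",
    "Sinhala": "\u0D9A",
    "Tamil": "\u0B95",
    "Oriya": "\u0B15",
}

def script_of(ch: str) -> str:
    """Crude heuristic: the first word of the character's Unicode name."""
    return unicodedata.name(ch).split()[0]

# Hypothetical documents, keyed by the script they are written in.
unicode_docs = {script: f"word {ka} word" for script, ka in KA.items()}

# The same documents under a hypothetical transliterate-to-Latin scheme,
# where only the choice of font would reveal the original script.
latin_docs = {script: "word ka word" for script in KA}

def search(docs: dict[str, str], needle: str) -> list[str]:
    """Names of the documents whose text contains the needle."""
    return sorted(name for name, text in docs.items() if needle in text)

for script, ka in KA.items():
    print(script, hex(ord(ka)), script_of(ka))

print(search(unicode_docs, KA["Tamil"]))  # only the Tamil document
print(search(latin_docs, "ka"))           # every document matches
```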
>> I do not understand what you meant by "jury-rigged to accommodate visual 
>> display order". Did you mean using unexpected shapes for Latin codes? If 
>> you meant that, how do you justify earlier versions of the Unicode standard 
>> explaining specifically that codepoints do not represent shapes, and that 
>> Fraktur and Gaelic could very well use Latin as their underlying codes?
> 
> Latin (Antiqua) and Fraktur and Gaelic letters are, intrinsically, the
> same letter.  That is not true for Devanagari and Sinhala and Tamil and
> Oriya letters.
> 
>> I think the ability to use text in the computer in the way you expect text 
>> to behave in it is very important. For instance, if you have shape 
>> representations mapped to code clusters, scanned text could be more 
>> accurately digitized.
> 
> Go ahead and design your own encoding, then.  It may be of use for niche
> applications that care only about display and nothing else.

A personal view: First, I think it's worthwhile to work within Unicode, which
is where the mainstream of work is being done, even for the dissatisfied. As
for smart fonts, I'm not sure I understand these correctly. I personally think
it would be interesting to see language-sensitive fonts that treat Arabic and
Persian digits as different shapes for the same numbers; this could be
important in preventing security breaches. But this is not how it's done.
That said, I don't think everything can be handled by smart fonts. In fact,
had we relied on smart fonts to display one single set of digits as either
Persian or Arabic, we would have had to wait for the apps and smart fonts to
come along. (Perhaps then someone like yourself would have created all the
needed fonts.)

Then there are proposals for new Unicode characters, for example the Urdu
jazm (syllable coda, termination of a syllable). Displaying it correctly
currently relies on language-specific styling, and some fonts can do it while
others cannot. Perhaps I should have favored encoding a new character here,
but for security reasons I decided the characters we already had should be
sufficient.
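For reference, this is how Unicode actually handles the digits. The Arabic-Indic digits (U+0660..U+0669) and the Extended Arabic-Indic digits used for Persian and Urdu (U+06F0..U+06F9) are separate code points that carry the same numeric values. The Python sketch below uses only the standard unicodedata module and Python's built-in int(), which accepts any Unicode decimal digits. The mixed-digit check is a hypothetical helper. It shows one way software might flag the kind of spoofing risk mentioned above and is not a standard mechanism.

```python
import unicodedata

ARABIC_INDIC = [chr(cp) for cp in range(0x0660, 0x066A)]           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = [chr(cp) for cp in range(0x06F0, 0x06FA)]  # U+06F0..U+06F9

for a, p in zip(ARABIC_INDIC, EXTENDED_ARABIC_INDIC):
    # Different code points, same numeric value.
    assert a != p
    assert unicodedata.digit(a) == unicodedata.digit(p)

print(unicodedata.name("\u0664"))  # ARABIC-INDIC DIGIT FOUR
print(unicodedata.name("\u06F4"))  # EXTENDED ARABIC-INDIC DIGIT FOUR

def digit_sets(s: str) -> set[str]:
    """Hypothetical helper: which digit blocks appear in s."""
    sets = set()
    for ch in s:
        if ch in ARABIC_INDIC:
            sets.add("arabic-indic")
        elif ch in EXTENDED_ARABIC_INDIC:
            sets.add("extended-arabic-indic")
        elif "0" <= ch <= "9":
            sets.add("ascii")
    return sets

def looks_mixed(s: str) -> bool:
    """Flag strings that mix digits from more than one block."""
    return len(digit_sets(s)) > 1

print(int("\u0664\u0665"), int("\u06F4\u06F5"))  # 45 45
print(looks_mixed("\u0664\u06F5"))               # True
```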
As for transliterating and then making a smart font, again I am unsure I
understand smart fonts here: can all languages be transliterated
character-for-character? Arabic, for example, has two aliphs, the dagger aliph
and the standard aliph. The phonetic transliteration into Latin is identical
for both.
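A small Python sketch of the aliph problem. The transliteration table is hypothetical; it assumes a phonetic scheme that writes both letters as "ā" (U+0101). The two source characters are distinct in Unicode (U+0627 ARABIC LETTER ALEF and U+0670 ARABIC LETTER SUPERSCRIPT ALEF). They collapse to one Latin form, so the mapping cannot be inverted without extra information.

```python
import unicodedata

ALEF = "\u0627"         # ARABIC LETTER ALEF
DAGGER_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF ("dagger aliph")
A_MACRON = "\u0101"     # Latin small letter a with macron

# Hypothetical phonetic transliteration table: both letters become "a-macron".
TO_LATIN = {ALEF: A_MACRON, DAGGER_ALEF: A_MACRON}

def reverse_candidates(latin: str) -> list[str]:
    """All Arabic characters that transliterate to the given Latin string."""
    return [ar for ar, lat in TO_LATIN.items() if lat == latin]

for ch in (ALEF, DAGGER_ALEF):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", f"U+{ord(TO_LATIN[ch]):04X}")

# Two source characters, one Latin form: the round trip is ambiguous.
print(len(reverse_candidates(A_MACRON)))  # 2
```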
Nevertheless, transliteration schemes in which the user selects from several
characters to get the right one can be very helpful. They save a great deal of
downloading of whole character sets for languages such as Chinese that have a
huge inventory (the code charts do not even download onto my mini at all).
Character pickers can work this way, letting the user input a Romanized
character, which is just fine for users who know the Latin alphabet, and many
do. (I am, of course, unfamiliar enough with smart fonts to be unsure whether
they would handle the dagger and standard aliph properly. Would they?)
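A minimal Python sketch of such a picker. The candidate table and the function names are hypothetical, and a real input method would have a far larger table. The user types a Romanized key, sees numbered choices labeled with their Unicode names, and picks one. Because the two aliphs are distinct code points, the picker can offer both. This says nothing about how a smart font would render them.

```python
import unicodedata

# Hypothetical picker table: romanized key -> candidate characters.
CANDIDATES = {
    "a": ["\u0627", "\u0670", "\u0622"],  # alef, superscript ("dagger") alef, alef with madda above
    "b": ["\u0628"],                      # beh
}

def candidates(key: str) -> list[tuple[int, str, str]]:
    """List (1-based index, character, Unicode name) choices for a romanized key."""
    return [(i, ch, unicodedata.name(ch))
            for i, ch in enumerate(CANDIDATES.get(key, []), start=1)]

def pick(key: str, choice: int) -> str:
    """Return the character the user chose by its 1-based index."""
    return CANDIDATES[key][choice - 1]

# Show the menu for "a"; the names let the user tell the two aliphs apart.
for i, _ch, name in candidates("a"):
    print(i, name)

chosen = pick("a", 2)          # the user picks the dagger aliph
print(f"U+{ord(chosen):04X}")  # U+0670
```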
(Sorry that I always use Arabic as my example; I know some Arabic and a few
words and phrases of Persian, and I have zero familiarity with most other Asian
languages. And P.S.: I don't make money from participating in the list, though
it may help me at some point to "pad my resume", that is, to add a note that I
participate in lists. In any case, IMO it is great that the web has supporters
among, and technical input from, both individuals and commercial users, as
well as places like the Library of Congress.)
Best,
--C. E. [email protected]
> 
> --
> Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
> www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell 
>