Re: [Freedos-user] Unicode (It was 'Problem with USB keyboard in some computers')

Henrique Peron Tue, 05 Jul 2011 22:01:53 -0700

Saluton!

On 05/07/2011 18:25, Rugxulo wrote:
>> Before I forget, I noticed that you do use ISO codepages.
>> I'll work on distinct packs of codepages and keyboard layouts for ISO
>> 8859-1 ~ 16.
> Honestly, I very rarely use only Latin-3 (913), so please don't waste
> 500 hours on my account!   ;-)   It's very low priority. Minimum
> "good" set would be Latin 1-4 (IMHO) and perhaps Latin-15 (or whatever
> is Latin-1 with Euro, I never can remember, Latin-9 or ISO 8859-15 or
> ???).
My friend, it is always a pleasure. I do hope that end-users have as
much fun using "my" codepages and keyboard layouts as I have while
doing the necessary research and working on them. :)


ISO 8859: a good part of the job (the codepages) is already done, and
has been for a long time, by the way. All I need now is to work on
distinct versions of all the keyboard layouts which could work with ISO
codepages; if it takes 500 hours to get the job done, don't worry. I
won't bill you. ;-)

Latin-1 with Euro, on ISO, is "Latin-9", a.k.a. ISO 8859-15.
>>>> While Unicode is huge, DOS keyboard layouts tend to be limited to
>>>> Latin and Cyrillic and some other symbols which is a tiny subset.
>> Nowadays, FreeDOS is able to work with the latin, cyrillic, greek,
>> armenian and georgian alphabets, the cherokee syllabary and japanese.
>
> You are a one-man marching band!! You've done such good work here for us!   
> ;-)
Thank you for your words (on the good work) but we know that it is not 
quite a "one-man marching band" - without Aitor's KEYB/KC/KLIB/DISPLAY 
and Eric's MODE, I couldn't have done anything. hehehe!! :)

Besides, there is one case which I didn't participate in: support
for japanese. This one is not "my child". It was teamwork directly
between Aitor and a japanese end-user. Not only do I have no knowledge
whatsoever of japanese kanji (needed to work on japanese codepages), but
I also don't have the necessary hardware to test it. You can see for
yourself: http://homepage3.nifty.com/sandy55/Video/PS55_DA.html

It turns out that, when/if there's a korean or chinese FreeDOS user, I 
won't be able to help him at all. I'm seriously curious about how 
Johnson Lam deals with that, by the way.
>>> Right-to-left might be hard to do (I guess?), but technically as long
>>> as they can see and enter what they want, I'm sure they can get used
>>> to left-to-right.
>> Excuse me? How can anyone type the arabic, syriac or hebrew abjads from
>> left to right? *That* would be really exotic, if ever possible! :-)
>
> How can anybody play guitar upside down or wrong-handed? But people do
> it!!!  ;-)
>
> kool m'i gnipyt siht sdrawkcab thgir won (ylwols)
hehehe!!! However, your example exactly matches the hebrew case:
letters which don't visually connect to the next one. Therefore, it's
just a matter of reading it in the proper direction. As for the arabic
abjad, the visual result of trying to type it left-to-right is not even
worth discussing. (I can't resist it: playing the guitar upside down is
just a matter of training, and "wrong-handed" is just wrong unless you
swap the position of the strings and, of course, train. For more on
that, please check with Paul McCartney! :-)))
> BTW, last I heard, Eli Z. was working on bidi editing in GNU Emacs.
Hmmmm... I know neither Eli Z. nor GNU Emacs. Just a moment. Let me
google it. (Hourglass running)

Oh, ok! Great! Interesting! However, I didn't find any mention of
"BIDI", "arabic", "hebrew", "right", "left", etc. on his webpage.
Perhaps BIDI is a work in progress, as you said.

Mined has support for "poor man's BIDI" (in the words of its developer,
Thomas Wolff).

Arabic letters (for the arabic language) can have up to 4 different
shapes, according to their position in a word (initial, medial, final)
or whether they stand isolated (as in an acronym). On graphical
environments, you only find the isolated shapes of the letters on the
keyboard. However, as you type them, the operating system dynamically
and continuously replaces the shapes of the letters with the proper
ones. Let me take the arabic word "qamar" (moon), for instance. For
reasons not relevant to the scope of this conversation (and particularly
concerning this word), "a" is not written, therefore we type "qmr".
a) You type "qaf" (the arabic letter equivalent to our "q"). The screen
displays its isolated shape.
b) You don't press <space>; now you type "meem" (the arabic equivalent
to our "m"). Since you hadn't pressed <space>, the operating system
understands that "qaf" was the first letter of a word. It replaces its
isolated shape with its initial shape. Well, there's another letter to
come: "meem". There's already a letter in an initial position, therefore
the letter "meem" can only come in its final shape. End of word.
c) You still don't press <space>; now you type "ra". Yes, their "r".
Since once again you hadn't pressed <space>, the operating system
understands that "meem" wasn't the last letter of the word, after all.
It changes its shape from final to medial. Then, it displays "ra" in its
final shape. End of word. (Again.)
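
The a) b) c) steps above can be replayed with a minimal Python sketch.
It is only an illustration, not how any particular operating system
does it: the function names are made up, the letters are plain names
instead of glyphs, and it assumes (as the steps above do) that every
letter joins its neighbours. Real arabic has a few letters that never
connect to the following one, which a real shaper must also handle.

```python
# Hypothetical sketch of contextual shaping; names are illustrative only.
# Assumption: every letter can join on both sides.

def shape_word(letters):
    """Return (letter, shape) pairs for one word typed without spaces."""
    count = len(letters)
    shaped = []
    for index, letter in enumerate(letters):
        if count == 1:
            shape = "isolated"
        elif index == 0:
            shape = "initial"
        elif index == count - 1:
            shape = "final"
        else:
            shape = "medial"
        shaped.append((letter, shape))
    return shaped


def type_word(letters):
    """Replay the keystrokes one by one, reshaping the whole word each time."""
    return [shape_word(letters[:k]) for k in range(1, len(letters) + 1)]


for step in type_word(["qaf", "meem", "ra"]):
    print(step)
```

Each printed line is the state of the screen after one keystroke: "qaf"
alone is isolated, then it becomes initial when "meem" arrives, and
"meem" goes from final to medial once "ra" is typed.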

Well, FreeDOS doesn't provide this "Artificial Intelligence" feature,
therefore each program must deal with it on its own. Mined's "poor
man's BIDI" deals with it this way: when you type arabic, the cursor
stands still while the text keeps being pushed to the right, so that
each new letter is positioned to the left of the previous one.
Besides, I cannot count on Mined to dynamically change the shapes
of the letters, so I had to encode every shape of every letter directly
into the keyboard layout.
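
The visible effect of that "cursor stands still" behaviour can be
modelled in a few lines of Python. This is only a model of what the
user sees, not Mined's actual code; the function name is made up, and
latin capitals stand in for arabic letters so that the terminal showing
the result does not reorder anything by itself.

```python
# Hypothetical model of typing with a fixed cursor column.

def type_rtl(screen_line, cursor_col, typed):
    """Insert each typed glyph at the same column, pushing the rest right."""
    cells = list(screen_line)
    for glyph in typed:
        cells.insert(cursor_col, glyph)  # cursor_col never advances
    return "".join(cells)


print(type_rtl("", 0, "QMR"))     # RMQ
print(type_rtl("ab ", 3, "QMR"))  # ab RMQ
```

The keys are pressed in the order Q, M, R, yet the line reads R-M-Q from
left to right, which is Q-M-R when read from right to left.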
>> UTF-8 is best suited for languages written with the latin alphabet
>
> I just don't know if such a bias really is universally accepted or
> not. As we've seen, it's not exactly "universal" which Unicode method
> is preferred. I guess it matters less these days with Java being
> ubiquitous and RAM being humongous.
All I said is that UTF-8 is best suited for languages written with the
latin alphabet. As far as storage goes, such texts use far fewer bytes
when encoded as UTF-8 (1 byte per plain letter) than as UCS-2 (always 2
bytes). Cyrillic, greek and armenian letters take 2 bytes each under
UTF-8, so those texts still come out a bit smaller thanks to the 1-byte
digits, spaces and punctuation; georgian letters take 3. UCS-2, in turn,
is best suited for CJK text, because CJK characters take 3 bytes each
under UTF-8 but only 2 bytes each under UCS-2. UCS-2 is also
interesting in what concerns <Backspace> handling: it will always remove
the last two bytes. UTF-8 is a variable-length encoding, so the bytes
must be analysed before deleting a character. Tough job, I guess.
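
To put numbers on the storage and <Backspace> points, here is a small
Python sketch. The function names are made up for illustration, and
Python's "utf-16-le" codec is used as a stand-in for UCS-2, which gives
the same bytes as long as the text stays inside the BMP.

```python
# Hypothetical helpers comparing UTF-8 and UCS-2 storage and deletion.

def sizes(text):
    """Return (UTF-8 bytes, UCS-2 bytes) needed for a BMP-only string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def backspace_ucs2(buf):
    """UCS-2: every character is exactly two bytes, so drop two bytes."""
    return buf[:-2]


def backspace_utf8(buf):
    """UTF-8: skip trailing continuation bytes (10xxxxxx), then the lead byte."""
    end = len(buf)
    while end > 0 and (buf[end - 1] & 0xC0) == 0x80:
        end -= 1
    return buf[:max(end - 1, 0)]


for sample in ("cafe", "caf\u00e9", "\u65e5\u672c\u8a9e"):
    print(sample, sizes(sample))

print(backspace_utf8("caf\u00e9".encode("utf-8")))  # b'caf'
```

The analysis for UTF-8 turns out to be a short backwards scan. Note that
it removes one code point, which is not always one character as the
user sees it (a letter followed by a combining accent is two code
points).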
>>>>> 4). Arabic (easy??)
>>>> Unicode lists maybe 300 chars for that, at most.
>> If we restrict ourselves to the arabic language, I can tell you that it
>> is much less.
> We don't need to support "everything", just enough for reasonable
> functionality.
"Reasonable functionality" is very relative. Well, long before worrying
about that, the point is that most of Unicode's BMP blocks must be left
behind in text mode anyway because of the complexity of the character
shapes. Back to what we could perhaps call reasonable functionality: I
am thinking of support for national official languages. Nowadays, the
national official languages using the arabic abjad are arabic, persian,
dari, urdu and pashto, plus malay, which is (also) written with the
arabic abjad in Brunei, where it is a co-official script for that
language. IBM and MS even provided a few codepages for arabic in those
days, plus one for persian/dari (cp1098), which are essentially the
same language, and one for urdu (cp868). While I have not found any
codepages for pashto and malay, I can easily devise one. Support for all
other languages written with the arabic abjad could be left for a later
stage.
>> My conclusion: either there was a wholly tailored MS/IBM-DOS for India
>> on those days or there were particular COM/EXE programs that would put
>> any regular DOS on graphics mode so to handle ISCII.
>
> See Hindawi@FreeDOS. (Haven't checked, but it sounds like it uses
> Allegro for gfx.)
Thank you for the info! I'll definitely check that.
>> Important to mention is that english is generally regarded as
>> "pure-ASCII" but we must consider the fair amount of foreign words (like
>> "café") and the need of accented/special chars used in middle and old
>> english, therefore the english language (as much as german, french or
>> any other latin-alphabet-based language) also falls in the same
>> situation as portuguese.
> Well, except that almost nobody puts accents on English words, even
> loan words. At least I never do. "naive" and "cafe" have to suffice
> for me.  ;-)
>
> BTW, surely I'm not telling you anything you didn't already know, but
> ... Old English is, erm, kinda like dead and old and 100%
> incomprehensible and not used and stuff. (Beowulf?)   :-))    Middle
> English is just weird spelling and archaic words (Shakespeare?
> Chaucer?), hence we're not exactly using it a lot either. ("Anon!
> Forewith she shewn the waye!")
Naturally, support for medieval and ancient languages and characters can 
be left (in my opinion) for a later stage.
>> In what comes to storage (and UTF-8), russian needs the regular latin
>> digits (1 byte each) and the cyrillic letters (2 bytes each char); if we
>> think on cyrillic needs in general, then we also have the ukrainian
>> hryvnia currency sign, a 3-byte char (again, "Currency Symbols",
>> 20A0h-20CFh).
> I don't know why it isn't acceptable to just spell it out as "30
> hryvnia" instead of always having specific symbols for everything.
Acceptable it is. However, whenever appropriate or for the sake of
practicality, we type "US$ 30" instead of "30 american dollars", "£30"
instead of "30 pounds sterling", etc. Otherwise, we would have, for
instance, financial magazines the size of Bibles.
>>>>> own scripts are a problem, not to mention those like CJK that have
>>>>> thousands of special characters. (e.g. Vietnamese won't fit into a
>>>>> single code page, even.)
>> Actually, it does. There was a standard called VISCII on the old days.
>> It has been available for FreeDOS for a while already. The catch is: due
>> to the huge amount of necessary precomposed chars (134), there are no
>> linedraw, shade, block or any other strictly non-vietnamese precomposed
>> char on the upper half of VISCII and 6 less-used control chars on the
>> lower half had their glyphs traded for the remaining 6 precomposed
>> vietnamese accented latin letters.
> I looked it up on Wikipedia a while back, and it had like three
> different workarounds, all different but all logical enough (to me, at
> the time). So maybe I'm worried over nothing. Maybe they can get along
> fine without explicit support. Maybe we should let them come to us and
> tell us how the hell to "fix" it!   :-))
Too late. I prepared the vietnamese VISCII and keyboard layout for 
FreeDOS a long time ago, as a matter of fact. :)
>>>> When you have Unicode, you do not need codepages.
>>> Right. And when you have a 286 or 386, you don't need to limit to 1 MB
>>> of RAM.   ;-))
>> Furthermore, due to the number of glyphs (and the shape complexity of
>> many of them), I can only imagine Unicode working on graphics mode and
>> that will certainly complicate matters for very old computers...
> For good or bad, it's long been assumed by most developers that
> everybody has VGA or SVGA or newer. (With "modern" OSes, it's worse:
> gfx acceleration, OpenGL, DirectX 9, etc.)
Well then, let's go graphics! :)
>> Unless it be considered some sort of "sub-Unicode" support for them, focusing
>> only on latin, cyrillic, greek, armenian and georgian alphabets because
>> their letters can easily fit on regular codepages and they cover the
>> needs of the majority of world's languages. That could be the best
>> possible workaround.
> I'm not sure 8086 is really a feasible target anymore (though I'm not
> actively suggesting dropping it). But do such retrocomputing people
> even want Unicode support? I doubt it. Like you said, they're probably
> happy enough (or even English only!).
It seems you're thinking of a scenario where all retrocomputing people
live in the USA (or UK, or Australia). Let me give you an example: many
video BIOSes in computers here in Brazil, back in those days, used two
distinct (both proprietary) national standards (Abicomp and BRASCII) in
order to provide support for the portuguese language even on regular CGA
or Hercules displays. Therefore, it is not a matter of wanting but a
matter of needing Unicode support. While BRASCII was almost identical to
ISO 8859-1, Abicomp was an encoding completely different from anything
else. It gets worse: many cheque printers (if not all) here in Brazil
use either one or the other of those two encodings to this day. They
even come with DOS drivers!

Naturally, when I talk about "need for Unicode" even for a simple 
language like portuguese (from the script's point-of-view), I'm 
considering an all-or-nothing scenario where there would be no regular 
codepages but only Unicode.

Henrique


