Saluton!

On 05/07/2011 18:25, Rugxulo wrote:
>> Before I forget, I noticed that you do use ISO codepages.
>> I'll work on distinct packs of codepages and keyboard layouts for ISO
>> 8859-1 ~ 16.
> Honestly, I very rarely use only Latin-3 (913), so please don't waste
> 500 hours on my account! ;-) It's very low priority. Minimum
> "good" set would be Latin 1-4 (IMHO) and perhaps Latin-15 (or whatever
> is Latin-1 with Euro, I never can remember, Latin-9 or ISO 8859-15 or
> ???).

My friend, it is always a pleasure. I do hope that end-users have as
much fun using "my" codepages and keyboard layouts as I have while
doing the necessary research and working on them. :)
ISO 8859: a good part of the job (the codepages) is already done, and
has been for a long time, by the way. All I need now is to work on
distinct versions of all the keyboard layouts which could work with ISO
codepages; if it takes 500 hours to get the job done, don't worry. I
won't bill you. ;-)

Latin-1 with Euro, on ISO, is "Latin-9", a.k.a. ISO 8859-15.

>>>> While Unicode is huge, DOS keyboard layouts tend to be limited to
>>>> Latin and Cyrillic and some other symboly which is a tiny subset.
>> Nowadays, FreeDOS is able to work with the latin, cyrillic, greek,
>> armenian and georgian alphabets, the cherokee syllabary and japanese.
>
> You are a one-man marching band!! You've done such good work here for us!
> ;-)

Thank you for your words (on the good work) but we know that it is not
quite a "one-man marching band": without Aitor's KEYB/KC/KLIB/DISPLAY
and Eric's MODE, I couldn't have done anything. hehehe!! :)

Besides, there is one case which I didn't participate in: support for
japanese. This one is not "my child". It was teamwork directly between
Aitor and a japanese end-user. Not only do I lack even remote knowledge
of japanese kanji (needed to work on japanese codepages), but I also
don't have the necessary hardware to test it. You can see for yourself:

http://homepage3.nifty.com/sandy55/Video/PS55_DA.html

It turns out that, when/if there's a korean or chinese FreeDOS user, I
won't be able to help him at all. I'm seriously curious about how
Johnson Lam deals with that, by the way.

>>> Right-to-left might be hard to do (I guess?), but technically as long
>>> as they can see and enter what they want, I'm sure they can get used
>>> to left-to-right.
>> Excuse me? How can anyone type the arabic, syriac or hebrew abjads from
>> left to right? *That* would be really exotic, if ever possible! :-)
>
> How can anybody play guitar upside down or wrong-handed? But people do
> it!!! ;-)
>
> kool m'i gnipyt siht sdrawkcab thgir won (ylwols)

hehehe!!!
However, your example exactly matches the hebrew case: letters which
don't visually connect to the next one. Therefore, it's just a matter
of reading it in the proper way. As for the arabic abjad, the visual
result of trying to type it left-to-right is not even worth discussing.

(I can't resist it: playing the guitar upside down is just a matter of
training, and "wrong-handed" is only wrong if you don't swap the
position of the strings, plus, of course, training. For more on that,
please check with Paul McCartney! :-)))

> BTW, last I heard, Eli Z. was working on bidi editing in GNU Emacs.

Hmmmm... I know neither Eli Z. nor GNU Emacs. Just a moment. Let me
google it. (Hourglass running) Oh, ok! Great! Interesting! However, I
didn't find any mention of "BIDI", "arabic", "hebrew", "right", "left",
etc. on his webpage. Perhaps BIDI is a work in progress, as you said.

Mined has support for "poor man's BIDI" (the own words of Thomas Wolff,
its developer).

Arabic letters (for the arabic language) can have up to 4 different
shapes, according to their position in a word (initial, medial, final)
or whether they are isolated (as in an acronym). In graphical
environments, you only find the isolated shapes of the letters on the
keyboard. However, as you type them, the operating system dynamically
and continuously replaces the shapes of the letters with the proper
ones.

Let me take the arabic word "qamar" (moon), for instance. For reasons
not relevant to the scope of this conversation (and particularly
concerning this word), "a" is not written, therefore we type "qmr".

a) You type "qaf" (the arabic letter equivalent to our "q"). The screen
displays its isolated shape.

b) You don't press <space>; now you type "meem" (the arabic equivalent
to our "m"). Since you haven't pressed <space>, the operating system
understands that "qaf" was the first letter of a word. It replaces its
isolated shape with its initial shape. Well, there's another letter to
come: "meem".
There's already a letter in an initial position, therefore the letter
"meem" can only come in its final shape. End of word.

c) You still don't press <space>; now you type "ra". Yes, their "r".
Since once again you haven't pressed <space>, the operating system
understands that "meem" wasn't the last letter of the word, after all.
It changes its shape from final to medial. Then, it displays "ra" in
its final shape. End of word. (Again.)

Well, FreeDOS doesn't provide this "Artificial Intelligence" feature,
therefore each program must deal with it on its own.

Mined's "poor man's BIDI" deals with it in this way: when you type
arabic, the cursor stands still while the text keeps being pulled to
the right, so that the following letter is positioned at the left side
of the previous one. Besides, I also cannot count on Mined to
dynamically change the shapes of the letters, so I had to encode every
shape of every letter directly into the keyboard layout.

>> UTF-8 is best suited for languages written with the latin alphabet
>
> I just don't know if such a bias really is universally accepted or
> not. As we've seen, it's not exactly "universal" which Unicode method
> is preferred. I guess it matters less these days with Java being
> ubiquitous and RAM being humongous.

All I said is that UTF-8 is best suited for languages written with the
latin alphabet (and, to a lesser degree, cyrillic, greek and armenian).
As far as storage goes, latin letters take 1 byte each under UTF-8 and
cyrillic, greek and armenian letters take 2, so with spaces, digits and
punctuation at 1 byte each, such texts use fewer bytes than under
UCS-2, where every character takes 2 bytes. Georgian is the exception
in that group: its letters take 3 bytes each under UTF-8. UCS-2, in
turn, is best suited for CJK text because CJK characters in the BMP
take 3 bytes each under UTF-8 while under UCS-2 they take 2 bytes each.

UCS-2 is also interesting where <Backspace> handling is concerned: it
always removes the last two bytes. UTF-8 is a variable-length encoding,
so the bytes must be analysed before any text is deleted. Tough job, I
guess.

>>>>> 4). Arabic (easy??)
>>>> Unicode lists maybe 300 chars for that, at most.
>> If we restrict ourselves to the arabic language, I can tell you that it
>> is much less.
> We don't need to support "everything", just enough for reasonable
> functionality.

"Reasonable functionality" is very relative. Well, long before worrying
about that, the point is that most of Unicode's BMP blocks must be left
behind in text mode anyway because of the complexity of the character
shapes.

Back to what we could perhaps call reasonable functionality: I am
thinking of support for national official languages. Nowadays, the
national official languages written with the arabic abjad are arabic,
persian, dari, urdu and pashto, plus malay, which is (also) written
with the arabic abjad in Brunei, where it is a co-official script for
that language.

IBM and MS even provided a few codepages for arabic in those days, one
for persian/dari, which are essentially the same language (cp1098), and
one for urdu (cp868). While I have not found any codepages for pashto
and malay, I can easily devise one.

Support for all other languages written with the arabic abjad could be
left for a later stage.

>> My conclusion: either there was a wholly tailored MS/IBM-DOS for India
>> on those days or there were particular COM/EXE programs that would put
>> any regular DOS on graphics mode so to handle ISCII.
>
> See Hindawi@FreeDOS. (Haven't checked, but it sounds like it uses
> Allegro for gfx.)

Thank you for the info! I'll definitely check that.

>> Important to mention is that english is generally regarded as
>> "pure-ASCII" but we must consider the fair amount of foreign words (like
>> "café") and the need of accented/special chars used in middle and old
>> english, therefore the english language (as much as german, french or
>> any other latin-alphabet-based language) also falls in the same
>> situation as portuguese.
> Well, except that almost nobody puts accents on English words, even
> loan words. At least I never do. "naive" and "cafe" have to suffice
> for me.
> ;-)
>
> BTW, surely I'm not telling you anything you didn't already know, but
> ... Old English is, erm, kinda like dead and old and 100%
> incomprehensible and not used and stuff. (Beowulf?) :-)) Middle
> English is just weird spelling and archaic words (Shakespeare?
> Chaucer?), hence we're not exactly using it a lot either. ("Anon!
> Forewith she shewn the waye!")

Naturally, support for medieval and ancient languages and characters
can be left (in my opinion) for a later stage.

>> In what comes to storage (and UTF-8), russian needs the regular latin
>> digits (1 byte each) and the cyrillic letters (2 bytes each char); if we
>> think on cyrillic needs in general, then we also have the ukrainian
>> hryvnia currency sign, a 3-byte char (again, "Currency Symbols",
>> 2000h-206Fh).
> I don't know why it isn't acceptable to just spell it out as "30
> hryvnia" instead of always having specific symbols for everything.

Acceptable it is. However, whenever appropriate or for the sake of
practicality, we type "US$ 30" instead of "30 american dollars", "£30"
instead of "30 pounds sterling", etc. Otherwise, we would have, for
instance, financial magazines the size of Bibles.

>>>>> own scripts are a problem, not to mention those like CJK that have
>>>>> thousands of special characters. (e.g. Vietnamese won't fit into a
>>>>> single code page, even.)
>> Actually, it does. There was a standard called VISCII on the old days.
>> It has been available for FreeDOS for a while already. The catch is: due
>> to the hugh amount of necessary precomposed chars (134), there are no
>> linedraw, shade, block or any other strictly non-vietnamese precomposed
>> char on the upper half of VISCII and 6 less-used control chars on the
>> lower half had their glyphs traded for the remaining 6 precomposed
>> vietnamese accented latin letters.
> I looked it up on Wikipedia a while back, and it had like three
> different workarounds, all different but all logical enough (to me, at
> the time).
> So maybe I'm worried over nothing. Maybe they can get along
> fine without explicit support. Maybe we should let them come to us and
> tell us how the hell to "fix" it! :-))

Too late. I prepared the vietnamese VISCII codepage and keyboard layout
for FreeDOS a long time ago, as a matter of fact. :)

>>>> When you have Unicode, you do not need codepages.
>>> Right. And when you have a 286 or 386, you don't need to limit to 1 MB
>>> of RAM. ;-))
>> Furthermore, due to the number of glyphs (and the shape complexity of
>> many of them), I can only imagine Unicode working on graphics mode and
>> that will certainly complicates matters for very old computers...
> For good or bad, it's long been assumed by most developers that
> everybody has VGA or SVGA or newer. (With "modern" OSes, it's worse:
> gfx acceleration, OpenGL, DirectX 9, etc.)

Well then, let's go graphics! :)

>> Unless it be considered some sort of "sub-Unicode" support for them, focusing
>> only on latin, cyrillic, greek, armenian and georgian alphabets because
>> their letters can easily fit on regular codepages and they cover the
>> needs of the majority of world's languages. That could be the best
>> possible workaround.
> I'm not sure 8086 is really a feasible target anymore (though I'm not
> actively suggesting dropping it). But do such retrocomputing people
> even want Unicode support? I doubt it. Like you said, they're probably
> happy enough (or even English only!).

It seems you're thinking of a scenario where all retrocomputing people
live in the USA (or UK, or Australia). Let me give you an example: many
video BIOSes in computers here in Brazil, back in those days, used one
of two distinct (both proprietary) national standards (Abicomp and
BRASCII) in order to provide support for the portuguese language even
on regular CGA or Hercules displays. Therefore, it is not a matter of
wanting but a matter of needing Unicode support.
While BRASCII was almost identical to ISO 8859-1, Abicomp was an
encoding completely different from anything else. It gets worse: many
cheque printers (if not all) here in Brazil use one or the other of
those two encodings to this day. They even come with DOS drivers!

Naturally, when I talk about a "need for Unicode" even for a simple
language like portuguese (from the script's point of view), I'm
considering an all-or-nothing scenario where there would be no regular
codepages but only Unicode.

Henrique

_______________________________________________
Freedos-user mailing list
Freedos-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-user
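
A footnote to the UTF-8 versus UCS-2 remarks earlier in this message,
offered as a rough sketch and not as anything FreeDOS actually ships:
the Python below counts the bytes a BMP-only string takes in each
encoding and shows what <Backspace> has to do in each case. The
function names (encoded_sizes, backspace_utf8, backspace_ucs2) are made
up for this illustration, and UCS-2 is approximated with Python's
"utf-16-le" codec, which should give the same bytes as long as the text
stays inside the BMP.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, ucs2_bytes) for a string of BMP characters."""
    # Assumption: UCS-2 is modelled as UTF-16 little-endian without a BOM,
    # which is byte-identical for characters in the BMP.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot encode characters outside the BMP")
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def backspace_ucs2(buf):
    """UCS-2 is fixed-width: always drop the last two bytes."""
    return buf[:-2]


def backspace_utf8(buf):
    """UTF-8 is variable-width: walk back to the start of the last char."""
    i = len(buf)
    while i > 0:
        i -= 1
        # Continuation bytes have the bit pattern 10xxxxxx; anything else
        # is a single-byte character or the lead byte of a sequence.
        if (buf[i] & 0xC0) != 0x80:
            break
    return buf[:i]


if __name__ == "__main__":
    samples = ["cafe", "caf\u00e9", "\u043b\u0443\u043d\u0430",
               "\u10d0", "\u20b4", "\u6708"]
    for sample in samples:
        utf8_len, ucs2_len = encoded_sizes(sample)
        print(repr(sample), "UTF-8:", utf8_len, "UCS-2:", ucs2_len)
    print(backspace_utf8("30\u20b4".encode("utf-8")))
```

With these samples, a latin letter costs 1 byte in UTF-8, a cyrillic
letter 2, and a georgian letter, the hryvnia sign or a CJK character 3,
against a flat 2 bytes each in UCS-2. The backward scan in
backspace_utf8 is the "analysis" mentioned earlier: it skips
continuation bytes until it reaches the byte that starts the last
character.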