Hi Matej, thanks for your research :-)
> It seems that there is just one special API function for double byte
> character sets, 6300h:
>
> http://www.ctyme.com/intr/rb-3142.htm
> http://www.ctyme.com/intr/rb-3143.htm
>
> It returns a table of ranges of valid DBCS leading bytes. This allows
> applications to detect that it is reading DBCS characters as opposed to
> ASCII or JIS X 0201 (an 8-bit encoding with ASCII in the lower half and
> katakana in the upper half).
>
> An application then simply uses standard DOS functions for everything, for
> example INT 21h/AH=1 for input and INT 21h/AH=2 for output. DOS of course
> does not supply any special string functions, so it is up to the app...

... thanks for the example code ...

> So, what needs to be done?
>
> 1. INT 21h/AX=6300h has to be implemented.

If that just returns a charset-specific static table, maybe it would be
some sort of charset rendering and keyboard / input method driver that
actually implements this, not the kernel?

> 2. INT 21h/AH=1 (and all other input functions) has to be modified so that
> if a double byte character is entered, it returns the first byte and
> remembers the second byte to return it in the next call.

I could imagine that this can also be done in the keyboard driver, similar
to what the BIOS does with function keys, which also have no ASCII
equivalent and still use the BIOS keyboard buffer I/O like everything else...

> 3. INT 21h/AH=2 (and all other output functions) has to be modified so
> that if it detects a leading byte of a double byte character, it has to
> remember it and wait until the next call, when it gets the second byte, to
> print the character.

Well, DOS itself cannot print in charsets beyond 1-byte-per-char, because
it uses the BIOS functions, which in turn use the VGA hardware, which cannot
hold fonts of more than 2 x 256 characters. So this again sounds like a job
for a DRIVER, one which uses graphics mode to render extensive fonts.
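To make points 1 and 3 a bit more concrete, here is a minimal C sketch of what a graphical display driver might do with the lead byte table. It assumes the table layout that Ralf Brown's Interrupt List gives for INT 21h/AX=6300h (start/end byte pairs, terminated by a 0,0 pair). The ranges in `example_lead_table` are the usual Shift-JIS lead byte ranges, used purely as an illustration, and the names `is_dbcs_lead`, `struct dbcs_output` and `dbcs_put_byte` are hypothetical. The driver is fed one byte per output call and only reports a character code once it is complete, so half of a double byte character is never printed.

```c
/* Example lead byte table in the layout we ASSUME INT 21h/AX=6300h returns:
   start/end pairs, terminated by 0,0. Shift-JIS-like ranges, illustration only. */
const unsigned char example_lead_table[] = {
    0x81, 0x9F,
    0xE0, 0xFC,
    0x00, 0x00   /* end of table */
};

/* Returns 1 if byte b lies inside any lead byte range of the table. */
int is_dbcs_lead(const unsigned char *table, unsigned char b)
{
    for (; table[0] != 0 || table[1] != 0; table += 2) {
        if (b >= table[0] && b <= table[1])
            return 1;
    }
    return 0;
}

/* Hypothetical per-driver state: at most one remembered lead byte. */
struct dbcs_output {
    const unsigned char *lead_table; /* lead byte ranges, 0,0-terminated */
    int have_lead;                   /* 1 while half a character is pending */
    unsigned char lead;              /* the remembered lead byte */
};

/* Feed one output byte (as from one INT 21h/AH=2 call).
   Returns 1 and stores the complete character code in *code when there is
   something to draw; returns 0 when the byte was a lead byte held back. */
int dbcs_put_byte(struct dbcs_output *out, unsigned char b, unsigned int *code)
{
    if (out->have_lead) {
        /* Second half: combine with the remembered lead byte. */
        *code = ((unsigned int)out->lead << 8) | b;
        out->have_lead = 0;
        return 1;
    }
    if (is_dbcs_lead(out->lead_table, b)) {
        /* First half: draw nothing yet, just remember it. */
        out->lead = b;
        out->have_lead = 1;
        return 0;
    }
    /* Single byte character (ASCII or, for example, JIS X 0201 katakana). */
    *code = b;
    return 1;
}
```

A real driver would hand the returned code to its glyph renderer; the sketch only returns it so that the byte-pairing logic stays separate and easy to check.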
We already have support for Unicode fonts in a few graphical DOS text
editors and similar (thanks :-)), and whether DBCS or UTF-8 is used, both
share the "size of a character can be one or more bytes" handling "anomaly".
I see your point about avoiding printing "half double bytes", but because
the graphical output is done outside the kernel anyway, the disadvantages
of DBCS-agnosticism for INT 21h/AH=2 and similar functions seem limited: the
graphical font driver would just remember having seen half of a DBCS
character itself and draw the actual character as soon as it receives the
second byte.

> 4. A keyboard layout has to be made. I have no idea how keyboard
> layouts work in DOS, so I can't say much.

Well, layouts are one thing, but for DBCS input beyond the alphabet, you
probably need an input method driver, which is separate from the layout
for ASCII. Normally that works by typing short ASCII sequences, typically
in a special shift state, to select a Chinese / Japanese / Korean
character. I assume that such drivers are separately available, also in
free versions, working with any DBCS-enabled DOS system.

Imagine that, for example, you type Ctrl-K-A-N-X, and when you release
Ctrl again, the input method sends two bytes, in other words one DBCS
character, through the DOS console driver, saying "somebody has typed the
character named Kanji-Xen" (I invented that character). So there is not
one KEY that lets you type one "Xen" character, but a WAY to type one.

> 5. A font has to be made. Perhaps the GNU Unifont could be converted?

The above-mentioned editors use TrueType fonts (TTF) as far as I remember,
but in a fixed-size way, I believe. So that conversion step is something
people have experience with :)

> 6. Probably the hardest part: all FreeDOS packages, or at least basic ones
> (FreeCOM, FIND, SORT, EDIT), have to be updated to support double byte
> characters.

See above for editing.
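A minimal C sketch of the input side, assuming the input method lives in a keyboard driver as suggested for point 2: when the trigger key (Ctrl in the example) is released, the typed ASCII sequence is looked up, and the resulting DBCS character is queued as two bytes, lead byte first, so that successive INT 21h/AH=1-style reads return the first byte and then the second. The dictionary, its single entry, the byte values 0x88 0x9F (an arbitrary placeholder for the invented "Kanji-Xen") and all function names are hypothetical.

```c
#include <string.h>

/* Hypothetical input method dictionary entry: the ASCII keys typed while the
   trigger key is held, and the two-byte DBCS code they produce. */
struct ime_entry {
    const char *keys;
    unsigned char lead, trail;
};

static const struct ime_entry ime_dict[] = {
    { "KANX", 0x88, 0x9F },  /* PLACEHOLDER code for the invented "Kanji-Xen" */
};

#define QUEUE_SIZE 16

/* Ring buffer of bytes waiting to be returned by single-byte input calls. */
struct input_queue {
    unsigned char buf[QUEUE_SIZE];
    int head, count;
};

static int queue_push(struct input_queue *q, unsigned char b)
{
    if (q->count >= QUEUE_SIZE)
        return 0;                                   /* queue full */
    q->buf[(q->head + q->count) % QUEUE_SIZE] = b;
    q->count++;
    return 1;
}

/* Returns the next queued byte, or -1 if nothing is waiting. */
int queue_pop(struct input_queue *q)
{
    int b;
    if (q->count == 0)
        return -1;
    b = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    return b;
}

/* Called when the trigger key is released. Queues lead and trail byte of the
   matching character and returns 1, or returns 0 if the sequence is unknown
   or there is no room for both bytes (never queues just one half). */
int ime_commit(struct input_queue *q, const char *typed)
{
    size_t i;
    for (i = 0; i < sizeof ime_dict / sizeof ime_dict[0]; i++) {
        if (strcmp(ime_dict[i].keys, typed) == 0) {
            if (q->count > QUEUE_SIZE - 2)
                return 0;
            queue_push(q, ime_dict[i].lead);
            queue_push(q, ime_dict[i].trail);
            return 1;
        }
    }
    return 0;
}
```

This is the same trick the BIOS uses for function keys: the application keeps reading one byte at a time and never needs to know that two of those bytes came from a single "keystroke".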
And importantly, note how much of DOS (and its tools) does NOT have to
know about the DBCS nature of text: it makes no difference to FIND whether
you search for a four-letter word or for four bytes which MEAN two DBCS
characters. Among other things, this is thanks to the lead byte / trail
byte distinction and, if I remember correctly, to CJK scripts having no
upper or lower case. For the same reason, FreeCOM does not have to care.
File names cannot be more than 8 + 3 BYTES long, but if those 8 bytes
happen to contain 4 DBCS characters, it is all the same to FreeCOM. Only
the display driver then has to graphically draw the 4 DBCS characters for
you instead of 8 ASCII ones.

Again, the lead byte trick allows the driver to recognize whether an
incoming byte is ASCII or part of a DBCS character. Note that, if I guess
correctly, the second half of a DBCS character cannot be distinguished
from ASCII, so the graphical font display driver has to remember whether
the previous byte was a non-printing DBCS lead byte (and which one) or
not; that is enough.

SORT is a different story, but I do not know whether DOS is supposed to
include a Japanese-aware (etc.) SORT or whether that is normally part of a
separately available package of CJK tools. Also, what license do such
packages have?

> PS: I just wrote all that and found this:
>
> http://nokonoko365.cocolog-nifty.com/blogfile/freedos/index.html
>
> Is that third party software for Japanese support or what?

I do not know. Note that most of what I wrote above is educated guesses.
I hope they still help inspire this discussion.

Regards, Eric
_______________________________________________
Freedos-user mailing list
Freedos-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-user