Deepak Kotian wrote:
> When should one use wchar and when should one use char. I have heard
> lot about UNICODE. But what is the real need of wchar_t type ?

The first thing to understand is that Unicode can be encoded in a few
different ways (and for the most part, it is easy to transform one
encoding into another by bit shifting). Fundamentally, Unicode assigns
each character a number (a code point) in the range 0 to 0x10FFFF, so a
32-bit integer is enough to hold any character. But you really don't
want to encode all your text files that way, because mostly-ASCII text
would become four times larger. Therefore, there are some clever
encodings (UTF-8 and UTF-16) that express the full range of characters
as variable-length sequences of 8-bit or 16-bit values. There is also
an older encoding called UCS-2 which, IIRC, simply limits the character
size to 16 bits and cannot express Unicode characters that require more
than that. Java and Windows NT both use 16-bit units for their native
text (originally UCS-2, UTF-16 in current versions).

So, now to finally answer your question: you use char for character
encodings based on 8-bit values (either singleton, as in traditional
ASCII, or variable-length sequences). Unicode's UTF-8 encoding is one
example of such an encoding. You use wchar_t, on the other hand, for
wide characters. On Windows, wchar_t is 16 bits, which suits UCS-2 and
UTF-16. Its width is implementation-defined, though, and some other
platforms make it 32 bits.

> Actually, I have many read many files through a C program and it has
> Japanese text as well. What is advisable to use. Should it be fgetws
> or fgets, will also do. When should one use fgetws or fgets and
> reason.

That depends on how the text is encoded. If the files use a
byte-oriented encoding (UTF-8, or a legacy Japanese encoding such as
Shift_JIS or EUC-JP), fgets will read the bytes as they are. fgetws is
for when you want the library to hand you wchar_t strings.

> Please let me know, if someome can explain in simple terms or any
> document on this would be helpful.

See http://www.unicode.org. Lots of documentation there.

> Moreover, windows has wsystem(), wstat(),etc, which LINUX does not
> have, any reasons for that.

Microsoft extends the C standard library with wide-character
equivalents for most functions that take or return strings (spelled
_wsystem(), _wstat(), and so on). This is intended to make it more
convenient to write native Windows NT programs that always use wchar_t
for text.

Linux itself doesn't have any C functions; it's just the kernel. You're
probably thinking of the GNU C library.
I have no idea what the GNU C library's Unicode support is like. If it
lacks those functions, it's probably because they are non-standard and
there has been no great demand for them from users of that library.

Craig
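
P.S. To make the "bit shifting" remark concrete, here is a minimal
sketch in C of decoding one UTF-8 sequence into a code point and
re-encoding it as UTF-16. The function names utf8_decode and
utf16_encode are made up for illustration; they are not part of any
standard library, and real programs would more likely use an existing
conversion library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the input is truncated or malformed.
 * (Hypothetical helper, for illustration only.) */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    /* The lead byte tells us the sequence length and carries the
     * high-order payload bits. */
    if (s[0] < 0x80)                { c = s[0];        len = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; min = 0x10000; }
    else
        return 0;   /* stray continuation byte or invalid lead byte */

    if (len > n)
        return 0;   /* sequence is cut off */

    /* Each continuation byte (10xxxxxx) contributes six more bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}

/* Encode one code point as UTF-16. Writes 1 or 2 units to out and
 * returns that count, or 0 if cp is not a valid character value.
 * (Hypothetical helper, for illustration only.) */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, the three bytes E3 81 82 (HIRAGANA LETTER A) decode to
U+3042, which fits in a single 16-bit unit. A character outside the
16-bit range, such as U+1F600, comes out as the surrogate pair
D83D DE00, which is exactly what UCS-2 cannot express.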