> On Tue, 1 Jul 2008 10:33:28 +0200 (CEST)
> > > all platforms?
> >
> > My proposition was: Two encodings, two stringtypes for all.
> Both at the same time?
Yes, utf8string and utf16string. Whatever Tiburon introduces would be aliased to utf16string, so that code will be compatible on non-Windows platforms too, and the UTF-16 Tiburon code can easily communicate with the outside world.

> > Florian's stand was thinking about one stringtype that supports both
> > encodings. I don't like this, but we can only discuss that if Florian
> > has more details about his ideas.
>
> I think, Marc had a similar idea. Adding an encoding field (e.g. in
> front of the length). But IMO it has some drawbacks.

Yes. Any manual string handling, which already becomes more difficult, also becomes more expensive. That is partly because array dereference (which ignores surrogates, but is still a basic building block for implementing string routines) becomes expensive, or has to be done with pointers.

> > It will on every communication with the external world. IOW all my db
> > exports will generally be UTF-8 on Unix and UTF-16 on Windows.
>
> Maybe you misunderstood me here. This section is about multiple encoding
> proposal. So I was proposing to use only one string type in
> RTL/FCL.
> It can be a different one for each platform.

Ok. That is somewhat different. One size fits all (UTF-16 everywhere) is not an option for me. It's the path of least resistance, but it suits languages that have an ivory-tower concept and want to keep the real world at arm's length.

So then: different platforms, different encodings. Actually that was my first thought/proposal too, but it precludes any possible solution for Tiburon compatibility before we even start, and introduces a portability barrier. (Want to recompile for Linux? First fix all your UTF-16 string routines so that they support UTF-8 under ifdef. That is a hard sell.)

IMHO that is not a sustainable situation in the long term, which is why I changed to the two-stringtypes solution. That has some disadvantages too, most notably adding even more string types and possible auto-conversion pitfalls.
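To make the cost objection to a per-string encoding field concrete, here is a minimal sketch in Python. The `TaggedString` layout, the `UTF8`/`UTF16` tags and the `unit_at`/`count_unit` functions are hypothetical illustrations, not FPC's actual string implementation. `unit_at` has to test the tag on every indexed access, while `count_unit` shows the workaround of testing the tag once and walking a typed view of the buffer, roughly what rewriting a routine with pointers amounts to.

```python
import sys
from array import array
from dataclasses import dataclass

# Hypothetical single string type: an encoding tag sits in the header next
# to the length, and the payload is raw bytes whose code-unit width depends
# on that tag. Illustration only; this is not FPC's actual string layout.
UTF8, UTF16 = 0, 1


@dataclass
class TaggedString:
    encoding: int   # the extra header field (UTF8 or UTF16)
    length: int     # length in code units, not in characters
    payload: bytes  # UTF-8 bytes, or UTF-16 code units in native byte order


def unit_at(s: TaggedString, i: int) -> int:
    """Plain indexing: the tag must be tested on every access before the
    load. Like s[i] on an ordinary string, this returns a code unit and
    ignores multi-byte sequences and surrogate pairs."""
    if s.encoding == UTF8:
        return s.payload[i]
    return int.from_bytes(s.payload[2 * i:2 * i + 2], sys.byteorder)


def count_unit(s: TaggedString, unit: int) -> int:
    """Workaround: test the tag once, then walk a typed view of the buffer
    (roughly what rewriting the routine with pointers amounts to)."""
    view = memoryview(s.payload)
    if s.encoding == UTF16:
        view = view.cast("H")  # reinterpret the bytes as 16-bit code units
    return sum(1 for u in view[:s.length] if u == unit)


# "aéa": 'é' is 0xC3 0xA9 in UTF-8 and the single unit 0x00E9 in UTF-16
a = TaggedString(UTF8, 4, "aéa".encode("utf-8"))
b = TaggedString(UTF16, 3, array("H", [0x61, 0xE9, 0x61]).tobytes())
print(unit_at(a, 1), unit_at(b, 1))              # 195 233
print(count_unit(a, 0x61), count_unit(b, 0x61))  # 2 2
```

With a single fixed encoding per type, indexing needs no tag test at all; in a compiled string type that test is the extra work every access pays.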
But I think the two-stringtypes approach is an experiment that should at least have been tried. Note that this is totally separate from what Lazarus should do. Lazarus can IMHO happily use the UTF-16 string type exclusively. I'm concerned with the base system.

> As long as almost everywhere only one string is used no conversion can
> take place and you can therefore store UTF8 in widestrings or UTF-16 in
> strings or whatever binary data.

It still requires manual conversion at the borders (any input or output to the system, libraries, disk). But a lot less, since only sources in an encoding "foreign" to the system need manual conversion code inserted.

> Just as it is at the moment. Strings are not only text. I think this
> concept is very important in pascal and breaking this will create a bigger
> incompatibility than Codegear does with its string to widestring move.

???

> > See above. If we have to support two totally different OS api's (A
> > and W) they are two different targets. Period.
> >
> > This also avoids the mess of changing all windows routines to be
> > dynloaded, and hopefully lessen the mutual breaking a bit.
>
> Two different windows targets. Wow, a big step.

Yes, but unavoidable in the long term IMHO, to avoid the situation we had with Dos in years past, where the port was always trailing the Tier 1 ports. (Though I saw that Giulio and Tomas managed to get it working again, but only after releases of it had been postponed.)

W9x support is being dropped on all sides. However, in my view dropping it is not necessary if we split things now, while W9x support is still of decent quality. Even though W9x and NT are both Windows, in some ways they differ more than e.g. FreeBSD and Linux.

Doing the split before major NT-requiring changes (read: Unicode, but also e.g. symlink support?) will make the change more evolutionary, and branching at a moment when the codebase is still proven to work on win32 will ensure that it keeps decent quality for quite some time.
In the long term it will also save a lot of work, such as crazy attempts to maintain the status quo with insane workarounds like dynloading all API routines, etc.

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal