Quoting Guy Fink <[email protected]>:
Hello
I have opened issue #0018144 in the bugtracker and uploaded a new
version of my codepages unit.
My description of it:
In September we had a discussion on the Lazarus-mailing list to
rewrite LConvEncoding and move the functionality to the RTL (Thread:
rewriting of LConvEncoding).
Since then I have done a lot of coding to implement an efficient
algorithm for both single-byte and double-byte codepages. A first
release was posted to the mailing list in mid-October, mainly as a
basis for further discussion, but there were no comments or
suggestions on it.
So here is a nearly final release with many changes compared to the first version.
It does not compile under 2.4.2:
cp_ISO88591.pas(69,37) Error: Constant strings can't be longer than 255 chars
Major points:
- The unit supports single-byte and double-byte codepages through the
same functions
- WideString support (configurable)
- UTF-8 and UTF-16 support (UTF-16 needs WideStrings)
Great.
- Direct conversion from CP to CP without intermediate string
Nice.
- Uppercase and Lowercase support
- Underlying Unicode data as of version 6.0.0 (October 11, 2010)
- A converter application that turns Unicode definition files into a
complete Pascal unit. The cp_* units are entirely generated by this app.
- Conversion up to 80% faster for SBCS.
Ehm, you made many functions inline. Even those that are more than a
few lines of code. This will enlarge the executables and can cost
performance in normal applications (e.g. Lazarus).
You call a conversion function for each character. But most real-world
texts consist largely of ASCII characters, which need no conversion
to UTF-8. My guess is that for most texts this approach is
slower. But I have to wait until it compiles before I can test.
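To illustrate what I mean by skipping ASCII, here is a minimal sketch for the ISO-8859-1 to UTF-8 case only. The function name Latin1ToUTF8Sketch is made up for this mail and is not taken from codepages.pp or LConvEncoding. It assumes a source codepage whose bytes below 128 are plain ASCII. The leading ASCII run is skipped with a tight loop, a pure ASCII string is returned without any copy, and only the remainder is converted byte by byte.

```pascal
program AsciiFastPathSketch;
{$mode objfpc}{$H+}

// Hypothetical helper for illustration; not part of any existing unit.
function Latin1ToUTF8Sketch(const s: AnsiString): AnsiString;
var
  i, len, outPos: Integer;
  b: Byte;
begin
  len := Length(s);
  // Fast path: skip over the leading run of ASCII bytes (< 128).
  i := 1;
  while (i <= len) and (Ord(s[i]) < 128) do
    Inc(i);
  if i > len then
  begin
    Result := s;  // pure ASCII: no conversion and no copy needed
    Exit;
  end;
  // Worst case: every remaining byte expands to two UTF-8 bytes.
  SetLength(Result, (i - 1) + 2 * (len - i + 1));
  if i > 1 then
    Move(s[1], Result[1], i - 1);  // copy the ASCII prefix in one go
  outPos := i;
  while i <= len do
  begin
    b := Ord(s[i]);
    if b < 128 then
    begin
      Result[outPos] := Chr(b);
      Inc(outPos);
    end
    else
    begin
      // U+0080..U+00FF encode as two bytes: 110000xx 10xxxxxx
      Result[outPos] := Chr($C0 or (b shr 6));
      Result[outPos + 1] := Chr($80 or (b and $3F));
      Inc(outPos, 2);
    end;
    Inc(i);
  end;
  SetLength(Result, outPos - 1);  // trim to the bytes actually written
end;

var
  converted: AnsiString;
begin
  // 'caf' followed by Latin-1 e-acute ($E9)
  converted := Latin1ToUTF8Sketch('caf' + Chr($E9));
  WriteLn(Length(converted));
end.
```

Whether this beats a per-character table lookup depends on the text, so it needs measuring on real data.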
- For DBCS up to 100 times faster
;)
For now there are only units for ISO-8859-1, ISO-8859-2 and CP932
(SHIFT_JIS). More will be added for the final release. The
converter subdirectory has all the definition files that I could find. I
will add them all.
The units:
codepages.pp: the main unit (highly configurable through codepagesdef.inc)
unicodemappings.pas: some definitions from unicode.org, especially the
tables for uppercase, lowercase and the Unicode blocks
utf8.pas: mainly the UTF-8 functions from LCLProc, plus some new ones
utf16.pas: the same for UTF-16
acpinfo.pas: info on the codepages supported by Windows, as published on MSDN
Some first test results as attachment.
Mattias