Re: [Harbour] 2008-11-03 11:20 UTC+0200 Viktor Szakats (harbour.01 syenar hu)

Szakáts Viktor Tue, 04 Nov 2008 01:53:45 -0800

The code page named ESMWIN is the correct one to use in ANSI Spanish
applications. All the others are invalid for working, or only useful for
reading old, buggy, incorrectly ordered indexes.


"ANSI" means Windows in this context.

ESMWIN is based on:

Windows locale:      Spanish (Modern Sort)
Locale ID:           0xC0A
Location designator: Modern_Spanish
Code page:           1252
Code page name:      windows-1252

You can read the following tables and see how incorrect it is to use ISO-????.
Certainly, ESMWIN needs its code page name changed to windows-1252.


I'll correct the CP in those files then, thank you.
I still fail to see, however, how cpesisom is any
worse than cpeswinm (or mwin) if their content is
exactly the same, just linked to the ISO CP (which,
for the Spanish letters, is again the same as the
Windows-1252 CP, except for the name).
Please explain.
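For the letters Spanish actually uses, the two CPs do agree. ISO-8859-1 and Windows-1252 give á, é, í, ó, ú, ü, ñ, ¿ and ¡ (and the capitals) the same byte values, and those values equal their Unicode code points. The two CPs differ only in the range 0x80-0x9F. There, ISO-8859-1 has C1 control codes and Windows-1252 has printable characters such as € and curly quotes. Below is a minimal C sketch that decodes a byte in each CP. The function names are hypothetical and are not part of Harbour's uc*.c files.

```c
#include <stdint.h>

/* Windows-1252 assignments for bytes 0x80..0x9F; 0 marks the five
   undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D). Outside this range
   Windows-1252 decodes exactly like ISO-8859-1. */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* ISO-8859-1: every byte value is its own Unicode code point. */
uint32_t iso8859_1_to_unicode(unsigned char b)
{
    return b;
}

/* Windows-1252: identical to ISO-8859-1 except in 0x80..0x9F;
   returns 0 for the undefined bytes. */
uint32_t cp1252_to_unicode(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_80_9f[b - 0x80];
    return b;
}

/* Nonzero when both code pages decode byte b to the same character. */
int same_in_both(unsigned char b)
{
    return iso8859_1_to_unicode(b) == cp1252_to_unicode(b);
}
```

With these functions, same_in_both() is true for every byte except the 32 bytes in 0x80-0x9F. A Spanish collation written for one CP therefore also fits text in the other, as long as that text avoids the 0x80-0x9F range.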

[ Update: It became clear later. ]

ESWIN is an ANSI copy of its OEM version ES850, compatible with the old collation used in CL53 (Clipper 5.3),
but incorrectly ordered and unusable in Spain for this reason.


Okay, so they should probably be called ESWINC, ES850C and
ESISOC to show that these are indeed Clipper compatibility
codepages, and at the same time the 'Modern' ES*M ones should
be renamed to ESWIN and ESISO, because these seem to be the
correct Spanish collations, which everyone likes to use.
Thanks for the clarification.

If that's the case, I'd opt to clear this up in the repo.

It may also be useful to add an ES850 according to the
'Modern' collation, so that Spanish users can still use
HB_TRANSLATE() to convert strings between "OEM" and "ANSI"
codepages.

I don't know Linux well enough, but here is the Windows information:

1) First of all, these files are not code pages, they are collations.


Yes, I know, but they are using some sort of codepage anyway.
This codepage is indicated in various places in the files, plus
the collation strings are expected to be encoded in this CP.
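Here is one way a collation string encoded in a CP can look. The letters are listed in sort order using that CP's bytes, and two strings are compared by each byte's position in that list. Below is a minimal C sketch. It assumes a simplified, lowercase-only Modern Spanish alphabet in Windows-1252 and ignores accents. The names are hypothetical, and this is not the actual layout of the cpes*.c files.

```c
#include <string.h>

/* Lowercase Modern Spanish letters in sort order, as Windows-1252
   bytes; 0xF1 is ñ. Accents and upper case are ignored here. */
static const char es_modern_1252[] = "abcdefghijklmn" "\xF1" "opqrstuvwxyz";

/* Position of byte c in the collation string. Bytes that are not
   listed sort after every letter, ordered by their raw value. */
static int es_rank(unsigned char c)
{
    const char *p = strchr(es_modern_1252, c);
    if (c != '\0' && p != NULL)
        return (int)(p - es_modern_1252);
    return 256 + c;
}

/* strcmp-like comparison that orders letters by the collation string. */
int es_compare(const char *a, const char *b)
{
    const unsigned char *x = (const unsigned char *)a;
    const unsigned char *y = (const unsigned char *)b;

    while (*x != '\0' && *x == *y) {   /* skip the common prefix */
        x++;
        y++;
    }
    if (*x == '\0')
        return (*y == '\0') ? 0 : -1;  /* shorter string sorts first */
    if (*y == '\0')
        return 1;
    return (es_rank(*x) < es_rank(*y)) ? -1 : 1;
}
```

With plain strcmp(), "ñu" (bytes 0xF1 'u') sorts after "oso" because 0xF1 is greater than 'o'. es_compare() puts it between "nube" and "oso", as the Modern Spanish collation expects.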

2) Theoretically, DBFs are meant to contain OEM code page data. But actually
  most Windows applications store ANSI code page data in these files
  without problems or conflicts.


I think the CP used in a given app is completely up to the
actual app / user. DBF itself can store any binary data,
even UTF-16 or UTF-8. The developers' job is to choose one,
use it consistently, and sync the app settings to match
the .dbfs used.

3) There are different code pages: OEM, ANSI, UTF-8 (Unicode), ... and
  each of them has its own collation.


Actually the collation (= ordering of national alphabet)
is one and only for a given language (at least theoretically).
Their internal representation (actual bytes used to represent
national chars) depends on the CP used. Notice that multiple
collations _may_ still exist for various reasons, the major
one being Clipper compatibility, or compatibility with other
products which weren't closely following the national standard
(as discussed above). Another reason can be different "official"
and "unofficial" standards (and who knows what else).

It's important to keep the collation the exact same across
different CPs, otherwise HB_TRANSLATE() won't be able to
convert between them.
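Here is one way to see why the collation must be identical across CPs. If each CP lists the same alphabet in the same order, then the i-th byte of one string and the i-th byte of the other are the same letter. A byte-to-byte translation table then follows from pairing the two strings position by position. Below is a minimal C sketch of that idea. It is not how HB_TRANSLATE() is actually implemented. It uses a hypothetical lowercase Spanish alphabet in CP850 and Windows-1252, and any byte not in the alphabet passes through unchanged. If the two strings listed the letters in a different order or count, the pairing would map letters onto the wrong letters.

```c
#include <string.h>

/* The same alphabet in the same order, encoded in two code pages.
   The literals are split after each escape so that a following letter
   that looks like a hex digit (e.g. 'b') is not read as part of it. */
static const char es_cp850[] =
    "a" "\xA0" "bcde" "\x82" "fghi" "\xA1" "jklmn" "\xA4"
    "o" "\xA2" "pqrstu" "\xA3" "\x81" "vwxyz";
static const char es_cp1252[] =
    "a" "\xE1" "bcde" "\xE9" "fghi" "\xED" "jklmn" "\xF1"
    "o" "\xF3" "pqrstu" "\xFA" "\xFC" "vwxyz";

/* Build table[from_byte] = to_byte by pairing the two alphabets
   position by position; returns 0 if their lengths differ. */
int build_table(const char *from, const char *to, unsigned char table[256])
{
    size_t i, n = strlen(from);
    if (n != strlen(to))
        return 0;
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;          /* identity by default */
    for (i = 0; i < n; i++)
        table[(unsigned char)from[i]] = (unsigned char)to[i];
    return 1;
}

/* Translate a NUL-terminated string in place using the table. */
void translate(char *s, const unsigned char table[256])
{
    for (; *s; s++)
        *s = (char)table[(unsigned char)*s];
}
```

For example, translating "ni\xA4o" (niño in CP850) with the CP850 to Windows-1252 table yields "ni\xF1o". Building the table in the opposite direction converts it back.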

All in all, the above is the reason why we have multiple
cpes*.c files: same collation, different CP. Or, in case of
HU, there are two sets of collations (one Clipper compatible,
one Successware Six compatible), both with multiple CPs.
And this seems to be the case for Spanish, too.

Also, instead of "OEM" and "ANSI" (which are pretty vague,
Windows-specific and non-standard terms), I'd suggest using
the terms 'IBM/MS-DOS codepage' (~ "OEM") and 'Windows codepage'
(~ "ANSI").

Code page  Description
1258       Vietnamese
1257       Baltic
1256       Arabic
1255       Hebrew
1254       Turkish
1253       Greek
1252       Latin1 (ANSI)
1251       Cyrillic
1250       Central European
950        Chinese (Traditional)
949        Korean
936        Chinese (Simplified)
932        Japanese
874        Thai
850        Multilingual (MS-DOS Latin1)
437        MS-DOS U.S. English


See these in uc*.c files. I've added support for
all of them except the 900 range (which doesn't
seem to fit in current Harbour).
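A likely reason the 900 range does not fit is that 932, 936, 949 and 950 are double-byte (DBCS) code pages. A single-byte table, by contrast, maps each of the 256 byte values to exactly one character. In CP932 (Shift_JIS), for example, a byte in 0x81-0x9F or 0xE0-0xFC is a lead byte that combines with the next byte into one character. As a result, a string's byte length and character count differ. Below is a minimal C sketch that counts CP932 characters. The function names are hypothetical, and invalid trail bytes are not validated.

```c
#include <stddef.h>

/* Nonzero if b starts a two-byte character in CP932 (Shift_JIS). */
static int cp932_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters in a NUL-terminated CP932 string. A lead byte also
   consumes the following byte; a lead byte at the very end is counted
   as one (truncated) character. */
size_t cp932_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        if (cp932_is_lead_byte(*p) && p[1] != '\0')
            p += 2;     /* double-byte character */
        else
            p += 1;     /* ASCII or single-byte katakana */
        n++;
    }
    return n;
}
```

For instance, "A" followed by 0x82 0xA0 (hiragana あ) and "B" is 4 bytes but 3 characters. A plain 256-entry byte table cannot express that.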

CodePage  identifier and name  BrDisp  BrSave  MNDisp  MNSave  1-Byte  ReadOnly
65006     utf-32BE             False   False   False   False   False   True


We should use the 'identifier and name', though I'm not
sure all the entries in this list are correct.
There is a 'main' name most of the time and lots of
aliases for the same CP; we should try to use the main
one.
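Mapping the many aliases to one main name can be done with a case-insensitive lookup. Below is a minimal C sketch. The table is a small illustrative subset. It mixes aliases registered with IANA (latin1, l1, iso_8859-1, cp819, cp850, 850) with the common informal alias cp1252, and it takes the main names from the IANA character set registry. cp_main_name() is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>

struct cp_alias { const char *alias; const char *name; };

/* Illustrative subset only: alias -> main name. */
static const struct cp_alias cp_aliases[] = {
    { "latin1",     "ISO-8859-1"   },
    { "l1",         "ISO-8859-1"   },
    { "iso_8859-1", "ISO-8859-1"   },
    { "cp819",      "ISO-8859-1"   },
    { "cp1252",     "windows-1252" },
    { "cp850",      "IBM850"       },
    { "850",        "IBM850"       },
};

/* ASCII case-insensitive string equality. */
static int name_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended */
}

/* Return the main name for a known alias, otherwise the input itself
   (assumed to already be a main name). */
const char *cp_main_name(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof cp_aliases / sizeof cp_aliases[0]; i++)
        if (name_eq(name, cp_aliases[i].alias))
            return cp_aliases[i].name;
    return name;
}
```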

Windows locale  LCID (locale ID) Collation designator Code page


This falls outside of Harbour's scope, as it's Windows-only.
But the list is useful for getting an idea of which language
uses which CP.

Thanks for clarifying the ES issues.

To me it looks like we'd need to do three things:

1) Fix the internal CP in current ESWIN and ESWINM files.
   (I'll do this ASAP)
2) Rename Clipper compatibility ESWIN, ESISO, ES850 files
   to ESWINC, ESISOC, ES850C and add comment that these
   are 'legacy' or Clipper compatibility ones (someone
   should test which is true). Rename ESWINM and ESISOM
   to ESWIN, ESISO.
3) Add ES850 according to the 'modern' collation.
+1) Think about a way to 'map' old ESMWIN name to new ESWIN.
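One possible shape for the mapping in the "+1" item is a small table of retired IDs that is checked before the normal lookup. Below is a minimal C sketch. The rename list follows items 2) and +1) above, but whether each old name was a registered CP ID is an assumption, and es_current_id() is a hypothetical helper. Note that ESWIN, ESISO and ES850 themselves cannot be aliased. After the rename they name the Modern collation, so code that relied on the old Clipper-compatible ESWIN would silently get a different ordering. That code would need to switch to ESWINC explicitly.

```c
#include <string.h>

struct cp_rename { const char *old_id; const char *new_id; };

/* Old IDs that disappear in the rename (assumed from this thread).
   ESWIN, ESISO and ES850 are deliberately absent: those names are
   reused for the Modern collation. */
static const struct cp_rename es_renames[] = {
    { "ESMWIN", "ESWIN" },
    { "ESWINM", "ESWIN" },
    { "ESISOM", "ESISO" },
};

/* Return the current ID for a possibly retired one, else the input. */
const char *es_current_id(const char *id)
{
    size_t i;
    for (i = 0; i < sizeof es_renames / sizeof es_renames[0]; i++)
        if (strcmp(id, es_renames[i].old_id) == 0)
            return es_renames[i].new_id;
    return id;
}
```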

Brgds,
Viktor

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour

Reply via email to