Re: [Harbour] codepage and RDD

Szakáts Viktor Mon, 03 Nov 2008 14:05:28 -0800

Hi Przemek,

And here could come the advantage of being internally Unicode,
since in this case the CP doesn't need to be added next to
each string. And this is probably what we should do, instead
of introducing CP information for each string. And just catch
the I/O points where such CP information is needed to know
how to interpret non-Unicode strings.


But how do you plan to resolve the problem with binary data?
What to do with FREAD()/FWRITE() buffers?
Using internally UNICODE for all string items resolves one problem
but creates many new ones for code which has to operate on binary
data.


I'm probably not able to see all the implications of binary
data, but to me, binary data is just a bunch of 0x00-0xFF bytes
and that's it (sorry for my simplistic and ignorant POV).
Usually you don't sort binary data, and you don't do UPPER()
on it; if someone does, it's his problem. Anyhow, FRead()/FWrite()
are exactly such I/O points where CP conversion may occur,
and we'd need some ways to control CPs at this point, even if
we'd only handle text. If we allow setting up a CP, one such
special CP could be 'raw data', meaning, there is no need to
touch/convert it at all, handle it as is. I'm not sure if this
fact needs to be attached to the strings to be used at later
times.
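
A minimal sketch of that idea, written in Python rather than
Harbour purely for illustration. The names read_buffer(),
write_buffer() and the RAW marker are hypothetical, not existing
Harbour API; the point is only that conversion happens at the I/O
boundary and that a 'raw' CP means "leave the bytes alone":

```python
# Hypothetical sketch: convert between bytes and Unicode only at I/O points.
RAW = "raw"  # assumed marker for "binary data, do not convert"


def read_buffer(data: bytes, cp: str = RAW):
    """Return the bytes untouched for RAW, otherwise decode them to str."""
    if cp == RAW:
        return data
    return data.decode(cp)


def write_buffer(value, cp: str = RAW) -> bytes:
    """Inverse of read_buffer: pass bytes through or encode text to cp."""
    if cp == RAW:
        if not isinstance(value, bytes):
            raise TypeError("raw output requires bytes")
        return value
    return value.encode(cp)
```

With this shape, the CP is an argument of the I/O call and does
not have to travel with the string afterwards.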

That of course could also mean that special field type needs
to be reserved for such binary data in .dbf (or other) tables,
to help handle that problem. [ "Normal" text fields
will be automatically converted between DB CP and internal Unicode,
while we need some convenient ways to prevent that for some
special - binary - data. ]
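
As a rough illustration (Python again, with an invented record
layout; the FIELDS list and the "B" type code are assumptions and
not the real .dbf format), the field type alone can decide whether
a value is converted on read:

```python
# Hypothetical field descriptors: (name, kind, width).
# kind "C" = text in the DB codepage, kind "B" = binary, never converted.
FIELDS = [("NAME", "C", 4), ("BLOB", "B", 2)]


def decode_record(raw: bytes, db_cp: str) -> dict:
    """Split a fixed-width record and decode only the text fields."""
    out, pos = {}, 0
    for name, kind, width in FIELDS:
        chunk = raw[pos:pos + width]
        pos += width
        # Text fields become Unicode strings; binary fields stay bytes.
        out[name] = chunk.decode(db_cp) if kind == "C" else chunk
    return out
```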

And that instantly explains the reason why BLOB fields were
born. In Clipper this looked like superfluous stuff, and not
really necessary, since every field is "BLOB", even a simple
memo will do. But in case of such Unicode trickery, it's
starting to have a point.

So, generically speaking all these I/O points need to be
resolved, taking into account CP/binary-raw flagging. Just
like we do now with GTs.

Even if we've agreed on the internals, of course we'd also
need to solve how to keep everything Clipper compatible.

Anyhow the RDD codepage support seems also quite broken then,
and the final word is that currently Harbour doesn't really
support CPs, besides one per app selected for the RDDs plus
all the app string functions (comparison, UPPER()/LOWER()),
and there is a grade of support to convert strings to another
CP for GT output. Period.


Harbour supports some limited automatic CP conversions in some
input/output operations and allows dynamically switching collation
rules and the ALPHA, UPPER and LOWER sets. Nothing more, nothing less.


Yes, but that's attached to the RDD (== app) collation, which makes
it extremely dangerous to use unfortunately. Maybe we should
(or we may already do) have functions to just do:
hb_strUpper/Lower( <string>, <CP> )
hb_strIsAlpha( <char>, <CP> )
[names tentative]

We seem to have the C level functions for these, but not the
.prg level ones, AFAIK.
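
To show what I mean by passing the CP explicitly, here is a small
Python sketch. str_upper() and char_is_alpha() are stand-ins for
the tentative names above, not real Harbour functions, and
Python's Unicode case mapping takes the place of Harbour's per-CP
tables:

```python
def str_upper(data: bytes, cp: str) -> bytes:
    """Upper-case a byte string according to the given CP, not a global one."""
    return data.decode(cp).upper().encode(cp)


def char_is_alpha(ch: bytes, cp: str) -> bool:
    """Tell whether a single byte is a letter in the given CP."""
    return ch.decode(cp).isalpha()
```

Nothing here reads or changes an app-wide setting, so a 3rd party
lib could call it without touching the RDD collation.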

[ And maybe I don't even need them, I'll reexamine my code
regarding these issues. I definitely need proper conversions
though (852 -> Win, Win -> 852, Win/852 -> utf8), it would be
much nicer to use Harbour for these tasks. ]
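
For reference, the conversions I have in mind look like this when
sketched in Python (translate() is a hypothetical helper; "cp852"
and "cp1250" are Python's codec names for 852 and Win):

```python
def translate(data: bytes, cp_from: str, cp_to: str) -> bytes:
    """Re-encode a byte string from one codepage to another via Unicode."""
    return data.decode(cp_from).encode(cp_to)


text = "Árvíztűrő tükörfúrógép"
dos = text.encode("cp852")               # the text as stored under 852
win = translate(dos, "cp852", "cp1250")  # 852 -> Win
back = translate(win, "cp1250", "cp852")  # Win -> 852
utf8 = translate(dos, "cp852", "utf-8")  # 852 -> utf8
```

This only round-trips cleanly when every character exists in both
codepages, which is the case for this Hungarian sample.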

The translation in the current native RDDs is in practice limited to
single-country CPs. It helps resolve problems introduced by the many
historical CPs, and I guess that was Alexander's intention when he was
adding it. I find it usable in my country, where I have to live
with ISO-8859-2, CP852, CP1250, Mazovia and a few other CPs that are
used much more seldom.


Don't get me wrong, I'm not complaining about what we have
so far, and kudos to Alexander for all the work it took,
very nice job.

Just looking where to move forward, that's all.

This - even now - could cause all sort of problems, if a
3rd party lib tries to fiddle with hb_setcodepage().


Yes, of course. But you can say the same about most other SETs.


BTW, why isn't hb_setcodepage() a simple Set()? Shouldn't we
convert it to one? IMO we should. What do you think?

Brgds,
Viktor

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
