Re: [Harbour] BYTE -> UCHAR patch

Viktor Szakáts Sat, 07 Feb 2009 07:44:33 -0800

Hi Przemek,


> I suggest changing all 'BYTE *' used as file names in the Harbour
> FS API and similar functions to the 'char *' type, so as not to replicate
> the problems which come from Clipper, f.e. from CL53 header files:
>   typedef unsigned char BYTE;
>   typedef BYTE far * BYTEP;
>   extern void _retc(BYTEP);
>   extern void _retclen(char far *, unsigned int);
> You may find similar problems also in CL52.
> It forces explicit casting in C++ mode if char is signed. Such casting
> has a bad side effect: it can hide typos or even errors normally easy to
> catch at compile time, f.e. pFile wrongly used instead of szFile:
>   PHB_ITEM pFile = hb_param( 1, HB_IT_STRING );
>   char * szFile = hb_itemGetCPtr( pFile );
>   HB_FHANDLE hFile = hb_fsOpen( ( BYTE * ) pFile, ... );


Yes, I see your point.
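
To make the failure mode concrete, here is a minimal sketch in C. The
names DEMO_ITEM, demo_open_byte() and demo_open_char() are hypothetical
stand-ins invented for this illustration; they are not the Harbour FS
API. The sketch only shows why a 'char *' prototype lets the compiler
catch what a '( BYTE * )' cast hides.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char BYTE;

/* Hypothetical stand-in for an item structure; not the real HB_ITEM. */
typedef struct
{
   int          type;
   const char * szText;
} DEMO_ITEM;

/* Old-style prototype: the file name is passed as BYTE *. */
size_t demo_open_byte( const BYTE * pFilename )
{
   size_t n = 0;
   while( pFilename[ n ] != 0 )
      ++n;
   return n;   /* returns the name length, just to have a result */
}

/* Suggested prototype: the file name is passed as plain char *. */
size_t demo_open_char( const char * szFilename )
{
   size_t n = 0;
   while( szFilename[ n ] != '\0' )
      ++n;
   return n;
}

void demo_calls( void )
{
   DEMO_ITEM    item   = { 1, "test.dbf" };
   const char * szFile = item.szText;

   /* Correct call, but the BYTE * prototype forces a cast. */
   assert( demo_open_byte( ( const BYTE * ) szFile ) == 8 );

   /* Correct call, no cast needed. */
   assert( demo_open_char( szFile ) == 8 );

   /* Typo (item instead of string). The cast silences the compiler:
    *    demo_open_byte( ( const BYTE * ) &item );
    * Without a cast the same typo is rejected in C++ and diagnosed
    * as an incompatible pointer type in C:
    *    demo_open_char( &item );
    */
}
```

The two wrong calls are left in a comment on purpose: the first would
compile and read the structure as if it were a string, the second
would not get past the compiler.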

I'll try to fix that before committing anything, but first
I have some questions.

Am I right in assuming that we will definitely take the UTF-8
route then (for filenames, too), and that in this case 'char' may
simply hold UTF-8 strings in the future? (Had we chosen UTF-16,
we might need to centrally redefine 'char' as a double-byte type in
the future, so such an abstract type might have its benefits.)


> I suggest unifying all places which use text strings like file
> names to use 'char *'. It allows us to remove unnecessary casting
> (char*)<->(BYTE*), and it should be removed with this modification.
> But we also use the BYTE* type for arrays of unsigned 8-bit integers,
> f.e. PCODE, RDD file structures or functions which make some
> calculations on memory blocks using them as arrays of 8-bit unsigned
> integers, like most checksum functions. Here BYTE should be replaced
> by UCHAR, at least internally. Now Harbour cannot work on platforms where
> BYTE is already defined as signed char, f.e. this HVM code will fail:


Indeed, this was a grey area (BYTEs used in checksums, pcode,
and the like) that I intentionally didn't touch. I think I may convert
these in a second pass, just to be on the safe side.


>         case HB_P_SENDSHORT:
>            hb_itemSetNil( hb_stackReturnItem() );
>            hb_vmSend( pCode[ w + 1 ] );
> because pCode[ w + 1 ], after casting to USHORT, will give a wrong result
> for values greater than 127. F.e. for 255 we will have 65535.
> We should fix it. Here simply replacing BYTE by UCHAR in the PCODE array
> definition is enough, though a really clean version should do something like:
>         case HB_P_SENDSHORT:
>            hb_itemSetNil( hb_stackReturnItem() );
>            hb_vmSend( HB_PCODE_MKUCHAR( &pCode[ w + 1 ] ) );
> but cleaning the code to work with PCODE stored in signed or unsigned
> bytes, so we can keep it in an array of BYTEs, needs much more work.
> In the few places where a programmer wants to use a single byte as a
> signed 8-bit integer, he should use 'signed char' or 'SCHAR', because
> he cannot assume that 'char' will be signed or unsigned on the destination
> platform/C compiler. I fixed all such places in core code when I was
> working on the Harbour port to mips390 CPUs, but I want other developers
> to remember it when creating new code or updating existing code.
>
> In your modifications you replaced BYTE * used as file name in the Harbour
> FS API with UCHAR *. File names are text strings and I want to use
> plain char * for them, and I want to change BYTE used as a synonym
> for an 8-bit unsigned integer to UCHAR.
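
The sign-extension effect described for HB_P_SENDSHORT above can be
reproduced in isolation. This is only a sketch: the DEMO_* types and
demo_widen_*() functions are hypothetical names made up for the
example, and demo_widen_masked() is one possible way to read a byte
as unsigned, not necessarily how HB_PCODE_MKUCHAR is or would be
defined.

```c
#include <assert.h>
#include <limits.h>

typedef signed char    DEMO_SBYTE;   /* models a platform where BYTE is signed */
typedef unsigned char  DEMO_UBYTE;   /* what UCHAR is meant to guarantee */
typedef unsigned short DEMO_USHORT;

/* Widening a signed byte: a negative value is sign-extended first. */
DEMO_USHORT demo_widen_signed( const DEMO_SBYTE * pCode, int w )
{
   return ( DEMO_USHORT ) pCode[ w ];
}

/* Widening an unsigned byte: the value is preserved. */
DEMO_USHORT demo_widen_unsigned( const DEMO_UBYTE * pCode, int w )
{
   return ( DEMO_USHORT ) pCode[ w ];
}

/* Reading a signed byte through an explicit unsigned conversion. */
DEMO_USHORT demo_widen_masked( const DEMO_SBYTE * pCode, int w )
{
   return ( DEMO_USHORT ) ( unsigned char ) pCode[ w ];
}

void demo_check( void )
{
   /* -1 is how the bit pattern 0xFF reads as a two's complement signed byte. */
   DEMO_SBYTE sCode[] = { -1 };
   DEMO_UBYTE uCode[] = { 255 };

   /* All bits set after widening: 65535 when unsigned short has 16 bits. */
   assert( demo_widen_signed( sCode, 0 ) == USHRT_MAX );

   /* The intended value survives in both unsigned variants. */
   assert( demo_widen_unsigned( uCode, 0 ) == 255 );
   assert( demo_widen_masked( sCode, 0 ) == UCHAR_MAX );
}
```

Changing the array element type (the first fix suggested in the quote)
corresponds to demo_widen_unsigned(); keeping signed storage and
converting at each access corresponds to demo_widen_masked().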


Is this true for GT strings, too? (I.e. should they be converted to
char, not UCHAR?)


> Sorry, my fault. I should have been clearer earlier and explained
> the problem well.


No problem. My assumption was based on current usage, plus the
idea that we're marking everything as UCHAR that is meant as 'text'
in Harbour (as opposed to raw binary data).

Q1: Having Unicode support in mind, shouldn't we use some distinct
marking for 'text' (char *) data?

Q2: May I change UCHAR / SCHAR to HB_UCHAR / HB_SCHAR,
as part of moving our special types to our own namespace?
(UCHAR is also a Microsoft type name.)

Q3: Can we define the rules for different string/char types, so that
everyone speaks the same language here:

- HB_UCHAR, HB_SCHAR: ... ?
- char / HB_TEXT?: Harbour character/string (with future UTF-8 support) ?
- BYTE: ... Harbour raw binary data or other simple numeric BYTE ?

Maybe that would help everyone see this more clearly.
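
As a starting point for discussion only, one possible reading of the
list above could look like the sketch below. Everything in it is an
assumption: the typedefs are not existing Harbour definitions, HB_TEXT
is just the tentative name from Q3, and the demo_*() functions are
hypothetical examples of where each type would be used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed roles, not agreed rules: */
typedef unsigned char HB_UCHAR;   /* 8-bit unsigned integer (pcode, checksums) */
typedef signed char   HB_SCHAR;   /* 8-bit signed integer */
typedef char          HB_TEXT;    /* text; signedness must not matter */

/* Numeric use: summing raw bytes needs a guaranteed unsigned type. */
unsigned int demo_checksum( const HB_UCHAR * pData, size_t nLen )
{
   unsigned int sum = 0;
   size_t       i;

   for( i = 0; i < nLen; ++i )
      sum += pData[ i ];
   return sum;
}

/* Text use: plain char works with the C string functions without casts. */
size_t demo_text_len( const HB_TEXT * szText )
{
   return strlen( szText );
}

/* Signed use: a small offset that may be negative on every platform. */
int demo_apply_offset( int base, HB_SCHAR offset )
{
   return base + offset;
}

void demo_types_check( void )
{
   const HB_UCHAR data[] = { 200, 100 };

   assert( demo_checksum( data, 2 ) == 300 );
   assert( demo_text_len( "abc" ) == 3 );
   assert( demo_apply_offset( 10, -3 ) == 7 );
}
```

The role of BYTE is deliberately left out of the sketch, since that is
exactly the open question.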

Brgds,
Viktor

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
