Ping JJ: string literals

ToddAndMargo via perl6-users Fri, 17 Jan 2020 14:29:33 -0800

Hi JJ,

Please be my hero.


I won't call you any goofy names out of
affection and friendship, as others will get
their knickers in a twist.

This is from a previous conversation we had concerning
the mistake in

https://docs.raku.org/language/nativecall#index-entry-nativecall

    my $string = "FOO";
    ...
    my $array = CArray[uint8].new($string.encode.list);


Todd:
    By the way, "C String" REQUIRES a nul at the end:
    an error in the NativeCall documentation.

JJ:
    No, it does not. And even if it did, it belongs
    in the C, not the Raku, documentation.


And that would be a "String Literal", which is NOT
a C String.  And C's documentation is precise and
clear (n1570).  It is not their problem.  It
is a mistake in NativeCall's documentation.

Without the nul at the end, the array is not a C
string at all, and any C code that reads it as one
runs off the end of the array: undefined behavior.

The C guys have been helping me with definitions.  Chapter and
verse would be:

INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
Programming languages — C
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    5.2.1 Character sets
    7.1.1 Definitions of terms
    6.7.9,p14 Semantics

Here is your String Literal, which you are confusing
with a C string:

6.7.9,p14 states:

    An array of character type may be initialized by a
    character string literal or UTF-8 string literal,
    optionally enclosed in braces. Successive bytes
    of the string literal (including the terminating
    null character if there is room or if the array
    is of unknown size) initialize the elements of
    the array.

So there is the unnecessary nul you speak of, except,
again, it is not a C String.  So you do have
somewhat of a point, although not a good one, as
it will throw those trying to use the docs into
a state of confusion, wondering what is going wrong,
as it did me.

It about killed me to figure it out myself.  I
don't want others to go through the same pain
over a simple mistake in the documentation.

Now the C guys told me that the reason I am not
getting anywhere with you is that I provided a bad
example.  They proceeded to give me an example
that shows precisely the careening caused by
the mistake in the documentation:


An example of an unterminated C string:

On 2020-01-17 13:21, Bart wrote:


<t2.c>
#include <stdio.h>

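void foo(const char *s)
{
    /* Deliberately prints 10 bytes even though the caller's array
       holds only 3: this reads past its end (undefined behavior). */
    for (int i=0; i<10; ++i)
        printf("%d ",*s++);
    puts("");
}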

int main(void)
{
     char str[3] = "FOO";   /* no room left for the terminating nul */
     foo(str);
}
</t2.c>



To compile:
     gcc -o t2 t2.c

To run:
     ./t2     (just t2 on Windows)

This prints the 3 character codes of F, O, O in the string, plus the 7 bytes that follow. Results on various compilers are as follows:

bcc:         70 79 79 0 0 0 0 0 0 0
tcc:         70 79 79 80 -1 18 0 0 0 0
gcc -O0:     70 79 79 112 -14 48 0 0 0 0
gcc -O3:     70 79 79 2 0 0 0 0 0 0
lcc:         70 79 79 0 0 0 0 0 0 0
dmc:         70 79 79 0 -120 -1 24 0 25 33
clang -O0:   70 79 79 16 6 64 0 0 0 0
clang -O2:   70 79 79 0 48 -120 73 6 -19 127
msvc:        70 79 79 -29 -9 127 0 0 0 0

The 70, 79, 79 are the F, O and O codes. When those 3 happen to be followed by 0, it will appear to work.
That is 4 out of 9, but the other 5 won't work. What follows after FOO is undefined and could be anything, although a random 0 is common.


JJ!  He ran it through NINE C compilers!  The careening
is OBVIOUS!

And this mistake is very easy to fix:

Change
   my $array = CArray[uint8].new($string.encode.list);
to
   my $array = CArray[uint8].new($string.encode.list, 0);

THREE characters `, 0` and it is fixed!  And you are
finally conforming to n1570.  And you will be
my ever living hero!  (Watch some take offense to that!)

Have I still not convinced you?  THREE CHARACTERS !!!!

Sorry for being such a pest about this.  It about killed
me to figure out.

Please be my hero.

-T

Retrievers are better looking than Labs.  (I can't
wait for the hate mail over that!)
