Re: Ping JJ: string literals

JJ Merelo Sat, 18 Jan 2020 00:07:27 -0800

I know this is utterly and absolutely absurd, but so it goes.

On Fri, 17 Jan 2020 at 23:28, ToddAndMargo (<toddandma...@zoho.com>)
wrote:

> Hi JJ,
>
> Please be my hero.
>
> I won't call you any goofy names out of
> affection and friendship, as others will get
> their knickers in a twist.
>
> This is from a previous conversation we had concerning
> the mistake in
>
> https://docs.raku.org/language/nativecall#index-entry-nativecall
>
>      my $string = "FOO";
>      ...
>      my $array = CArray[uint8].new($string.encode.list);
>
>
> Todd:
>      By the way, "C String" REQUIRES a nul at the end:
>      an error in the NativeCall documentation.
>
> JJ:
>      No, it does not. And even if it did, it should better
>      go to the C, not Raku, documentation
>

A C string literal is a C string. It automatically gets null terminated. I
already mentioned this link
https://stackoverflow.com/questions/8202897/null-terminated-string-in-c,
but you don't seem to like reading links, so I'll copy the good bits here:

"Is it absolutely necessary? *No*, because when you call scanf, strcpy(except
for strncpy where you need to manually put zero if it exceeds the size), it
copies the null terminator for you. Is it good to do it anyways? *Not
really*, it doesn't really help the problem of bufferoverflow since those
function will go over the size of the buffer anyways. Then what's the best
way? use c++ with std::string."

And another answer:

"Always be careful to allocate enough memory with strings, compare the
effects of the following lines of code:

char s1[3] = "abc";
char s2[4] = "abc";
char s3[] = "abc";

All three are considered legal lines of code (
http://c-faq.com/ansi/nonstrings.html),
but in the first case, there isn't enough memory for the fourth
null-terminated character. s1 will not behave like a normal string, but s2
and s3 will. The compiler automatically count for s3, and you get four
bytes of allocated memory. If you try to write

"
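To put numbers on that answer, here is a minimal sketch (mine, not part of
the StackOverflow post). It declares the same three arrays and checks how
many bytes each one gets and whether a '\0' sits inside it. The names
has_terminator and check_sizes are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three declarations from the quoted answer, at file scope. */
char s1[3] = "abc";   /* legal C, but no room left for the '\0' */
char s2[4] = "abc";   /* 'a' 'b' 'c' '\0' */
char s3[]  = "abc";   /* size deduced by the compiler: 4 bytes */

/* Returns 1 if a '\0' occurs within the first n bytes of buf, else 0.
   Only the n bytes that belong to the array are examined. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Aborts if any of the expectations about the arrays is wrong. */
void check_sizes(void)
{
    assert(sizeof s1 == 3);
    assert(sizeof s2 == 4);
    assert(sizeof s3 == 4);
    assert(!has_terminator(s1, sizeof s1));
    assert(has_terminator(s2, sizeof s2));
    assert(has_terminator(s3, sizeof s3));
}
```

Calling check_sizes() from main should pass silently: only s1 lacks the
terminator, and only because its declared size left no room for it.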

>
>
> And that would be a "String Literal", which is NOT
> a C String.  And C's documentation is precise and
> clear (n1570).  It is not their problem.  It
> is a mistake in NativeCall's documentation.
>

Did you really read what you wrote? A string literal is not a C string? And
you want to add that to the NativeCall documentation? You really seem to be
driven more by your need to prove you're right than by a genuine wish to
improve the documentation. Which, I insist, is for Raku, not for C.

>
> Without the nul at the end, the string is considered
> "undefined".
>

No, it's not. Please read the StackOverflow answer above. If you use
string literals and don't allocate enough memory for the null terminator,
the result is going to be undefined. That's the case for s1 above. Again,
C stuff. There's enough work as it is documenting the finer points of Raku.
For the "new" ( = 1 year old) 6.d Raku behavior alone, there are still
almost 100 items to document. Documenting the finer points of C is totally
outside the scope. Just read (maybe implicitly) "Get your C right" at the
beginning of NativeCall, and that's it. I'm not gonna go further than that.

>
> The C guys have been helping me with definitions.  Chapter and
> verse would be :
>
> INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
> Programming languages — C
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
>
>      5.2.1 Character sets
>      7.1.1 Definitions of terms
>      6.7.9,p14 Semantics
>
> Here is your String Literal, which you are confusing
> with a c string:
>
> 6.7.9,p14 states:
>
>      An array of character type may be initialized by a
>      character string literal or UTF−8 string literal,
>      optionally enclosed in braces. Successive bytes
>      of the string literal (including the terminating
>      null character if there is room or if the array
>      is of unknown size) initialize the elements of
>      the array.
>
> So there is the unnecessary nul you speak of, except,
> again, it is not a C String.  So you do have a
>

Are you _really_ saying we should put parts of the C standard in the Raku
documentation? Again, did you read what you wrote?

> somewhat of a point, although not a good one as
> it will throw those trying to use the docs into
> a state of confusion wondering what is going wrong,
> as it did me.
>

They should have read the (implicit) notice at the beginning of NativeCall:
"Get your C right"

> It about killed me to figure it out myself.  I
> don't want others to go through the same pain
> over a simple mistake in the documentation.
>

Well, an omission has suddenly been converted into a mistake. And so has an
example which is totally correct (as proven several times). We do live in
curious times. (Also, I didn't write that)

> Now the C guys told me that the reason why I am not
> getting anywhere with you is that I provided a bad
> example.  They proceeded to give me an example
> that precisely shows the careening made by
> the mistake in the documentation:
>
>
> An example of an unterminated C string:
>
>
Well, this example is even worse.

On 2020-01-17 13:21, Bart wrote:
> >
> > <t2.c>
> > #include <stdio.h>
> >
> > void foo(const char *s)
> > {
> >     for (int i=0; i<10; ++i)
> >         printf("%d ",*s++);
> >     puts("");
> >
> > }
> >
> > int main(void)
> > {
> >      char str[3] = "FOO";
>

Please read the StackOverflow post above (which I have copied, you don't
even need to click on the link, don't worry). The behavior of that string
is ambiguous _because you didn't allocate enough space for the string_. Not
because it's not null-terminated.

>      foo(str);
> > }
> > </t2.c>
>
>
> To compile:
>       gcc -o t2 t2.c
>
> To Run
>       t2
>
>
> > This prints the 3 characters codes of F,O,O in the string, plus 7
> > bytes that follow. Results on various compilers are as follows:
> >
>

Ambiguous means it will behave one way some of the time and another way the
rest of the time. So:

> bcc:         70 79 79 0 0 0 0 0 0 0
>
                                      ^ Here's your null termination

> tcc:         70 79 79 80 -1 18 0 0 0 0
> > gcc -O0:     70 79 79 112 -14 48 0 0 0 0
> > gcc -O3:     70 79 79 2 0 0 0 0 0 0
> > lcc:         70 79 79 0 0 0 0 0 0 0
>
                                    ^ Here's your null termination

> dmc:         70 79 79 0 -120 -1 24 0 25 33
>
                    ^ What's this? Oh, it's null termination all over again

> clang -O0:   70 79 79 16 6 64 0 0 0 0
> > clang -O2:   70 79 79 0 48 -120 73 6 -19 127
>

                      ^ O2 is smart enough that it null-terminates it

> msvc:        70 79 79 -29 -9 127 0 0 0 0
> >
>

Just change that to      char str[4] = "FOO"; and it's going to be
perfectly fine.
Documentation is an arcane art, you know. That means you not only have to
be inclusive (if you need to), but also totally precise. If I had included
that example in the NativeCall documentation (which, let me be clear, I will
not), it wouldn't have been an example of the need to null-terminate
strings (which literals do automatically for you) but of the need to get
the size of the strings right.
Hey, but I don't need that, because that's already implicit in the subtitle
of NativeCall:

                                                      "Get your C right."
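As a hedged sketch of that one-character change: the code below declares
the array with four bytes and only ever looks at the bytes that belong to
it. The helpers dump_codes and fixed_foo are hypothetical names of mine,
not part of Bart's t2.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes the codes of buf[0..n-1] into out, separated by spaces.
   Unlike foo() in t2.c, it never reads past the n bytes it was given.
   Returns the number of characters written, or -1 if out is too small. */
int dump_codes(const char *buf, size_t n, char *out, size_t out_size)
{
    size_t used = 0;

    assert(buf != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        int w = snprintf(out + used, out_size - used,
                         i ? " %d" : "%d", buf[i]);
        if (w < 0 || (size_t)w >= out_size - used)
            return -1;               /* output buffer too small */
        used += (size_t)w;
    }
    return (int)used;
}

/* The corrected declaration: four bytes, so the literal's '\0' fits. */
int fixed_foo(char *out, size_t out_size)
{
    char str[4] = "FOO";

    assert(strlen(str) == 3);        /* safe: str is terminated */
    return dump_codes(str, sizeof str, out, out_size);
}
```

With a large enough buffer, fixed_foo should leave "70 79 79 0" in out:
the three letter codes followed by the terminator the literal supplied.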

> The 70, 79, 79 are the F, O and O codes. When those 3 happen to
> > be followed by 0, then it will appear to work.
> >
> > That is 4 out of 9, but the other 5 won't work. What follows
> > after FOO is undefined and could be anything, although a random 0 is
> > common.
> >
>
> JJ!  He ran it through NINE C compilers!  The careening
> is OBVIOUS!
>

What is obvious is the ambiguous behavior of strings whose size is not
declared correctly.

> And this mistake is very easy to fix:
>
> Change
>     my $array = CArray[uint8].new($string.encode.list);
> to
>     my $array = CArray[uint8].new($string.encode.list, 0);
>
> THREE characters `, 0` and it is fixed!  And you are
>

The example works perfectly, and it does because it's a string literal
which is already 0 terminated. Let's use this code instead of the one that
I used in my other mail about this (which you probably didn't read anyway):

```
#include <stdio.h>

void set_foo ( const char *foo) {
  printf("Printed directly %s\n", foo);
  for (int i=0; i<5; ++i)
    printf("%d ",*foo++);
}
```

The Raku part will be the same:
```
use NativeCall;

my $string = "FOO";
my $array = CArray[uint8].new($string.encode.list);
say $array.elems;
sub set_foo(CArray[uint8]) is native('const-char') { * }
set_foo( $array );
```

This prints:
```
3
Printed directly FOO
70 79 79 0 0
```

What does this mean? It means that NativeCall does the right call
(badum-tssss) and converts a Raku string literal into a C string literal,
inserting the null termination even if we didn't. I actually don't care if
it was the NativeCall API or the encode method. It just works. It gets
allocated the right amount of memory, it gets passed correctly into the C
realm. Just works. Since $array.elems reports 3 elements, well, it might
rather be the C part that does that. But I really don't care, and it
does not really matter, and thus the example is correct, no need to add
anything else to the documentation. Except maybe "get your C right".
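And since it is the C side that has to get it right, here is one hedged way
to do that without depending on what follows the third byte: pass the
length along with the pointer. This is only a sketch. set_foo_n is a
hypothetical variant of set_foo, not something in the NativeCall docs or in
my examples, and the Raku declaration to call it is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of set_foo: the caller passes the byte count,
   so nothing beyond foo[len - 1] is ever read. The bytes are copied
   into out and the '\0' is added here, on the C side.
   Returns 0 on success, -1 if out cannot hold len bytes plus '\0'. */
int set_foo_n(const unsigned char *foo, size_t len,
              char *out, size_t out_size)
{
    assert(foo != NULL && out != NULL);
    if (len >= out_size)
        return -1;
    memcpy(out, foo, len);
    out[len] = '\0';
    return 0;
}
```

Given the three bytes 70 79 79 and a length of 3, out should end up holding
the terminated string "FOO", whatever happens to sit after the input bytes.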

> finally conforming to n1570.  And you will be
> my ever living hero!  (Watch some take offense to that!)
>
> Have I still not convinced you?  THREE CHARACTERS !!!!
>
> Sorry for being such a pest about this.  It about killed
> me to figure out.
>
> Please be my hero.
>

Again, there's nothing to change in the documentation, which besides, for
starters, was simple pseudocode that didn't compile anyway. But you can
check out my "keepers", JJ/my-perl6-examples, for the examples above and
many more.

Cheers

JJ
