Re: [cfe-users] how clang merge strings in .rodata section

2018-07-06 Thread Hans Wennborg via cfe-users
On Thu, Jul 5, 2018 at 3:18 AM, Jian, Xu via cfe-users
 wrote:
> Hi,
>
> The following C source code, abc.c:
>
> #include <stdio.h>
>
> int g_val=10;
> const char *g_str="abc";
> const char *g_str1="c";
>
> int main(void)
> {
>     printf("%s %s: %d\n",g_str,g_str1,g_val);
>     return 0;
> }
>
> When compiled with “clang abc.c -o abc”, dumping the .rodata section gives:
>
> # readelf -p .rodata abc
>
> String dump of section '.rodata':
>   [ 0]  abc
>   [ 4]  %s %s: %d
>
> When compiled with “gcc abc.c -o abc”, dumping the .rodata section gives:
>
> $ readelf -p .rodata abc
>
> String dump of section '.rodata':
>   [10]  abc
>   [14]  c
>   [16]  %s %s: %d^J
>
> clang is able to merge a short string (“c”) into the tail of a longer string
> (“abc”), while gcc does not.
>
> Does anybody know how to disable this behavior (make it similar to gcc)?

I don't think there is a way to disable it.

Why do you want to disable this behaviour?

 - Hans
___
cfe-users mailing list
cfe-users@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-users


Re: [cfe-users] how clang merge strings in .rodata section

2018-07-06 Thread Jian, Xu via cfe-users
Hi Hans,
We need to check whether the ELF files from two builds are identical.
String merging makes that comparison difficult.

For example, take the following lines of code (which may be in different files):
---
const char *s_array[1]={"s"};
const char *first_s="this first bigger s";
const char *second_s="this second bigger s";
---

In the ELF that clang builds, s_array[0] sometimes holds the address of the
tail of first_s in the .rodata section, and sometimes that of second_s.
This leads to a diff in the .data section, since s_array lives there.
The ELF files differ even though nothing has changed functionally.

Thanks.



Re: [cfe-users] how clang merge strings in .rodata section

2018-07-06 Thread Hans Wennborg via cfe-users
On Fri, Jul 6, 2018 at 10:22 AM, Jian, Xu  wrote:
> Hi Hans,
> We need to check whether the ELF files from two builds are identical.
> String merging makes that comparison difficult.
>
> For example, take the following lines of code (which may be in different files):
> ---
> const char *s_array[1]={"s"};
> const char *first_s="this first bigger s";
> const char *second_s="this second bigger s";
> ---
>
> In the ELF that clang builds, s_array[0] sometimes holds the address of the
> tail of first_s in the .rodata section, and sometimes that of second_s.
> This leads to a diff in the .data section, since s_array lives there.
> The ELF files differ even though nothing has changed functionally.

Did the inputs change? If Clang is sometimes using the tail of first_s
and sometimes second_s, for the same input, that's a bug. The
compilation should be deterministic.

Can you provide sample input files and command lines that show this problem?

Thanks,
Hans




Re: [cfe-users] how clang merge strings in .rodata section

2018-07-06 Thread Matthew Fernandez via cfe-users


> On Jul 6, 2018, at 12:00, via cfe-users  wrote:
> 
> Message: 3
> Date: Fri, 6 Jul 2018 11:01:25 +0200
> From: Hans Wennborg via cfe-users 
> To: "Jian, Xu" 
> Cc: "cfe-users@lists.llvm.org" 
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
>   
> Content-Type: text/plain; charset="UTF-8"
> 
> On Fri, Jul 6, 2018 at 10:22 AM, Jian, Xu  wrote:
>> Hi Hans,
>> We need to check whether the ELF files from two builds are identical.
>> String merging makes that comparison difficult.
>> 
>> For example, take the following lines of code (which may be in different files):
>> ---
>> const char *s_array[1]={"s"};
>> const char *first_s="this first bigger s";
>> const char *second_s="this second bigger s";
>> ---
>> 
>> In the ELF that clang builds, s_array[0] sometimes holds the address of the
>> tail of first_s in the .rodata section, and sometimes that of second_s.
>> This leads to a diff in the .data section, since s_array lives there.
>> The ELF