Re: Loop splitting based on constant prefix of an array

2022-05-30 Thread Richard Biener via Gcc
On Fri, May 27, 2022 at 11:13 PM Laleh Beni via Gcc wrote:
>
> GCC compiler is able to understand if the prefix of an array holds
> constant/static data and apply compiler optimizations on that partial
> constant part of the array, however, it seems that it is not leveraging
> this information in all cases.
>
> To understand how compiler optimizations behave for partially constant
> arrays, and especially how the loop splitting pass can influence
> constant-related optimizations such as constant folding, I am using the
> following example:
>
>
>
> Considering an array where the prefix of that array is compile-time
> constant data, and the rest of the array is runtime data, should the
> compiler be able to optimize the calculation for the first part of the
> array?
>
> Let's look at the below example:
>
>
>
> You can see the code and its assembly here: https://godbolt.org/z/xjxbz431b
>
>
>
> #include 
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
> int arr1[6] = {200,2,3, argc, argc+1, argc+2};
>
> return  sum(arr1, 6);
>
> }
>
>
>
>
>
> In our sum function we are computing the sum of the array elements, where
> the first half is static compile-time constants and the second half
> is dynamic data.
>
> When we compile this with the "x86-64 GCC 12.1" compiler with "-O3
> -std=c++2a " flags, we get the following assembly code:
>
>
>
>  main:
>
> mov rax, QWORD PTR .LC0[rip]
>
> mov DWORD PTR [rsp-28], edi
>
> mov DWORD PTR [rsp-32], 3
>
> movq    xmm1, QWORD PTR [rsp-32]
>
> mov QWORD PTR [rsp-40], rax
>
> movq    xmm0, QWORD PTR [rsp-40]
>
> lea eax, [rdi+1]
>
> add edi, 2
>
> mov DWORD PTR [rsp-24], eax
>
> paddd   xmm0, xmm1
>
> mov DWORD PTR [rsp-20], edi
>
> movq    xmm1, QWORD PTR [rsp-24]
>
> paddd   xmm0, xmm1
>
> movd    eax, xmm0
>
> pshufd  xmm2, xmm0, 0xe5
>
> movd    edx, xmm2
>
> add eax, edx
>
> ret
>
> .LC0:
>
> .long   200
>
> .long   2
>
>
>
>
>
> However, if we add an “if” condition in the loop for calculating the result
> of the sum, the if condition seems to enable the loop splitting pass:
>
>
>
> You can see the code and its assembly here:  https://godbolt.org/z/ejecbjMKG
>
>
>
> #include 
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     if (i < 1)
>
>       res += array[i];
>
>     else
>
>       res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
> int arr1[6] = {200,2,3, argc, argc+1, argc+2};
>
> return  sum(arr1, 6);
>
> }
>
>
>
>
>
> we get the following assembly code:
>
>
>
> main:
>
> lea eax, [rdi+208+rdi*2]
>
> ret
>
>
>
>
>
> As you can see, the “if” condition performs the same calculation in both the
> “if” and “else” branches when summing the array. However, it
> seems to trigger the “loop splitting pass”, which enables
> further optimizations such as constant folding of the whole computation and
> results in much smaller and faster assembly code.
>
> My question is: why is the compiler not able to take advantage of the
> constants in the prefix of the array in the first place?
>
> Also, adding an unnecessary “if” condition that just repeats the same
> code for "if" and "else" doesn’t seem to be the best way to hint the
> compiler to apply this optimization; is there another way to
> make the compiler aware of this? (I used the -fsplit-loops flag and it
> didn't have any effect for this example.)
>
>
>
> As a next step, consider an array that has some constant values in the
> prefix but not a compile-time constant length, as in the following example:
>
> Code link is here: https://godbolt.org/z/3qGqshzn9
>
>
>
> #include 
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     if (i < 1)
>
>       res += array[i];
>
>     else
>
>       res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
> size_t len = argc+3;
>
> int arr3[len] = {600,10,1};
>
> for (unsigned int i = 3; i < len; i++) arr3[i] = argc+i;
>
> return sum(arr3, 2);
>
> }
>
>
>
> In this case the GCC compiler is not able to apply constant folding to the
> first part of the array!
>
> In general, is there any way for the GCC compiler to understand this and
> apply constant folding optimizations here?

GCC can currently only constant fold this when it decides to unroll the
loop portions operating on constant data.

Richard.
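
Since the reply ties constant folding to unrolling, one thing to try is
asking GCC to unroll the loop explicitly. The following is a minimal
sketch, not a confirmed fix: it uses the "#pragma GCC unroll" loop pragma,
and the names sum_loop, sum_unrolled and sum_example are hypothetical
helpers introduced only for illustration. Whether the pragma actually leads
to a folded result for a given GCC version and set of flags is an assumption
that has to be checked by inspecting the generated assembly.

```cpp
#include <cstddef>

// Plain loop, same shape as sum() in the quoted message.
inline int sum_loop(const int array[], std::size_t len) {
  int res = 0;
  for (std::size_t i = 0; i < len; i++)
    res += array[i];
  return res;
}

// Same computation, with a request to unroll the loop up to 6 times.
// Compilers that do not know the pragma ignore it (possibly with a warning).
inline int sum_unrolled(const int array[], std::size_t len) {
  int res = 0;
#pragma GCC unroll 6
  for (std::size_t i = 0; i < len; i++)
    res += array[i];
  return res;
}

// Hypothetical driver mirroring main() from the question: the first three
// elements are compile-time constants, the last three depend on argc.
int sum_example(int argc) {
  int arr1[6] = {200, 2, 3, argc, argc + 1, argc + 2};
  return sum_unrolled(arr1, 6);
}
```

Both functions compute the same value, so the pragma only changes how the
loop is compiled. For the array above the result is 208 + 3 * argc, which
is the expression the folded assembly in the question evaluates.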


Re: Documentation format question

2022-05-30 Thread Martin Liška
On 5/27/22 22:05, Andrew MacLeod via Gcc wrote:
> On 5/27/22 02:38, Richard Biener wrote:
>> On Wed, May 25, 2022 at 10:36 PM Andrew MacLeod via Gcc wrote:
>>> I am going to get to some documentation for ranger and its components
>>> later this cycle.
>>>
>>> I used to stick these sorts of things on the wiki page, but I find that gets
>>> out of date really quickly.  I could add more comments to the top of
>>> each file, but that doesn't seem very practical for larger architectural
>>> descriptions, nor for APIs/use cases/best practices.   I could use
>>> google docs and turn it into a PDF or some other format, but that isn't
>>> very flexible.
>>>
>>> Do we/anyone have any forward looking plans for GCC documentation that I
>>> should consider using?  It would be nice to be able to tie some of it
>>> into source files/classes in some way, but I am unsure of a decent
>>> direction.  It has to be easy to use, or I won't use it :-)  And I
>>> presume many others wouldn't either.  I'm not too keen on manually
>>> marking up text either.
>> The appropriate place for this is the internals manual and thus the
>> current format in use is texinfo in gcc/doc/
>>
> And there is no move to convert it to anything more modern?

Hi.

Yes, there's a plan to move to Sphinx for GCC 13, but I'm currently stuck on
Sphinx upstream, where I have pending pull requests. Hopefully, I'll return
to it soon.

> Is there at least a reasonable tool to be able to generate texinfo from?
> Otherwise the higher level stuff is likely to end up in a wiki page where I
> can just visually do it.

But as Richi wrote, if you write it in Texinfo, then I can easily convert it
(I'll be doing the same for the rest of the manuals).

In your case, you can experiment with Sphinx and then export texinfo, as
libgccjit does.

Martin

> 
> Andrew
> 
> 



Re: OMP_PLACES

2022-05-30 Thread Jakub Jelinek via Gcc
On Sat, May 28, 2022 at 10:48:30PM +0200, Mohamed Atef wrote:
> Hello,
>   if I want to dump elements of gomp_places_list
> in a string
> 
> gomp_affinity_print_place (gomp_places_list[i]);
> what does this function do ?
> I read its body, it has only one line
> (void) p;
> should I call it before sprintf (temp_buffer, );

libgomp has a directory hierarchy that allows overriding
generic implementations of some parts with other implementations
for selected targets, e.g. the generic implementation can be
a fallback and the specific one can do something more advanced.

For affinity, libgomp/affinity.c is such a fallback implementation
that doesn't do anything useful, and libgomp/config/linux/affinity.c
is a Linux specific implementation.

I think for libgompd you want something similar, doesn't necessarily
need to be a *.c file, could be just ompd-affinity.h which is
overridden by config/linux/ompd-affinity.h.

Jakub