RE: Clarification on newlib version for building AMDGCN offloading backend

2023-07-03 Thread Stubbs, Andrew via Gcc
Hi Wil,

Our toolchain installations are relocatable, and therefore can't have an rpath 
hardcoded into them. Instead, we provide instructions in the user manual on how 
to set LD_LIBRARY_PATH to pick up the correct libraries.

If you're creating OS packages then you could probably add the lib64 directory 
to the ldcache so it Just Works without any explicit rpath.

If you really want to modify the specs, you can do this by patching the 
gcc/config/i386/linux-common.h source file to add it to the LINK_SPEC there, or 
you can build and install the compiler, run "gcc -dumpspecs", and write the 
edited text to a file named "specs" in the correct install directory. You'll 
want it in the x86_64 compiler directory, not the amdgcn compiler, though, as 
the dynamic libraries are the host-side implementation, not the GPU side.

There might be a configure option too, but I've never investigated that. If 
there isn't and you would like to add one, then I expect upstream GCC would be 
happy to accept a patch.

Andrew

> -----Original Message-----
> From: Wileam Yonatan Phan 
> Sent: Friday, June 30, 2023 10:06 PM
> To: Andrew Stubbs ; gcc@gcc.gnu.org
> Subject: Re: Clarification on newlib version for building AMDGCN offloading
> backend
>
> Hi Andrew,
>
> Just wanna follow up on the progress of this endeavor of enabling GCC with
> AMDGCN offloading in Spack. So far I think I've got everything working, except
> for the part where libgomp is pulled from the wrong place at runtime, because
> Spack prefers using RPATH to LD_LIBRARY_PATH. As outlined in the Spack PR
> comments, the proposed fix is modifying the *link_gomp field inside
> <install prefix>/lib/gcc/amdgcn-amdhsa/<version>/specs to add the rpath flags
> for libgomp. But I'm honestly unsure if this should be done at configure
> time, build time, or install time.
>
> The Spack PR can be accessed here:
> https://github.com/spack/spack/pull/35919
>
> Please advise,
> Wil
>
> From: Andrew Stubbs 
> Sent: Thursday, March 30, 2023 04:45
> To: Wileam Yonatan Phan ; gcc@gcc.gnu.org
> 
> Subject: Re: Clarification on newlib version for building AMDGCN offloading
> backend
>
> On 29/03/2023 19:18, Wileam Yonatan Phan wrote:
> > Hi Andrew,
> >
> > I just built GCC 12.2.0 with AMDGCN offloading successfully with Spack!
> > However, when I tried to test it with an OpenACC test code that I have, I
> encountered the following error message:
> >
> > wyp@basecamp:~/work/testcodes/f90-acc-ddot$ gfortran -fopenacc
> > -foffload=amdgcn-unknown-amdhsa="-march=gfx900" ddot.f90
> > as: unrecognized option '-triple=amdgcn--amdhsa'
> > mkoffload: fatal error:
> > x86_64-pc-linux-gnu-accel-amdgcn-unknown-amdhsa-gcc returned 1 exit
> status compilation terminated.
> > lto-wrapper: fatal error:
> > /home/wyp/work/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-12.2.0/gcc-
> 12.2.0-w7lclfarefmge3uegn2a5vw37bnwhwto/libexec/gcc/x86_64-pc-linux-
> gnu/12.2.0//accel/amdgcn-unknown-amdhsa/mkoffload returned 1 exit status
> compilation terminated.
> > /usr/bin/ld: error: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
>
> My guess is that it's trying to use the wrong assembler. Usually this means
> there is a problem with your installation procedure and/or your PATH. I think
> you should be able to investigate further using -v and/or strace. The correct
> one should be named $DESTDIR/usr/local/amdgcn-amdhsa/bin/as, but this will be
> different if you configured GCC with a custom --prefix location. If you have
> relocated the toolchain since installation then the toolchain will attempt to
> locate libraries and tools relative to the gcc binary. If it does not find
> them there then it looks in the "usual places", and those usually contain an
> "as" suitable only for the host system.
>
> If you find an error on the Wiki instructions please let me know and I will
> correct them.
>
> Andrew


Tiny asm

2023-07-03 Thread jacob navia
Dear Friends:

1) I have (of course) kept your copyright notice at the start of the « asm.h » 
header file of my project.

2) I have published my source code using your GPL V3 license

I am not trying to steal anything from you. And I would insist that I have great 
respect for the people working on gcc. In no way am I trying to minimize 
their accomplishments. What has happened is that layers of code produced by many 
developers have accumulated over the years, like the dust on the glass shelf 
of my grandmother back home. Sometimes in spring she would clean it. 

I am doing just that.

That said, now I have some questions:

1) What kind of options does gcc pass to its assembler? Is there a place in the 
huge gcc source tree where those options are emitted?
  This would allow me to keep only those options in tiny-asm and erase all the 
others (and the associated code).

2) I have to re-engineer the output of assembler instructions. Instead of 
writing to an assembler file (or to an in-memory assembler file) I will have to 
convince gcc to output into a buffer, and will pass the buffer address to the 
assembler. 

So, instead of outputting several MBs worth of assembler instructions, we would 
pass only the 8 bytes of a buffer address. If the buffer is small (4K, for 
instance), it would fit into the CPU cache. Since the CPU cache is 16KB, some 
of it may be kept there. (A rough sketch of this idea follows below, after 
point 3.)

3) To do that, I need to know where in the back end source code you are writing 
to disk.
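
For point 2, here is a minimal sketch of the buffer idea, assuming a POSIX host
and using open_memstream purely as an illustration of the concept, not as a
claim about how gcc's output machinery is actually structured:

  /* Sketch only: the "assembler text" goes into a growing memory buffer
     instead of a .s file, and only the buffer address and size are handed
     over.  open_memstream() is POSIX-2008; error handling kept minimal. */
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      char *buf = NULL;
      size_t len = 0;
      FILE *asm_out = open_memstream(&buf, &len);

      if (!asm_out)
          return 1;

      /* Whatever would normally be written to the .s file goes here. */
      fprintf(asm_out, "\t.text\n");
      fprintf(asm_out, "\tret\n");

      fclose(asm_out);   /* finalizes buf and len */

      /* Pass only the buffer address (8 bytes) to the assembler. */
      printf("asm buffer at %p, %zu bytes\n", (void *)buf, len);
      free(buf);
      return 0;
  }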

Thanks for your help, and thanks to the people that posted encouraging words.

jacob



Re: gcc tricore porting

2023-07-03 Thread Richard Earnshaw (lists) via Gcc

On 03/07/2023 15:34, Joel Sherrill wrote:

On Mon, Jul 3, 2023, 4:33 AM Claudio Eterno 
wrote:


Hi Joel, I'll give an answer ASAP on the newlib and libgloss...
I assumed your question was about the newlib licensing question,
but instead you were really asking what changed in the repo's libs...



It was a bit of both. If they put the right licenses on the newlib and
libgloss ports, you should be able to use them and eventually submit them.
But GCC, binutils, and gdb would be GPL and require an assignment to the
FSF. That is all I meant.


It's not quite as restricted as that.  For GCC, I suggest reading 
https://gcc.gnu.org/contribute.html#legal for more details.


I think there are similar processes in place for binutils as well.  (I'm 
not quite so sure for GDB).


R.



An option here is to reach out to the authors and ask if they are willing
to do the FSF assignment. If they are, then any GPL licensed code from them
might be a baseline.

It looks like their current products may be based on LLVM.

--joel


C.



On Sun 2 Jul 2023 at 19:53, Claudio Eterno <eterno.clau...@gmail.com> wrote:


Hi Joel, can you give me more info regarding the newlib or libgloss cases?
Unfortunately I'm a newbie in this world...
Thank you,
Claudio

On Sun 2 Jul 2023 at 17:38, Joel Sherrill wrote:




On Sun, Jul 2, 2023, 3:29 AM Claudio Eterno 
wrote:


Hi, Joel and Mikael,
taking a look at the code it seems that the repo owner is hightec,
but we have no confirmation.
In fact, after a comparison with the original gcc 9.4.0 files I see this in
a lot of places ("WITH_HIGHTEC") [intl.c]:
[image: image.png]
Probably this version of gcc is a basic version of their tricore-gcc
and probably works fine, but that repo doesn't show any extra info.
It also seems impossible to contact the owner (that account doesn't show
any email or other info)...
Honestly, under these conditions, from a gcc development point of view
that repo has no value.



Without an assignment, you can't submit that code. That's a blocker on
using it if there isn't one.

But you can file an issue against the repo asking questions.


Anyway this is a good starting point...




Maybe not if you can't submit it. Anything that needs to be GPL licensed
and owned by the FSF is off limits.

But areas with permissive licenses might be ok if they stuck with those.
Look at what they did with newlib and libgloss.

--joel



C.



On Mon 19 Jun 2023 at 18:55, Joel Sherrill wrote:




On Mon, Jun 19, 2023, 10:36 AM Mikael Pettersson via Gcc <
gcc@gcc.gnu.org> wrote:


(Note I'm reading the gcc mailing list via the Web archives, which
doesn't let me
create "proper" replies. Oh well.)

On Sun Jun 18 09:58:56 GMT 2023,  wrote:

Hi, this is my first time with open source development. I worked in
automotive for 22 years and we (generally) were using tricore series for
these products. GCC doesn't compile on that platform. I left my work some
days ago and so I'll have some spare time in the next few months. I would
like to know how difficult it is to port the tricore platform on gcc and if
during this process somebody can support me as tutor and... also if the gcc
team is interested in this item...


https://github.com/volumit has a port of gcc + binutils + newlib + gdb
to Tricore, and it's not _that_ ancient. I have no idea where it originates
from or how complete it is, but I do know the gcc-4.9.4 based one builds
with some tweaks.




https://github.com/volumit/package_494 says there is a port in
process to gcc 9. Perhaps digging in and assessing that would be a good
start.



One question is whether that code has proper assignments on file for
ultimate inclusion. That should be part of your assessment.

--joel





I don't know anything more about it, I'm just a collector of
cross-compilers for obscure / lost / forgotten / abandoned targets.

/Mikael





--
Claudio Eterno
via colle dell'Assietta 17
10036 Settimo Torinese (TO)












Re: wishlist: support for shorter pointers

2023-07-03 Thread David Brown via Gcc

On 28/06/2023 10:35, Rafał Pietrak via Gcc wrote:

Hi Jonathan,

On 28.06.2023 at 09:31, Jonathan Wakely wrote:




If you use a C++ library type for your pointers the syntax above 
doesn't need to change, and the fancy pointer type can be implemented 
portably, with customisation for targets where you could use 16 bits 
for the pointers.


As you can expect from the problem I've stated - I don't know C++, so 
I'll need some more advice there.


But, before I dive into learning C++ (forgive the naive question) 
isn't it so, that C++ comes with a heavy runtime? One that will bloat my 
tiny project? Or the bloat comes only when one uses particular 
elaborated class/inheritance scenarios, and this particular case ( for 
(...; ...; x = x->next) {} ) will not draw any of that into this project?





Let me make a few points (in no particular order) :

1. For some RISC targets, such as PowerPC, it is common to have a 
section of memory called the "small data section".  One of the registers 
is dedicated as an anchor to this section, and data within it is 
addressed as Rx + 16-bit offset.  But this is primarily for data at 
fixed (statically allocated) addresses, since reads and writes using 
this address mode are smaller and faster than full 32-bit addresses. 
Normal pointers are still 32-bit.  It also requires a dedicated register 
- not a big cost when you have 31 GPRs, but much more costly when you 
have only 13.


2. C++ is only costly if you use costly features.  On small embedded 
systems, you want "-fno-exceptions -fno-rtti", and you will get as good 
(or bad!) results for C++ as for C.  Many standard library features 
will, however, result in a great deal of code - it is usually fairly 
obvious which classes and functions are appropriate.


3. In C, you could make a type such as :

#include <stdint.h>   /* for uint16_t and uintptr_t */

typedef struct {
uint16_t p;
} small_pointer_t;

and conversion functions :

static const uintptr_t ram_base = 0x2000;

static inline void * sp_to_voidp(small_pointer_t sp) {
return (void *)(ram_base + sp.p);
}

static inline small_pointer_t voidp_to_sp(void * p) {
small_pointer_t sp;
sp.p = (uintptr_t) p - ram_base;
return sp;
}

Then you would use these access functions to turn your "small pointers" 
into normal pointers.  The source code would become significantly harder 
to read and write, and less type-safe, but could be quite efficient.


In C++, you'd use the same kinds of functions.  But they would now be 
methods in a class template, and tied to overloaded operators and/or 
conversion functions.  The result would be type-safe and let you 
continue to use a normal pointer-like syntax, and with equally efficient 
generated code.  You could also equally conveniently have small pointers 
to ram and to peripheral groups.  This mailing list is not really the 
place to work through an implementation of such class templates - but it 
certainly could be done.



4. It is worth taking a step back, and thinking about how you would like 
to use these pointers.  It is likely that you would be better thinking 
in terms of an array, rather than pointers - after all, you don't want 
to be using dynamically allocated memory here if you can avoid it, and 
certainly not generic malloc().  If you can use an array, then your 
index type can be as small as you like - maybe uint8_t is enough.
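
For instance, a minimal sketch of that array/index idea (my own illustration;
the pool size and index width are arbitrary, and index 0 is reserved to mean
"none"):

  #include <stdint.h>

  struct node {
      uint8_t next;    /* index into pool[]; 0 means "no next node" */
      int     value;
  };

  static struct node pool[256];    /* statically allocated, no malloc() */

  static int sum_chain(uint8_t head)
  {
      int sum = 0;
      for (uint8_t i = head; i != 0; i = pool[i].next)
          sum += pool[i].value;
      return sum;
  }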



David





Re: wishlist: support for shorter pointers

2023-07-03 Thread Ian Lance Taylor via Gcc
On Wed, Jun 28, 2023 at 11:21 PM Rafał Pietrak via Gcc  wrote:
>
> On 28.06.2023 at 17:44, Richard Earnshaw (lists) wrote:
> [---]
> > I think I understand what you're asking for but:
> > 1) You'd need a new ABI specification to handle this, probably involving
> > register assignments (for the 'segment' addresses), the initialization
> > of those at startup, assembler and linker extensions to allow for
> > relocations describing the symbols, etc.
>
> I was thinking about that, and it doesn't look like it requires that deep
> a rewrite. An ABI spec that could accommodate the functionality could be as
> little as one additional attribute on linker segments.

If I understand correctly, you are looking for something like the x32
mode that was available for a while on x86_64 processors:
https://en.wikipedia.org/wiki/X32_ABI .  That was a substantial amount
of work including changes to the compiler, assembler, linker, standard
library, and kernel.  And at least to me it's never seemed
particularly popular.

Ian


Re: wishlist: support for shorter pointers

2023-07-03 Thread Rafał Pietrak via Gcc

Hi David,

On 3.07.2023 at 16:52, David Brown wrote:
[]


But, before I dive into learning C++ (forgive the naive question) 
isn't it so, that C++ comes with a heavy runtime? One that will bloat 
my tiny project? Or the bloat comes only when one uses particular 
elaborated class/inheritance scenarios, and this particular case ( for 
(...; ...; x = x->next) {} ) will not draw any of that into this project?





Let me make a few points (in no particular order) :

1. For some RISC targets, such as PowerPC, it is common to have a 
section of memory called the "small data section".  One of the registers 
is dedicated as an anchor to this section, and data within it is 
addressed as Rx + 16-bit offset.  But this is primarily for data at 
fixed (statically allocated) addresses, since reads and writes using 
this address mode are smaller and faster than full 32-bit addresses. 
Normal pointers are still 32-bit.  It also requires a dedicated register 
- not a big cost when you have 31 GPRs, but much more costly when you 
have only 13.


I don't have any experience with PowerPC; all you say here is new to me. 
And the PPC architecture today is "kind of exotic", but I appreciate the 
info and I may look it up for insight into how "short pointers" influence 
performance. Thanks.


2. C++ is only costly if you use costly features.  On small embedded 
systems, you want "-fno-exceptions -fno-rtti", and you will get as good 
(or bad!) results for C++ as for C.  Many standard library features 
will, however, result in a great deal of code - it is usually fairly 
obvious which classes and functions are appropriate.


OK. I have become aware that I will no longer be able to turn a blind eye 
to C++. :(




3. In C, you could make a type such as :

 typedef struct {
     uint16_t p;
 } small_pointer_t;

and conversion functions :

 static const uintptr_t ram_base = 0x2000;

 static inline void * sp_to_voidp(small_pointer_t sp) {
     return (void *)(ram_base + sp.p);
 }

 static inline small_pointer_t voidp_to_sp(void * p) {
     small_pointer_t sp;
     sp.p = (uintptr_t) p - ram_base;
     return sp;
 }

Then you would use these access functions to turn your "small pointers" 
into normal pointers.  The source code would become significantly harder 
to read and write, and less type-safe, but could be quite efficient.


That actually is a problem. I really could write a lot of the code in 
question in assembler, and have it behave precisely as I desire, 
but that would make the project not portable - that's why I thought of 
bringing the use case to this list. This way (I hoped) it might 
inspire "the world" and have it supported at the compiler level some time in 
the future. Should that not be the case, I'd rather stay with "plain C" 
and keep the code portable and readable (rather than obfuscate it ... 
even by merely too-"talkative" sources).


[]
to ram and to peripheral groups.  This mailing list is not really the 
place to work through an implementation of such class templates - but it 
certainly could be done.


OK. I fully agree.

FYI: it was never my intention to ask for advice on how to cook up such 
"short/funny" pointers with special constructs / techniques in C programming. 
Actually I was a little taken aback reading such advice as the first responses 
to my email. It was nice, but surprising.


I hoped to get a discussion more towards "how to let the compiler know" 
that a particular segment/section of program data will be emitted into 
the executable in a "constrained output section", so that the compiler could 
"automagically" know that using "short" pointers for that data would 
suffice, and in consequence would generate such instructions without 
any change to the source code.


It's sort of obvious that this would also require support from libc 
(like a specific "malloc()" and friends), but application sources could 
stay untouched, and that's IMHO the key point here.
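
To make the wish concrete, a rough sketch: the section placement below is real
GCC syntax, while the wished-for behaviour - the compiler deducing from a size
constraint on that output section that 16-bit pointers into it suffice, with no
change to the code that uses plain pointers - is purely hypothetical:

  struct node {
      struct node *next;    /* wished for: silently emitted as 16 bits */
      int payload;
  };

  /* Real attribute; the hypothetical part is the toolchain then shrinking
     every pointer into ".tinydata" without any source changes. */
  static struct node pool[128] __attribute__((section(".tinydata")));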


4. It is worth taking a step back, and thinking about how you would like 
to use these pointers.  It is likely that you would be better thinking 
in terms of an array, rather than pointers - after all, you don't want 
to be using dynamically allocated memory here if you can avoid it, and 
certainly not generic malloc().  If you can use an array, then your 
index type can be as small as you like - maybe uint8_t is enough.


I did take that trip ... some time ago. Maybe I discarded the idea 
prematurely, but I dropped it because I was afraid of the cost of 
multiplication (index calculation) on micros. That "assumption" may 
actually not be true, since today even the mini-minis often have integer 
multiplication units, so my reasoning may no longer hold.


But even if I turn pointers into indices for tiny micros ... that would 
make the code not portable. I'm not too eager to do that.


Still, thank you very much for sharing those concepts.

With best regards,

-R


Re: wishlist: support for shorter pointers

2023-07-03 Thread Rafał Pietrak via Gcc

Hi Ian,

On 3.07.2023 at 17:07, Ian Lance Taylor wrote:

On Wed, Jun 28, 2023 at 11:21 PM Rafał Pietrak via Gcc  wrote:

[]

I was thinking about that, and it doesn't look like it requires that deep
a rewrite. An ABI spec that could accommodate the functionality could be as
little as one additional attribute on linker segments.


If I understand correctly, you are looking for something like the x32
mode that was available for a while on x86_64 processors:
https://en.wikipedia.org/wiki/X32_ABI .  That was a substantial amount
of work including changes to the compiler, assembler, linker, standard
library, and kernel.  And at least to me it's never seemed
particularly popular.


Yes.

And the Wiki reporting up to 40% performance improvements in some corner 
cases is impressive and encouraging. I believe that the reported 
average improvement of 5-8% would be significantly better within an MCU's 
tiny-resources environment. In the MCU world, such an improvement could mean 
the difference between a project fitting or not fitting into a particular device.


-R


Re: wishlist: support for shorter pointers

2023-07-03 Thread Richard Earnshaw (lists) via Gcc

On 03/07/2023 17:42, Rafał Pietrak via Gcc wrote:

Hi Ian,

On 3.07.2023 at 17:07, Ian Lance Taylor wrote:
On Wed, Jun 28, 2023 at 11:21 PM Rafał Pietrak via Gcc 
 wrote:

[]

I was thinking about that, and it doesn't look like it requires that deep
a rewrite. An ABI spec that could accommodate the functionality could be as
little as one additional attribute on linker segments.


If I understand correctly, you are looking for something like the x32
mode that was available for a while on x86_64 processors:
https://en.wikipedia.org/wiki/X32_ABI .  That was a substantial amount
of work including changes to the compiler, assembler, linker, standard
library, and kernel.  And at least to me it's never seemed
particularly popular.


Yes.

And WiKi reporting up to 40% performance improvements in some corner 
cases is impressive and encouraging. I believe, that the reported 
average of 5-8% improvement would be significantly better within MCU 
tiny resources environment. In MCU world, such improvement could mean 
fit-nofit of a project into a particular device.


-R


I think you need to be very careful when reading benchmarketing (sic) 
numbers like this.  Firstly, this is a 32-bit vs 64-bit measurement; 
secondly, the benchmark (spec 2000) is very old now and IIRC was not 
fully optimized for 64-bit processors (it predates the 64-bit version of 
the x86 instruction set); thirdly, there are benchmarks in SPEC which 
are very sensitive to cache size and the 32-bit ABI just happened to 
allow them to fit enough data in the caches to make the numbers leap.


R.


Re: wishlist: support for shorter pointers

2023-07-03 Thread Rafał Pietrak via Gcc




On 3.07.2023 at 18:57, Richard Earnshaw (lists) wrote:

On 03/07/2023 17:42, Rafał Pietrak via Gcc wrote:

Hi Ian,

[-]
And WiKi reporting up to 40% performance improvements in some corner 
cases is impressive and encouraging. I believe, that the reported 
average of 5-8% improvement would be significantly better within MCU 
tiny resources environment. In MCU world, such improvement could mean 
fit-nofit of a project into a particular device.


-R


I think you need to be very careful when reading benchmarketing (sic) 
numbers like this.  Firstly, this is a 32-bit vs 64-bit measurement; 
secondly, the benchmark (spec 2000) is very old now and IIRC was not 
fully optimized for 64-bit processors (it predates the 64-bit version of 
the x86 instruction set); thirdly, there are benchmarks in SPEC which 
are very sensitive to cache size and the 32-bit ABI just happened to 
allow them to fit enough data in the caches to make the numbers leap.


Yes. Sure. I am. I thought I'd expressed clearly that I regard the 
"fantastic 40%" as just a "corner case" - those don't usually 
reflect ordinary usage.


I was only highlighting the fact that a mere 5-8% improvement can decide 
whether a particular design fits into a particular device ... in 
consequence requiring a 4k-RAM device instead of a 2k-RAM one.


Tiny performance improvements on x64 workhorses can become relatively 
huge on micros like the STM32. That's all.


-R


[RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-03 Thread Olivier Dion via Gcc
Hi all,

This is a request for comments on extending the atomic builtins API to
help avoid redundant memory barriers.  Indeed, there are
discrepancies between the Linux kernel memory consistency model (LKMM)
and the C11/C++11 memory consistency model [0].  For example,
fully-ordered atomic operations like xchg and a successful cmpxchg in LKMM
have implicit memory barriers before/after the operations [1-2], while
atomic operations using the __ATOMIC_SEQ_CST memory order in C11/C++11
do not provide the ordering guarantees of an atomic thread fence
__ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].

For a little bit of context here, we are porting liburcu [4] to the atomic
builtins.  Before that, liburcu was using its own implementation of
atomic operations and its CMM memory consistency model was mimicking the
LKMM.  liburcu is now extending its CMM memory consistency model to
become close to the C11/C++11 memory consistency model, with the
exception of the extra SEQ_CST_FENCE memory order that is similar to
SEQ_CST, but ensures that a thread fence is emitted.  This is necessary
for backward compatibility of the liburcu uatomic API, but also for
closing the gap between the LKMM and the C11/C++11 memory consistency
model.  For example, to make Read-Modify-Write (RMW) operations match
the Linux kernel "full barrier before/after" semantics, liburcu's
uatomic API has to emit both a SEQ_CST RMW operation and a subsequent
SEQ_CST thread fence, which leads to duplicated barriers in some cases.
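
Concretely, the kind of wrapper that produces this duplication looks roughly
as follows (a sketch only; the name uatomic_xchg_mb is illustrative and this
is not liburcu's actual implementation):

  /* Sketch: an LKMM-style "fully ordered" exchange emulated with the existing
     builtins, i.e. a SEQ_CST RMW followed by a SEQ_CST thread fence.  The
     trailing fence is what ends up redundant on some targets (see the x86-64
     output below). */
  static inline int uatomic_xchg_mb(int *addr, int val)
  {
      int old = __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
      __atomic_thread_fence(__ATOMIC_SEQ_CST);
      return old;
  }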

Consider for example the following Dekker and the resulting assemblers
generated:

  int x = 0;
  int y = 0;
  int r0, r1;

  int dummy;

  void t0(void)
  {
  __atomic_store_n(&x, 1, __ATOMIC_RELAXED);

  __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);

  r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
  }

  void t1(void)
  {
  __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
  }

  // BUG_ON(r0 == 0 && r1 == 0)

On x86-64 (gcc 13.1 -O2) we get:

  t0():
  movl    $1, x(%rip)
  movl    $1, %eax
  xchgl   dummy(%rip), %eax
  lock orq $0, (%rsp)   ;; Redundant with previous exchange.
  movl    y(%rip), %eax
  movl    %eax, r0(%rip)
  ret
  t1():
  movl    $1, y(%rip)
  lock orq $0, (%rsp)
  movl    x(%rip), %eax
  movl    %eax, r1(%rip)
  ret

On x86-64 (clang 16 -O2) we get:

  t0():
  movl    $1, x(%rip)
  movl    $1, %eax
  xchgl   %eax, dummy(%rip)
  mfence                ;; Redundant with previous exchange.
  movl    y(%rip), %eax
  movl    %eax, r0(%rip)
  retq
  t1():
  movl    $1, y(%rip)
  mfence
  movl    x(%rip), %eax
  movl    %eax, r1(%rip)
  retq

On armv8-a (gcc 13.1 -O2) we get:

  t0():
  adrp    x0, .LANCHOR0
  mov     w1, 1
  add     x0, x0, :lo12:.LANCHOR0
  str     w1, [x0]
  add     x1, x0, 4
  mov     w2, 1
  .L3:
  ldaxr   w3, [x1]
  stlxr   w4, w2, [x1]
  cbnz    w4, .L3
  dmb     ish   ;; Okay!
  add     x1, x0, 8
  ldr     w1, [x1]
  str     w1, [x0, 12]
  ret
  t1():
  adrp    x0, .LANCHOR0
  add     x0, x0, :lo12:.LANCHOR0
  add     x1, x0, 8
  mov     w2, 1
  str     w2, [x1]
  dmb     ish
  ldr     w1, [x0]
  str     w1, [x0, 16]
  ret

On armv8.1-a (gcc 13.1 -O2) we get:

  t0():
  adrp    x0, .LANCHOR0
  mov     w1, 1
  add     x0, x0, :lo12:.LANCHOR0
  str     w1, [x0]
  add     x2, x0, 4
  swpal   w1, w1, [x2]
  dmb     ish   ;; Okay!
  add     x1, x0, 8
  ldr     w1, [x1]
  str     w1, [x0, 12]
  ret
  t1():
  adrp    x0, .LANCHOR0
  add     x0, x0, :lo12:.LANCHOR0
  add     x1, x0, 8
  mov     w2, 1
  str     w2, [x1]
  dmb     ish
  ldr     w1, [x0]
  str     w1, [x0, 16]
  ret

For the initial transition to the atomic builtins in liburcu, we plan on
emitting memory barriers to ensure correctness at the expense of
performance.  However, new primitives in the atomic builtins API would
help avoid the redundant thread fences.

Indeed, eliminating redundant memory fences is often done in the Linux
kernel.  For example in kernel/sched/core.c:try_to_wake_up():

  /*
   * smp_mb__after_spinlock() provides the equivalent of a full memory
   * barrier between program-order earlier lock acquisitions and
   * program-order later memory accesses.
   * ...
   * Since most load-store architectures implement ACQUIRE with an
   * smp_mb() after the

Expert Engagement

2023-07-03 Thread Richard Nardi via Gcc


Hello,
I hope you are having a wonderful day. I would like to engage your firm to 
prepare my tax return for the current tax year. Prior to this year, my wife had 
always been in charge of our tax returns. However, our financial situation has 
changed, and she has also taken on additional responsibilities at work. 
Currently, I have concluded that having a professional prepare my tax return 
would be the most beneficial option for me. I can say for sure that we are 
fairly organized with our tax documentation. Besides our employment income, we 
also earn income from rental properties (Airbnb), stock options, dividends, and 
interest. While I understand that it is a busy time of year for tax 
professionals, I would appreciate your consideration of my request. I can send 
you my most recent tax documents and we can jump on a call with your quote, 
reasonable I hope, and any further questions you might have.

Looking forward to your response. Happy 4th of July!!

Richard Nardi

Senior Managing Director, Investments

NNN Properties, LLC
275 Madison Avenue, 13th Floor
New York, NY 10016

Website: www.nnnpro.com/our-team/

Office: (332) 345-3212

License: 10401296108


Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-03 Thread Alan Stern
On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> Hi all,
> 
> This is a request for comments on extending the atomic builtins API to
> help avoiding redundant memory barriers.  Indeed, there are

What atomic builtins API are you talking about?  The kernel's?  That's 
what it sounded like when I first read this sentence -- why else post 
your message on a kernel mailing list?

> discrepancies between the Linux kernel consistency memory model (LKMM)
> and the C11/C++11 memory consistency model [0].  For example,

Indeed.  The kernel's usage of C differs from the standard in several 
respects, and there's no particular reason for its memory model to match 
the standard's.

> fully-ordered atomic operations like xchg and cmpxchg success in LKMM
> have implicit memory barriers before/after the operations [1-2], while
> atomic operations using the __ATOMIC_SEQ_CST memory order in C11/C++11
> do not have any ordering guarantees of an atomic thread fence
> __ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].

After reading what you wrote below, I realized that the API you're 
thinking of modifying is the one used by liburcu for user programs.  
It's a shame you didn't mention this in either the subject line or the 
first few paragraphs of the email; that would have made understanding 
the message a little easier.

In any case, your proposal seems reasonable to me at first glance, with 
two possible exceptions:

1.  I can see why you have special fences for before/after load, 
store, and rmw operations.  But why clear?  In what way is 
clearing an atomic variable different from storing a 0 in it?

2.  You don't have a special fence for use after initializing an 
atomic.  This operation can be treated specially, because at the 
point where an atomic is initialized, it generally has not yet 
been made visible to any other threads.  Therefore the fence 
which would normally appear after a store (or clear) generally 
need not appear after an initialization, and you might want to 
add a special API to force the generation of such a fence.
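
For instance, a minimal sketch of point 2, using the same builtins as the
original example (the names here are made up):

  struct widget {
      int refcount;
      int data;
  };

  static struct widget *global_widget;    /* assumed publication point */

  void publish_widget(struct widget *w)
  {
      /* No fence needed: w is not yet visible to any other thread. */
      w->refcount = 1;
      w->data = 42;

      /* Ordering is only required at the publishing store. */
      __atomic_store_n(&global_widget, w, __ATOMIC_RELEASE);
  }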

Alan Stern


Re: Expert Engagement

2023-07-03 Thread Dave Blanchard
On Mon, 3 Jul 2023 19:24:23 +
Richard Nardi via Gcc  wrote:

> 
> Hello,
> I hope you are having a wonderful day. I would like to engage your firm to 
> prepare my tax return for the current tax year. [...]

Sounds legit. I have a feeling you'll be asking me for my credit card number 
and bank account information also at some point, plus my home mailing address 
so you can send me all your documents, so would it be helpful if I went ahead 
and emailed those to you right now? 

Thanks in advance for this opportunity to help you with your financial 
troubles. Can't wait to get started!

Sincerely,
Dave
Professional Tax Preparer and All-Around Generous Guy
www.intuit.com
phone number: 1-800-4INTUIT