Re: Porting libsanitizer to aarch64
[resending in plain text mode; arghh] On Wed, May 22, 2013 at 11:25 AM, Konstantin Serebryany wrote: > Hi Christophe, > > We would love to see the aarch64-specific changes in upstream repo > (see https://code.google.com/p/address-sanitizer/wiki/HowToBuild). > Once the changes are in the upstream svn, one of us will merge them to the > gcc trunk. > > If you have some non-trivial concerns, please reach us at > address-saniti...@googlegroups.com > My question to you would be: will it be possible to have a public build bot > for aarch64? > (W/o regular automated testing we will be breaking aarch64 every second day) > > > On Tue, May 21, 2013 at 7:44 PM, Jakub Jelinek wrote: >> >> On Tue, May 21, 2013 at 05:35:45PM +0200, Christophe Lyon wrote: >> > I have been looking at enabling libsanitizer for aarch64 GCC compilers. >> > >> > To make the build succeed, I had to modify libsanitizer code: >> > - some syscalls are not available on aarch64 (libsanitizer uses some >> > legacy ones such as open, readlink, stat, ...) >> > - unwinding code needs to be added. >> > >> > What's the way of discussing such patches? On GCC lists or elsewhere? >> >> libsanitizer/ changes for code imported from upstream repo needs to be >> discussed with the asan maintainers and done first in the upstream repo, >> then imported. >> >> > Then arises a runtime problem: aarch64's frame grows upward which is >> > not supported: how long would it take to develop this support if at >> > all possible? >> >> Better do what all other targets that want to support -fstack-protector* >> or -fsanitize=address, use frame grows downward if flag_stack_protector >> || flag_asan. You wouldn't have -fstack-protector* support otherwise >> either. > > > I would second that. > Supporting upward-growing stack will introduce quite a bit of disruption in > the run-time library and > in both compiler modules (GCC and LLVM). > > >> >> >> > I have not looked at tsan in detail yet, it currently does not build >> > for aarch64 either. >> >> tsan is right now x86_64 only, but that decision is pretty much the >> runtime >> library decision on what will be supported. > > > indeed, today tsan is x86_64-only. (it even has a small x86_64 assembly > blob). > this tool heavily depends on 64-bit atomic loads/stores, so we are unlikely > to ever implement > it on 32-bit architectures (ARM, i386, etc). > But on aarch64 (or, e.g. PowerPC) it should be doable. > For tsan-related discussions you are welcome to > thread-saniti...@googlegroups.com > > --kcc > > >> >> >> Jakub > >
Re: Porting libsanitizer to aarch64
On Wed, May 22, 2013 at 11:25:18AM +0400, Konstantin Serebryany wrote: > > > Then arises a runtime problem: aarch64's frame grows upward which is > > > not supported: how long would it take to develop this support if at > > > all possible? > > > > Better do what all other targets that want to support -fstack-protector* > > or -fsanitize=address, use frame grows downward if flag_stack_protector > > || flag_asan. You wouldn't have -fstack-protector* support otherwise > > either. > > > > I would second that. > Supporting upward-growing stack will introduce quite a bit of disruption in > the run-time library and > in both compiler modules (GCC and LLVM). Frame grows upward isn't necessarily stack grows upward, the former is just mostly compiler internal decision in which direction to allocate automatic variables and later on spill slots, the latter is an ABI matter and AFAIK only PA and stormy16 goes against all other architectures on that. That said, at least for stack protector frame grows downward is a must, because you don't want array overflows to clobber spill slots, thus spill slots must go below arrays which are laid out earlier. Changing frame grows upward into frame grows downward shouldn't be that hard, see e.g. rs6000 port, where #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 || flag_asan != 0) and grep the port where it uses FRAME_GROWS_DOWNWARD. Basically you need to tweak initial elimination offset computation for it, and that might be it, or perhaps one or two extra spots. Jakub
Re: RFC: S/390 Transactional memory support - save/restore of FPRs
On 21/05/13 16:28, Torvald Riegel wrote: > On Tue, 2013-05-21 at 14:40 +0200, Andreas Krebbel wrote: > You could also start with supporting s390 HTM through the transactional > language constructs we already support (__transaction_atomic etc.) and > libitm. The advantage would be that you can reuse quite a few bits of > existing machinery (e.g., different fallbacks when the HTM can't execute > a certain transaction, some analyses on the compilation side); however, > this doesn't give programmers as much control as if using the HTM > directly, and it requires a function call on begin and commit when using > the current libitm ABI. > > (I know that this is kind of a side note, because you seem to be looking > for a way to expose this at the granularity of HTM begin/commit builtins > (e.g., to base lock elision implementations on top of it); but I think > that in the long run txnal language constructs are easier for many > users.) The patch I have so far implements libitm HTM support for S/390 using the builtins. Very much like it is done on x86. > In libitm, it's probably easier to write custom assembly code for > ITM_beginTransaction that saves/restores all additional bits not > restored by the HTM explicitly through a partial SW setjmp. This > approach at least worked well for AMD's ASF, which didn't even restore > all normal registers. Ok. I'll have a look. I haven't done measurements with libitm so far. The experiments with the low-level builtins show that having a function call for starting and ending a transaction is a big hit already so I didn't invest much into optimizing the libitm variant for now. -Andreas-
Re: RFC: S/390 Transactional memory support - save/restore of FPRs
On 21/05/13 20:01, Richard Henderson wrote: > On 05/21/2013 05:40 AM, Andreas Krebbel wrote: >> Hi, >> >> I'm currently implementing support for hardware transactional memory >> in the S/390 backend and ran into a problem with saving and restoring >> the floating point registers. >> >> On S/390 the tbegin instruction starts a transaction. If a subsequent >> memory access collides with another the transaction is aborted. The >> execution then continues *after* the tbegin instruction. All memory >> writes after the tbegin are rolled back, the general purpose registers >> selected in the tbegin operand are restored, and the condition code is >> set in order indicate that an abort occurred. What the code then is >> supposed to do is to check the condition code and either jump back to >> the transaction if it is a temporary failure or provide an alternate >> implementation using e.g. a lock. >> >> Unfortunately our tbegin instruction does not save the floating point >> registers leaving it to the compiler to make sure the old values get >> restored. This will be necessary if the abort code relies on these >> values and the transaction body modifies them. >> >> With my current approach I try to place FPR clobbers to trigger GCC >> generating the right save/restore operations. This has some >> drawbacks: >> >> - Bundling the clobbers with the tbegin causes FPRs to be restored >> even in the good path (the transaction never aborts). > > This is the only path I'd recommend for trying to implement tbegin as an > intrinsic. Mmmh ok. Where do you think the other approach (clobber in abort code + abnormal edge) could break? I think it is important to make the good path as fast as possible in order to give transactions a chance. Restoring the FPRs even for transactions never dealing with FPRs is quite a penalty. On S/390 we do not have a load multiple for that :( > That said, enabling support for tbegin in libitm doesn't really require any > intrinsics. The simplest way is to just use inline assembly with the htm_* > inlines we define in the config file. I don't believe you'll need to worry > about the FPRs at all because we'll have performed a setjmp on the way in, and > a longjmp on the way out, that will restore them properly. Right. Since the wrapper functions already do a lot of reg save/restore having them in the builtin again would be redundant. -Andreas-
Re: RFC: S/390 Transactional memory support - save/restore of FPRs
On Wed, 2013-05-22 at 11:03 +0200, Andreas Krebbel wrote: > On 21/05/13 16:28, Torvald Riegel wrote: > > On Tue, 2013-05-21 at 14:40 +0200, Andreas Krebbel wrote: > > You could also start with supporting s390 HTM through the transactional > > language constructs we already support (__transaction_atomic etc.) and > > libitm. The advantage would be that you can reuse quite a few bits of > > existing machinery (e.g., different fallbacks when the HTM can't execute > > a certain transaction, some analyses on the compilation side); however, > > this doesn't give programmers as much control as if using the HTM > > directly, and it requires a function call on begin and commit when using > > the current libitm ABI. > > > > (I know that this is kind of a side note, because you seem to be looking > > for a way to expose this at the granularity of HTM begin/commit builtins > > (e.g., to base lock elision implementations on top of it); but I think > > that in the long run txnal language constructs are easier for many > > users.) > > The patch I have so far implements libitm HTM support for S/390 using the > builtins. Very much like > it is done on x86. Andi Kleen posted a patch a while ago with an optimization to the C code that moved the xbegin() etc. into the ITM_beginTransaction. It had some small issues, and didn't yet made it into trunk AFAIK. > > In libitm, it's probably easier to write custom assembly code for > > ITM_beginTransaction that saves/restores all additional bits not > > restored by the HTM explicitly through a partial SW setjmp. This > > approach at least worked well for AMD's ASF, which didn't even restore > > all normal registers. > > Ok. I'll have a look. I haven't done measurements with libitm so far. The > experiments with the > low-level builtins show that having a function call for starting and ending a > transaction is a big > hit already so I didn't invest much into optimizing the libitm variant for > now. For very small transactions, it can certainly be a problem; for example, if one wants to optimize custom concurrent code with transactions. OTOH, if the use case is lock elision in existing lock-based code, then one would probably have function calls anyway for the lock operations, and then one would compare performance against this case too.
Re: RFC: S/390 Transactional memory support - save/restore of FPRs
On 05/22/2013 02:23 AM, Andreas Krebbel wrote: > Mmmh ok. Where do you think the other approach (clobber in abort code + > abnormal edge) could break? It's mostly about the infrastructure of maintaining the edges. It's quite a lot of code to maintain normal EH + TM edges. We don't even bother maintaining edges for setjmp/longjmp. There's really no infrastructure set up for special backend abnormal edges. And since we hope that everyone will be using the language-level TM, it seems like a waste to implement those special backend edges. r~
Re: RFC: S/390 Transactional memory support - save/restore of FPRs
On Wed, May 22, 2013 at 09:52:35AM -0700, Richard Henderson wrote: > lot of code to maintain normal EH + TM edges. We don't even bother > maintaining > edges for setjmp/longjmp. Not any longer, 4.9 has AB edges to setjmp from longjmp or potential longjmp callers. Jakub
Re: RFC: S/390 Transactional memory support - save/restore of FPRs
> Not any longer, 4.9 has AB edges to setjmp from longjmp or potential longjmp > callers. And all 4.x (x >= 1) compilers have AB edges to (lowered) __builtin_setjmp from __builtin_longjmp or potential __builtin_longjmp callers. -- Eric Botcazou
Re: Porting libsanitizer to aarch64
On 05/22/2013 12:43 AM, Jakub Jelinek wrote: > Changing frame grows upward into frame grows downward shouldn't be that > hard, see e.g. rs6000 port, where > #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 || flag_asan != 0) > and grep the port where it uses FRAME_GROWS_DOWNWARD. > Basically you need to tweak initial elimination offset computation for it, > and that might be it, or perhaps one or two extra spots. FWIW, I would actually recommend against conditionalizing FRAME_GROWS_DOWNWARD for a new port. Just make it _always_ grow down and save yourself the additional code bloat in the backend. r~