Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-03 Thread Michael V. Zolotukhin
> The patch miscompiles the MPFR library on x86 Pentium Pro. Reduced testcase > attached, compile for x86 with -mtune=pentiumpro. Thanks, I'll look at that in the near future. Best regards, Michael

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
Hi HJ, You were right, and the change from epilogue_size_needed to size_needed was the root cause of the bug. Here is the obvious fix for that, regtested and bootstrapped on i386/x86_64. Is it OK for trunk? Changelog: 2013-09-09 Michael Zolotukhin * config/i386/i386.c (ix86_expand_movm

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
> OK, > Would you mind adding a testcase? Thanks, here is the patch with Eric's test. OK to commit? Changelog: gcc: 2013-09-09 Michael Zolotukhin * config/i386/i386.c (ix86_expand_movmem): Fix epilogue generation. gcc/testsuite: 2013-09-09 Michael Zolotukhin * gcc.target/i

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
> I don't see anything i386 specific on the testcase, except the flags, > and don't see why you need -fno-common in there, there are no global vars. > So, I think it would be better to stick it into gcc.dg/torture/, drop > dg-require-* line and instead just add > /* { dg-additional-options "-march=

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
> I think it is worthwhile, various targets have many different ways to expand > memcpy, admittedly i?86/x86_64 probably the biggest number of these, and > while right now you've encountered it on ia32 with certain options doesn't > mean that in a few years it couldn't hit some unrelated target, ar

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
> Please don't introduce new *.x files, for tests where you need something > like that just stick it into gcc.dg/torture/ instead and use normal dg stuff > in there. Thanks, fixed. Ok to commit? Changelog: gcc: 2013-09-09 Michael Zolotukhin * config/i386/i386.c (ix86_expand_movmem): F

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
> No -march in runtime tests, please. Is mtune ok here? Michael > Uros.

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-09 Thread Michael V. Zolotukhin
> > Is mtune ok here? > Yes. Thanks, fixed. Ok to commit? I verified that the test fails without fix in i386.c and passes with it. Changelog: gcc: 2013-09-09 Michael Zolotukhin * config/i386/i386.c (ix86_expand_movmem): Fix epilogue generation. gcc/testsuite: 2013-09-09 Michael Zolo

Re: [RFC] Offloading Support in libgomp

2013-09-15 Thread Michael V. Zolotukhin
Hi Jakub, This patch looks OK to me in general, but I am a bit worried about using splay trees. Couldn't we end up with their worst-case linear performance instead of the desired logarithmic one? Imagine the following scenario: #pragma parallel ... // to produce N-threads { # pragma target map (i1, i2, ...i

Re: [RFC] Offloading Support in libgomp

2013-09-15 Thread Michael V. Zolotukhin
> Libgomp will start N-1 new threads, and all of them would want to look up > mappings for i1,i2,...iK in the splay tree. The first one wouldn't find > anything and would map and insert all the values into the tree. But the > following > ones would look up these addresses in exactly the same order

Re: [RFC] Offloading Support in libgomp

2013-09-16 Thread Michael V. Zolotukhin
> Yes, splay tree can get totally unbalanced and you can have a linear lookup > time, the O(log n) lookup time is amortized. But, if you e.g. really do > lookup sorted keys (which is not given, the compiler puts vars into the > clauses based on the user order or in the order references to those va

Re: [RFC] Offloading Support in libgomp

2013-09-16 Thread Michael V. Zolotukhin
> No. If you insert 1 to 100 into a splay tree in ascending order (that will > give you a totally unbalanced tree), and then lookup 1 to 100 in the > ascending order again, then the lookup of 1 will be expensive (100 > comparisons), but then for each following lookup it > will cost just 2 comparis
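The ascending insert-then-lookup pattern discussed in this subthread can be tried out with a small sketch. This is an illustration only, not libgomp's splay tree: it assumes a textbook top-down splay, and the names Node, splay, insert and sweep_costs are hypothetical. It counts nodes examined per lookup rather than key comparisons, so the per-lookup figures need not match those quoted above; the point is only that a degenerate tree is repaired by the first expensive lookup.

```python
class Node:
    """A binary tree node; hypothetical helper for this sketch."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def splay(root, key):
    """Top-down splay of `key` to the root.

    Returns (new_root, nodes_examined). If `key` is absent, the last
    node reached on the search path becomes the root instead.
    """
    if root is None:
        return None, 0
    header = Node(None)              # header.right: left tree, header.left: right tree
    left_max = right_min = header
    t = root
    steps = 0
    while True:
        steps += 1
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:     # zig-zig: rotate right first
                y = t.left
                t.left = y.right
                y.right = t
                t = y
                if t.left is None:
                    break
            right_min.left = t       # link t into the right tree
            right_min = t
            t = t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:    # zig-zig: rotate left first
                y = t.right
                t.right = y.left
                y.left = t
                t = y
                if t.right is None:
                    break
            left_max.right = t       # link t into the left tree
            left_max = t
            t = t.right
        else:
            break
    # Reassemble the left tree, t, and the right tree.
    left_max.right = t.left
    right_min.left = t.right
    t.left = header.right
    t.right = header.left
    return t, steps


def insert(root, key):
    """Insert `key` and return the new root (duplicates are ignored)."""
    node = Node(key)
    if root is None:
        return node
    root, _ = splay(root, key)
    if key < root.key:
        node.left, node.right, root.left = root.left, root, None
    elif key > root.key:
        node.right, node.left, root.right = root.right, root, None
    else:
        return root
    return node


def sweep_costs(n):
    """Insert 1..n ascending, then look up 1..n ascending.

    Returns (costs, found): nodes examined per lookup, and whether
    each key ended up at the root.
    """
    root = None
    for k in range(1, n + 1):
        root = insert(root, k)       # produces a left spine of depth n
    costs, found = [], []
    for k in range(1, n + 1):
        root, steps = splay(root, k)
        costs.append(steps)
        found.append(root.key == k)
    return costs, found


if __name__ == "__main__":
    costs, _ = sweep_costs(100)
    print("first lookup:", costs[0], "total:", sum(costs))
```

Running the script prints the cost of the first lookup and the total over the whole sweep, which can be compared against the n*n/2 steps that repeated walks of an unbalanced chain would take.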

[PATCH][gomp4] Plugins Support in LibGOMP (Take 2)

2013-09-18 Thread Michael V. Zolotukhin
Hi Jakub, I merged my patch with recent changes in gomp4-branch, and the new version is below. Also, I fixed most of your remarks - the one that isn't fixed is checking sizeof(void*)==sizeof(uintptr_t) in configure. I'll do it in the next patch. Is it ok for gomp4-branch? Also, I was thinking of ho

Re: [gomp4] Tweak GOMP_target{,_data,_update} arguments

2013-09-18 Thread Michael V. Zolotukhin
> As discussed earlier, I'd like to pass __OPENMP_TARGET__ argument to > all of GOMP_target{,_data,_update}, so that all those functions > can get at the offloading data section in the shared library or binary > making the call, so that the first time they encounter such a call > in the shared libr

Re: [gomp4] Tweak GOMP_target{,_data,_update} arguments

2013-09-19 Thread Michael V. Zolotukhin
Hi Jakub, Thanks for the explanation, it's getting a bit clearer, though I still have some questions. > __OPENMP_TARGET__ would be a linker plugin inserted symbol at the start of > some linker plugin created data section, which would start with some header > and then data. > Say > uleb128 number_
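The header layout quoted above is cut off in this archive, so the following does not reconstruct it. It is only a minimal sketch of the standard ULEB128 variable-length integer encoding (the one used in DWARF) that the quoted "uleb128" field type refers to; the function names encode_uleb128 and decode_uleb128 are hypothetical.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode one ULEB128 value from `data` starting at `offset`.

    Returns (value, next_offset) so that consecutive fields can be
    read one after another.
    """
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7
```

A reader of such a section would call decode_uleb128 repeatedly, feeding each returned offset into the next call, since the fields have no fixed width.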

Re: [PATCH][gomp4] Plugins Support in LibGOMP (Take 2)

2013-09-19 Thread Michael V. Zolotukhin
Hi Jakub, Updated patch and my answers are below. > The OpenMP standard has the omp_is_initial_device () function that can be > used to query whether the code is offloaded or not. So I don't think we > need to do the logging. For the device 257 hack we of course don't return > that as true, but

Re: Add value range support into memcpy/memset expansion

2013-09-29 Thread Michael V. Zolotukhin
Hi Jan, > I also think the misaligned move trick can/should be performed by > move_by_pieces and we ought to consider sane use of SSE - current vector_loop > with unrolling factor of 4 seems a bit extreme. At least Bulldozer is happy with > 2 and I would expect SSE moves to be especially useful for

[PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

2013-09-30 Thread Michael V. Zolotukhin
Hi Jan, Here is a patch we've talked about recently - it merges expanders of memset and memmov. As a natural side effect, this enables vector_loop in memset expanding as well. Though in some places merging movmem and setmem isn't so efficient (the original code in these versions differed a lot),

Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

2013-10-20 Thread Michael V. Zolotukhin
Hi Jan, Thanks for the review, please find my answers below. > > +/* Output "rep; mov" or "rep; stos" instruction depending on ISSETMEM > > argument. > > + When ISSETMEM is true, arguments SRCMEM and SRCPTR are ignored. > > + When ISSETMEM is false, arguments VALUE and ORIG_VALUE are ignored

Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

2013-10-21 Thread Michael V. Zolotukhin
> The patch is OK. Thanks, the patch was committed. > > That's a good point. I added a check for this case - so if CONST_INT is > > passed > > we assume that mode is QI. But usually promoted value resides in a > > register, so > > we don't go over-conservative here. > Hmm, so if we use broadca

Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

2013-10-21 Thread Michael V. Zolotukhin
> OK, I merged in my misaligned prologues changes and will post patch after full > testing. It seemed to go seamlessly. I spotted there are still few places > for > cleanup, so i will try to handle there incrementally. Great! Hope these recent memcpy/memset changes will lead to some gains soon

Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-12-03 Thread Michael V. Zolotukhin
Hi Bernd, I am working on offloading support for OpenMP4, so I'll try to share my vision of how everything works and answer your questions. GCC compiles host version of the code (as usual) and dumps Gimple, as it does for LTO, but for offloading. Gimple IR is stored only for functions/variables

Re: [GOMP4] Generation tables with omp-functions addresses for offloading.

2013-12-05 Thread Michael V. Zolotukhin
Ping. On 19 Nov 12:33, Michael V. Zolotukhin wrote: > Hi Jakub, > > Thanks for the remarks. Updated patch is attached, and my answers are below. > > > This will add into the table all "omp declare target" functions, but you > > actually want there only the

[RFC][gomp4] Offloading patches (1/3): Add '-fopenmp_target' option

2013-12-17 Thread Michael V. Zolotukhin
Hi everybody, Here is a set of patches implementing one more piece of offloading support in GCC. These three patches allow building a host binary with the target image and all tables embedded. Along with patches for libgomp and libgomp plugin, which hopefully will be sent soon, that gives a function

[RFC][gomp4] Offloading patches (2/3): Add tables generation

2013-12-17 Thread Michael V. Zolotukhin
Hi everybody, Here is patch 2/3: Add tables generation. This patch is just a slightly modified version of the patch sent a couple of weeks ago. When compiling with '-fopenmp', the compiler generates a special symbol containing addresses and sizes of globals/omp_fn-functions, and places it into a special section.

[RFC][gomp4] Offloading patches (3/3): Add invocation of target compiler

2013-12-17 Thread Michael V. Zolotukhin
Hi everybody, Here is a patch 3/3: Add invocation of target compiler. With this patch lto-wrapper performs invocation of target compilers and embeds the resultant target images into the host binary. The targets and the corresponding compilers are supposed to be specified in a special environment

Re: [RFC][gomp4] Offloading patches (3/3): Add invocation of target compiler

2013-12-20 Thread Michael V. Zolotukhin
> This patch seems to make rather too many assumptions about host and > target compilers. Certainly code like this can't go into > target-independent code like lto-wrapper. That's true. The point of this patch was to show what is needed to support x86->MIC OpenMP offloading, as we currently see it

[GOMP4] Generation tables with omp-functions addresses for offloading.

2013-11-15 Thread Michael V. Zolotukhin
Hi Jakub, Here is a patch for generation of tables containing omp-functions addresses. It is just a first step, as it lacks generation of similar tables for globals, but having it in the branch would ease our further development, as we would be able to base on this. This patch introduces new func

Re: [GOMP4] Generation tables with omp-functions addresses for offloading.

2013-11-19 Thread Michael V. Zolotukhin
Hi Jakub, Thanks for the remarks. Updated patch is attached, and my answers are below. > This will add into the table all "omp declare target" functions, but you > actually want there only the outlined #pragma omp target bodies. > The question is how to find them here reliably. At least ignorin

Re: New parameters to control stringop expansion libcall strategy

2013-08-05 Thread Michael V. Zolotukhin
Hi, This is a really convenient option, thanks for working on it. I can't approve it as I'm not a maintainer, but it looks ok to me, except for a small nitpick: AFAIR, comments should end with dot-space-space. Michael On 04 Aug 20:01, Xinliang David Li wrote: > The attached is a new patch impl