> The patch miscompiles the MPFR library on x86 Pentium Pro. Reduced testcase
> attached, compile for x86 with -mtune=pentiumpro.
Thanks, I'll look at that in the near future.
Best regards, Michael
Hi HJ,
You were right, and the change from epilogue_size_needed to size_needed was the
root cause of the bug.
Here is the obvious fix for that, regtested and bootstrapped on i386/x86_64.
Is it OK for trunk?
Changelog:
2013-09-09 Michael Zolotukhin
* config/i386/i386.c (ix86_expand_movm
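For readers unfamiliar with the terminology: an expanded block copy is typically a main loop that moves fixed-size chunks, followed by an epilogue that handles the leftover bytes. The sketch below is a generic illustration of that split, not the code in i386.c; the function name copy_chunked and its chunk parameter are hypothetical. It shows why the epilogue must agree with the main loop about the chunk size: the epilogue covers exactly n & (chunk - 1) bytes, so a mismatch leaves tail bytes uncopied or copies too many.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes using a main loop of `chunk`-byte
   moves followed by an epilogue for the remainder.  `chunk` must be a
   power of two.  */
void copy_chunked(unsigned char *dst, const unsigned char *src,
                  size_t n, size_t chunk)
{
    assert(chunk != 0 && (chunk & (chunk - 1)) == 0);

    size_t i = 0;

    /* Main loop: whole chunks only.  */
    for (; i + chunk <= n; i += chunk)
        memcpy(dst + i, src + i, chunk);

    /* Epilogue: the remaining n & (chunk - 1) bytes, emitted as
       power-of-two pieces chunk/2, chunk/4, ..., 1.  */
    for (size_t piece = chunk >> 1; piece != 0; piece >>= 1) {
        if (n & piece) {
            memcpy(dst + i, src + i, piece);
            i += piece;
        }
    }

    /* Every byte has been covered exactly once.  */
    assert(i == n);
}
```

With n = 37 and chunk = 16, the main loop moves 32 bytes and the epilogue moves 4 + 1 bytes.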
> OK,
> Would you mind adding a testcase?
Thanks, here is the patch with Eric's test.
OK to commit?
Changelog:
gcc:
2013-09-09 Michael Zolotukhin
* config/i386/i386.c (ix86_expand_movmem): Fix epilogue generation.
gcc/testsuite:
2013-09-09 Michael Zolotukhin
* gcc.target/i
> I don't see anything i386 specific on the testcase, except the flags,
> and don't see why you need -fno-common in there, there are no global vars.
> So, I think it would be better to stick it into gcc.dg/torture/, drop
> dg-require-* line and instead just add
> /* { dg-additional-options "-march=
> I think it is worthwhile, various targets have many different ways to expand
> memcpy, admittedly i?86/x86_64 probably the biggest number of these, and
> while right now you've encountered it on ia32 with certain options doesn't
> mean that in a few years it couldn't hit some unrelated target, ar
> Please don't introduce new *.x files, for tests where you need something
> like that just stick it into gcc.dg/torture/ instead and use normal dg stuff
> in there.
Thanks, fixed. Ok to commit?
Changelog:
gcc:
2013-09-09 Michael Zolotukhin
* config/i386/i386.c (ix86_expand_movmem): F
> No -march in runtime tests, please.
Is mtune ok here?
Michael
> Uros.
> > Is mtune ok here?
> Yes.
Thanks, fixed. Ok to commit?
I verified that the test fails without the fix in i386.c and passes with it.
Changelog:
gcc:
2013-09-09 Michael Zolotukhin
* config/i386/i386.c (ix86_expand_movmem): Fix epilogue generation.
gcc/testsuite:
2013-09-09 Michael Zolo
Hi Jakub,
This patch looks OK to me in general, but I am a bit worried about using
splay trees. Couldn't we end up with their worst-case linear performance
instead of the desired logarithmic one?
Imagine the following scenario:
#pragma parallel ... // to produce N-threads
{
# pragma target map (i1, i2, ...i
> Libgomp will start N-1 new threads, and all of them would want to look up
> mappings for i1,i2,...iK in the splay tree. The first one wouldn't find
> anything and would map and insert all the values to the tree. But the following
> ones would look up these addresses in exactly the same order
> Yes, splay tree can get totally unbalanced and you can have a linear lookup
> time, the O(log n) lookup time is amortized. But, if you e.g. really do
> lookup sorted keys (which is not given, the compiler puts vars into the
> clauses based on the user order or in the order references to those va
> No. If you insert 1 to 100 into a splay tree in ascending order (that will
> give you a totally unbalanced tree), and then lookup 1 to 100 in the
> ascending order again, then the lookup of 1 will be expensive (100
> comparisons), but then for each following lookup it
> will cost just 2 comparis
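The scenario in the quoted text (ascending inserts, then ascending lookups) can be tried on a small scale. The sketch below uses a textbook top-down splay, not the splay-tree code used by libgomp, and it counts loop steps of that routine rather than individual key comparisons, so the exact numbers need not match the ones quoted. All names (splay, insert, sequential_lookup_cost) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node { int key; struct node *l, *r; } node;

static long steps;  /* loop steps performed by splay() */

/* Textbook top-down splay: bring the node with `key` (or its
   neighbour, if absent) to the root.  */
static node *splay(node *t, int key)
{
    node N, *l, *r, *y;
    if (!t) return NULL;
    N.l = N.r = NULL;
    l = r = &N;
    for (;;) {
        steps++;
        if (key < t->key) {
            if (!t->l) break;
            if (key < t->l->key) {          /* rotate right */
                y = t->l; t->l = y->r; y->r = t; t = y;
                if (!t->l) break;
            }
            r->l = t; r = t; t = t->l;      /* link right */
        } else if (key > t->key) {
            if (!t->r) break;
            if (key > t->r->key) {          /* rotate left */
                y = t->r; t->r = y->l; y->l = t; t = y;
                if (!t->r) break;
            }
            l->r = t; l = t; t = t->r;      /* link left */
        } else {
            break;
        }
    }
    /* Reassemble the left and right trees around the new root.  */
    l->r = t->l; r->l = t->r; t->l = N.r; t->r = N.l;
    return t;
}

/* Insert node n; keys are assumed to be distinct.  */
static node *insert(node *t, node *n)
{
    n->l = n->r = NULL;
    if (!t) return n;
    t = splay(t, n->key);
    if (n->key < t->key) { n->l = t->l; n->r = t; t->l = NULL; }
    else                 { n->r = t->r; n->l = t; t->r = NULL; }
    return n;
}

/* Insert 1..n in ascending order, then look up 1..n in ascending
   order.  Returns the total lookup steps; *first receives the cost
   of the first lookup alone.  */
long sequential_lookup_cost(int n, long *first)
{
    assert(n > 0);
    node *pool = malloc((size_t)n * sizeof *pool);
    assert(pool != NULL);

    node *root = NULL;
    for (int k = 1; k <= n; k++) {
        pool[k - 1].key = k;
        root = insert(root, &pool[k - 1]);
    }

    steps = 0;                              /* count lookups only */
    for (int k = 1; k <= n; k++) {
        root = splay(root, k);
        assert(root->key == k);
        if (k == 1) *first = steps;
    }
    free(pool);
    return steps;
}
```

Calling sequential_lookup_cost(100, &first) lets one compare the cost of the first lookup, which has to walk the degenerate tree, against the total for all 100 lookups; the total stays far below the 100 * 100 that a tree remaining linear would cost.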
Hi Jakub,
I merged my patch with recent changes in gomp4-branch, and the new version is
below. Also, I fixed most of your remarks - the one that isn't fixed is checking
sizeof(void*)==sizeof(uintptr_t) in configure. I'll do it in the next patch.
Is it ok for gomp4-branch?
Also, I was thinking of ho
> As discussed earlier, I'd like to pass __OPENMP_TARGET__ argument to
> all of GOMP_target{,_data,_update}, so that all those functions
> can get at the offloading data section in the shared library or binary
> making the call, so that the first time they encounter such a call
> in the shared libr
Hi Jakub,
Thanks for the explanation, it's getting a bit clearer, though I still have some
questions.
> __OPENMP_TARGET__ would be a linker plugin inserted symbol at the start of
> some linker plugin created data section, which would start with some header
> and then data.
> Say
> uleb128 number_
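The header layout itself is cut off in the quoted text above, so it is not reproduced here. What can be shown independently is how a single uleb128 field (the unsigned LEB128 encoding known from DWARF) is decoded: seven payload bits per byte, least significant group first, with the top bit marking continuation. The function name read_uleb128 and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value from buf (at most `size` bytes).
   The number of bytes consumed is stored in *len.  */
uint64_t read_uleb128(const unsigned char *buf, size_t size, size_t *len)
{
    assert(buf != NULL && len != NULL);

    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;

    while (i < size) {
        unsigned char byte = buf[i++];
        /* Low seven bits are payload; ignore bits beyond 64.  */
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        /* A clear top bit marks the last byte of the value.  */
        if (!(byte & 0x80))
            break;
    }
    *len = i;
    return result;
}
```

For example, the three bytes 0xE5 0x8E 0x26 decode to 624485.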
Hi Jakub,
Updated patch and my answers are below.
> The OpenMP standard has the omp_is_initial_device () function that can be
> used to query whether the code is offloaded or not. So I don't think we
> need to do the logging. For the device 257 hack we of course don't return
> that as true, but
Hi Jan,
> I also think the misaligned move trick can/should be performed by
> move_by_pieces and we ought to consider sane use of SSE - current vector_loop
> with unrolling factor of 4 seems a bit extreme. At least Bulldozer is happy with
> 2 and I would expect SSE moves to be especially useful for
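One common reading of the "misaligned move trick" is to cover a small block whose size is known only as a range with two possibly overlapping unaligned moves, instead of a byte-by-byte tail. The sketch below illustrates that idea for 8 to 16 bytes; it is an assumption about what is meant, not code from the patch, and the name copy_8_to_16 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16, with two 8-byte moves.  The second move
   is anchored at the end of the block, so for n < 16 it overlaps the
   first one.  Both loads happen before either store.  */
void copy_8_to_16(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n >= 8 && n <= 16);

    uint64_t head, tail;
    memcpy(&head, src, 8);           /* first 8 bytes */
    memcpy(&tail, src + n - 8, 8);   /* last 8 bytes  */
    memcpy(dst, &head, 8);
    memcpy(dst + n - 8, &tail, 8);
}
```

The fixed-size memcpy calls stand in for single unaligned load and store instructions, which is what a compiler normally emits for them.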
Hi Jan,
Here is a patch we've talked about recently - it merges expanders of memset and
memmov. As a natural side effect, this enables vector_loop in memset expanding
as well.
Though in some places merging movmem and setmem isn't so efficient (the original
code in these versions differed a lot),
Hi Jan,
Thanks for the review, please find my answers below.
> > +/* Output "rep; mov" or "rep; stos" instruction depending on ISSETMEM argument.
> > + When ISSETMEM is true, arguments SRCMEM and SRCPTR are ignored.
> > + When ISSETMEM is false, arguments VALUE and ORIG_VALUE are ignored
> The patch is OK.
Thanks, the patch was committed.
> > That's a good point. I added a check for this case - so if CONST_INT is passed
> > we assume that mode is QI. But usually promoted value resides in a register, so
> > we don't go over-conservative here.
> Hmm, so if we use broadca
> OK, I merged in my misaligned prologues changes and will post patch after full
> testing. It seemed to go seamlessly. I spotted there are still a few places for
> cleanup, so I will try to handle them incrementally.
Great! Hope these recent memcpy/memset changes will lead to some gains soon
Hi Bernd,
I am working on offloading support for OpenMP4, so I'll try to share my vision
of how everything works and answer your questions.
GCC compiles host version of the code (as usual) and dumps Gimple, as it does
for LTO, but for offloading. Gimple IR is stored only for functions/variables
Ping.
On 19 Nov 12:33, Michael V. Zolotukhin wrote:
> Hi Jakub,
>
> Thanks for the remarks. Updated patch is attached, and my answers are below.
>
> > This will add into the table all "omp declare target" functions, but you
> > actually want there only the
Hi everybody,
Here is a set of patches implementing one more piece of offloading support in
GCC. These three patches make it possible to build a host binary with target image and all
tables embedded. Along with patches for libgomp and libgomp plugin, which
hopefully will be sent soon, that gives a function
Hi everybody,
Here is a patch 2/3: Add tables generation.
This patch is just a slightly modified version of the patch sent a couple of weeks ago. When
compiling with '-fopenmp', the compiler generates a special symbol containing the
addresses and sizes of globals/omp_fn-functions, and places it into a special
section.
Hi everybody,
Here is a patch 3/3: Add invocation of target compiler.
With this patch lto-wrapper performs invocation of target compilers and embeds
the resultant target images into the host binary. The targets and the
corresponding compilers are supposed to be specified in a special environment
> This patch seems to make rather too many assumptions about host and
> target compilers. Certainly code like this can't go into
> target-independent code like lto-wrapper.
That's true. The point of this patch was to show what is needed to support
x86->MIC OpenMP offloading, as we currently see it
Hi Jakub,
Here is a patch for generation of tables containing omp-functions addresses.
It is just a first step, as it lacks generation of similar tables for globals,
but having it in the branch would ease our further development, as we would be
able to build on this.
This patch introduces new func
Hi Jakub,
Thanks for the remarks. Updated patch is attached, and my answers are below.
> This will add into the table all "omp declare target" functions, but you
> actually want there only the outlined #pragma omp target bodies.
> The question is how to find them here reliably. At least ignorin
Hi,
This is a really convenient option, thanks for working on it.
I can't approve it as I'm not a maintainer, but it looks OK to me,
except for a small nitpick: AFAIR, comments should end with
dot-space-space.
Michael
On 04 Aug 20:01, Xinliang David Li wrote:
> The attached is a new patch impl