Re: Insn canonicalization not only with constant
Hi Andrew,

You mean using a DI rotate left by 4 and then saving the output as SI (saving the hi part and ignoring the low one)? Also, how is canonicalization detected anyway? Are there rules that gcc follows? How can they be changed?

Sami

Andrew Pinski wrote:
output = (operand1 >> 28) | (operand2 << 4)
Isn't that a rotate? If so, you can use either rotate or rotatert instead.
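The identity behind this question can be checked in plain C. The following is only a sketch, assuming 32-bit operands and assuming that operand2 supplies the high word of the 64-bit (DI) value; the function names scale_28_4 and scale_via_rotate are made up for illustration and do not come from any GCC port.

```c
#include <stdint.h>

/* The instruction as described in the thread:
   output = (operand1 >> 28) | (operand2 << 4).  */
uint32_t scale_28_4(uint32_t operand1, uint32_t operand2)
{
    return (operand1 >> 28) | (operand2 << 4);
}

/* Same result through a 64-bit rotate left by 4, keeping only the
   high word.  Assumption: operand2 is the high half of the DI value.  */
uint32_t scale_via_rotate(uint32_t operand1, uint32_t operand2)
{
    uint64_t v = ((uint64_t)operand2 << 32) | operand1;  /* operand2:operand1 */
    uint64_t r = (v << 4) | (v >> 60);                   /* rotate left by 4 */
    return (uint32_t)(r >> 32);                          /* high 32 bits only */
}
```

The bits that wrap around in the rotate land in the low word, which is discarded, so for this purpose a plain 64-bit shift left by 4 would give the same high word. When both operands are the same register, scale_28_4 reduces to an ordinary 32-bit rotate left by 4.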
maybe vectorizer-bug regarding unhandled data-ref
Hi,

while playing with gcc-4.3 rev. 121994, I encountered a problem with autovectorisation.

In the following simple code, the inner loop of c1() becomes vectorized as expected, but the inner loop of c2() does not, because of

test2.c:15: note: = analyze_loop_nest =
test2.c:15: note: === vect_analyze_loop_form ===
test2.c:15: note: === get_loop_niters ===
test2.c:15: note: ==> get_loop_niters:(unsigned int) n_6(D)
test2.c:15: note: Symbolic number of iterations is (unsigned int) n_6(D)
test2.c:15: note: === vect_analyze_data_refs ===
test2.c:15: note: get vectype with 4 units of type float
test2.c:15: note: vectype: vector float
test2.c:15: note: not vectorized: unhandled data-ref
test2.c:15: note: bad data references.

(even with -ftree-vectorizer-verbose=99 there is no more info than that)

The only difference between the two functions is that c1() uses static arrays and c2() uses pointers to arrays. Is this a problem with aliasing/alignment of pointer parameters or a vectorizer bug? And is there a work-around?

Best regards,
Thomas

--

float a[256],b[16],o[271];

void c1()
{
  for(int i=0;i<256;i++) {
    for(int j=0;j<16;j++) {
      o[i+j]+=a[i]*b[j];
    }
  }
}

void c2(int m, int n, float *a, float *b, float *o)
{
  for(int i=0;i
Re: maybe vectorizer-bug regarding unhandled data-ref
> Hi,
>
> while playing with gcc-4.3 rev. 121994, i encountered a problem with
> autovectorisation.
>
> In the following simple code, the inner loop of c1() becomes vectorized as
> expected, but the inner loop of c2() not because of
>
> test2.c:15: note: = analyze_loop_nest =
> test2.c:15: note: === vect_analyze_loop_form ===
> test2.c:15: note: === get_loop_niters ===
> test2.c:15: note: ==> get_loop_niters:(unsigned int) n_6(D)
> test2.c:15: note: Symbolic number of iterations is (unsigned int) n_6(D)
> test2.c:15: note: === vect_analyze_data_refs ===
>
> test2.c:15: note: get vectype with 4 units of type float
> test2.c:15: note: vectype: vector float
> test2.c:15: note: not vectorized: unhandled data-ref
> test2.c:15: note: bad data references.
>
> (even with -ftree-vectorizer-verbose=99 there is no more info than that)
>
> The only difference between the two functions is that in c1() static
> arrays are used and in c2() pointer to arrays.. Is this a problem with
> aliasing/alignment of pointer parameters or a vectorizer bug? And is there
> a work-around?
>

The first problem is that a[i] is invariant in the inner-loop, and the vectorizer wants to work only with data-references that have a nice evolution in the loop (i.e. advance between iterations of the loop). In other words - it assumes that invariant accesses had been moved out of the loop before vectorization:

"
ptr is loop invariant.
create_data_ref: failed to create a dr for *pretmp.27_46
"

The work around for that is to manually move the invariant a[i] out of the inner-loop, put it into a temporary, and use that temporary in the inner-loop.

The second problem is aliasing - the vectorizer can't tell that the write through pointer o doesn't overlap with the read through pointer b. The work around for that is to add the "__restrict" qualifier to the declaration of the pointers.

To fix the first problem in the compiler, we can teach the vectorizer to work with invariant datarefs.
This is easy to do, but I think the right solution is to enhance the loop-invariant-motion pass to use an aliasing oracle that would tell it that the invariant load can be safely moved out of the loop (given that the pointers are __restrict qualified). I think such a solution is in the works?

Do people think it's worthwhile to work around this invariant-motion issue in the vectorizer?

The second problem should be fixed in the near future - a patch that adds support for run-time aliasing checks is in the works (should be ready within a week or so I think).

dorit

> Best regards,
> Thomas
>
> --
>
> float a[256],b[16],o[271];
>
> void c1()
> {
>   for(int i=0;i<256;i++) {
>     for(int j=0;j<16;j++) {
>       o[i+j]+=a[i]*b[j];
>     }
>   }
> }
>
> void c2(int m, int n, float *a, float *b, float *o)
> {
>   for(int i=0;i for(int j=0;jo[i+j]+=a[i]*b[j];
>     }
>   }
> }
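The two workarounds described in this reply can be applied by hand roughly as follows. This is only a sketch: the loop bounds of c2() are cut off in the quoted source, so the bounds i < m and j < n are an assumption based on the parameter list and on the n_6(D) iteration count in the log, and the name c2_workaround is made up for illustration.

```c
/* c2() with both workarounds applied by hand.
   Assumed loop bounds: i < m and j < n (truncated in the original post).  */
void c2_workaround(int m, int n, float *__restrict a,
                   float *__restrict b, float *__restrict o)
{
    for (int i = 0; i < m; i++) {
        /* Workaround 1: hoist the inner-loop-invariant load into a temporary.  */
        float ai = a[i];
        for (int j = 0; j < n; j++) {
            /* Workaround 2: __restrict above promises that o, a and b
               do not overlap.  */
            o[i + j] += ai * b[j];
        }
    }
}
```

The __restrict qualifier is a promise made by the caller, so the function must only be called with arrays that really do not overlap. Whether a particular GCC revision then vectorizes the inner loop still has to be checked against the vectorizer's own report.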
Re: maybe vectorizer-bug regarding unhandled data-ref
On 2/15/07, Dorit Nuzman <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > while playing with gcc-4.3 rev. 121994, i encountered a problem with
> > autovectorisation.
> >
> > In the following simple code, the inner loop of c1() becomes vectorized as
> > expected, but the inner loop of c2() not because of
> >
> > test2.c:15: note: = analyze_loop_nest =
> > test2.c:15: note: === vect_analyze_loop_form ===
> > test2.c:15: note: === get_loop_niters ===
> > test2.c:15: note: ==> get_loop_niters:(unsigned int) n_6(D)
> > test2.c:15: note: Symbolic number of iterations is (unsigned int) n_6(D)
> > test2.c:15: note: === vect_analyze_data_refs ===
> >
> > test2.c:15: note: get vectype with 4 units of type float
> > test2.c:15: note: vectype: vector float
> > test2.c:15: note: not vectorized: unhandled data-ref
> > test2.c:15: note: bad data references.
> >
> > (even with -ftree-vectorizer-verbose=99 there is no more info than that)
> >
> > The only difference between the two functions is that in c1() static
> > arrays are used and in c2() pointer to arrays.. Is this a problem with
> > aliasing/alignment of pointer parameters or a vectorizer bug? And is there
> > a work-around?
> >
>
> The first problem is that a[i] is invariant in the inner-loop, and the
> vectorizer wants to work only with data-references that have a nice
> evolution in the loop (i.e. advance between iterations of the loop). In
> other words - it assumes that invariant accesses had been moved out of the
> loop before vectorization:
>
> "
> ptr is loop invariant.
> create_data_ref: failed to create a dr for *pretmp.27_46
> "
>
> The work around for that is to manually move the invariant a[i] out of the
> inner-loop, put it into a temporary, and use that temporary in the
> inner-loop.
>
> The second problem is aliasing - the vectorizer can't tell that the write
> through pointer o doesn't overlap with the read through pointer b. The work
> around for that is to add the "__restrict" qualifier to the declaration of
> the pointers.
>
> To fix the first problem in the compiler, we can teach the vectorizer to
> work with invariant datarefs. This is easy to do, but I think the right
> solution is to enhance loop-invariant-motion pass to use an aliasing oracle
> that would tell it that the invariant load can be safely moved out of the
> loop (given that the pointers are __restrict qualified). I think such a
> solution is in the works?

It is.

> Do people think it's worth while to work around this invariant-motion issue
> in the vectorizer?

Probably not, it's just going to make your code more complex for no real gain.
Re: Insn canonicalization not only with constant
On Wed, Feb 14, 2007 at 08:30:52PM +, Sami Khawam wrote:
> Hi Rask,
>
> Basically the CPU has the 'SCALE_28_4' instruction which does the following:
> output = (operand1 >> 28) | (operand2 << 4)
>
> From my understanding the OR operation (ior), doesn't get canonicalized
> since it's second operand (in this case (lshiftrt:SI (match_operand:SI 2
> "register_operand" "r") (const_int 4)) ) is not a constant.

OK, I see what you mean. The reason you can get both (ior (ashift ...) (lshiftrt ...)) and (ior (lshiftrt ...) (ashift ...)) is that simplify-rtx.c has no rule to canonicalize such expressions and that LSHIFTRT and ASHIFT have the same precedence.

Hmm, in simplify_binary_operation_1(), it says:

  /* Convert (ior (ashift A CX) (lshiftrt A CY)) where CX+CY equals the
     mode size to (rotate A CX). */

Right after that is code to make sure ASHIFT is the first operand for the simplification attempts that follow. You could try adding code to do this in general, but I don't know where such code should be added.

Btw, I found this in rtlanal.c:

/* Return a value indicating whether OP, an operand of a commutative
   operation, is preferred as the first or second operand. The higher
   the value, the stronger the preference for being the first operand.
   We use negative values to indicate a preference for the first operand
   and positive values for the second operand. */

int
commutative_operand_precedence (rtx op)
{
  enum rtx_code code = GET_CODE (op);

  /* Constants always come the second operand. Prefer "nice" constants. */
  if (code == CONST_INT)
    return -7;
[...]

The comment disagrees with the code.

--
Rask Ingemann Lambertsen
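The ordering rule discussed above can be illustrated with a toy model. This is not GCC's code: every toy_ name is hypothetical, the value 0 for the shifts is an assumption (the message only says the two shifts have the same precedence), and the swap rule is simply the "higher value goes first" reading of the quoted comment. Only the -7 for CONST_INT is taken from the quoted snippet.

```c
/* Toy model only: hypothetical names, not GCC's implementation.  */
enum toy_code { TOY_CONST_INT, TOY_ASHIFT, TOY_LSHIFTRT };

/* Higher value = stronger preference for being the first operand.  */
int toy_precedence(enum toy_code code)
{
    switch (code) {
    case TOY_CONST_INT:
        return -7;  /* value from the quoted snippet: constants go second */
    default:
        return 0;   /* assumed value; both shifts tie */
    }
}

/* Returns 1 if (op0, op1) should be swapped to reach the preferred order.  */
int toy_should_swap(enum toy_code op0, enum toy_code op1)
{
    return toy_precedence(op0) < toy_precedence(op1);
}
```

Under this reading, a constant in first position is swapped to second, which matches "constants always come the second operand" and the first half of the quoted comment, while the sentence about negative values preferring the first operand does not fit. A tie between ASHIFT and LSHIFTRT triggers no swap in either direction, which is consistent with both operand orders showing up.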
Re: Insn canonicalization not only with constant
> OK, I see what you mean. The reason you can get both (ior (ashift ...)
> (lshiftrt ...)) and (ior (lshiftrt ...) (ashift ...)) is that simplify-rtx.c
> has no rule to canonicalize such expressions and that LSHIFTRT and ASHIFT
> have the same precedence.
>
> Hmm, in simplify_binary_operation_1(), it says:
>
>   /* Convert (ior (ashift A CX) (lshiftrt A CY)) where CX+CY equals the
>      mode size to (rotate A CX). */

OK, so that means that in that specific shift example I could get away with a rotate operation (even though it has to be of mode DI -> SI).

> Right after that is code to make sure ASHIFT is the first operand for the
> simplification attempts that follow. You could try adding code to do this in
> general, but I don't know where such code should be added.

I will look more into this. It might be that there is no simple way to activate canonicalization for the general case (i.e. any insn that is defined in the machine description), and maybe it has to be done for every specific type of operation.

> Btw, I found this in rtlanal.c:
> int
> commutative_operand_precedence (rtx op)
> :
> :

It seems like commutative_operand_precedence() is only used twice to swap operand1 and operand2 - so the fact that it returns low values (or high, since the comment in the code seems wrong) for general operands shouldn't affect the ability to canonicalize them.

Sami
Re: [Autovect]dependencies of virtual defs/uses
"Jiahua He" <[EMAIL PROTECTED]> wrote on 12/02/2007 22:54:08: > Oh, I see. For reduction and induction, you don't need to deal with > the condition with vdef. I am considering how to implement an idiom > with vdef, like SCAN (prefix sum). And by the way, do you support > idioms with vuses? > You mean detecting this pattern?: for i a[i] += a[i-1]; I don't know if analyzing vdefs/vuses would help you much to detect this pattern - maybe you're better off computing the dependence-distance (i.e. use compute_data_dependences_for_loop, and look at DDR_DIST_VECTS). dorit > Jiahua > > > 2007/2/12, Dorit Nuzman <[EMAIL PROTECTED]>: > > > Thanks! In fact, I should ask how to deal with idiom (such as > > > reduction, induction) recognition for virtual defs/uses. > > > > > > > Just curious - what is this for? (are you interested in this in the context > > of vectorization? is there a specific example you have in mind?) > > > > dorit > > > > > Jiahua > > > > > > > > > 2007/2/12, Daniel Berlin <[EMAIL PROTECTED]>: > > > > On 2/12/07, Jiahua He <[EMAIL PROTECTED]> wrote: > > > > > Hi, > > > > > > > > > > I am reading the code of autovect branch and curious about how to > > deal > > > > > with the dependencies of virtual defs/uses. In the function > > > > > vect_analyze_scalar_cycles( ), I found the statement "Skip virtual > > > > > phi's. The data dependences that are associated with virtual > > defs/uses > > > > > ( i.e., memory accesses) are analyzed elsewhere." But where is the > > > > > code? I tried to search for "vect_induction_def" and > > > > > "vect_reduction_def" and found that they are not used to assign > > > > > elsewhere. Is the analysis not implemented yet? Thanks in advance! > > > > > > > > They show up as data references because of tree-data-reference.c > > > marking them. > > > > At lets, that's how other linear loop transforms handles it. > > > > Not sure about how vectorizer deals with it specifically > > > > > > > > > > > > > > Jiahua > > > > > > > > > > > > >
Re: [Autovect]dependencies of virtual defs/uses
2007/2/15, Dorit Nuzman <[EMAIL PROTECTED]>:
> "Jiahua He" <[EMAIL PROTECTED]> wrote on 12/02/2007 22:54:08:
>
> > Oh, I see. For reduction and induction, you don't need to deal with
> > the condition with vdef. I am considering how to implement an idiom
> > with vdef, like SCAN (prefix sum). And by the way, do you support
> > idioms with vuses?
> >
>
> You mean detecting this pattern?:
>
> for i
>   a[i] += a[i-1];

a[i] = a[i-1] + b[i]

> I don't know if analyzing vdefs/vuses would help you much to detect this
> pattern - maybe you're better off computing the dependence-distance (i.e.
> use compute_data_dependences_for_loop, and look at DDR_DIST_VECTS).

Thinking in the same way.

Jiahua

> dorit
>
> > Jiahua
> >
> >
> > 2007/2/12, Dorit Nuzman <[EMAIL PROTECTED]>:
> > > > Thanks! In fact, I should ask how to deal with idiom (such as
> > > > reduction, induction) recognition for virtual defs/uses.
> > > >
> > >
> > > Just curious - what is this for? (are you interested in this in the
> > > context of vectorization? is there a specific example you have in mind?)
> > >
> > > dorit
> > >
> > > > Jiahua
> > > >
> > > >
> > > > 2007/2/12, Daniel Berlin <[EMAIL PROTECTED]>:
> > > > > On 2/12/07, Jiahua He <[EMAIL PROTECTED]> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am reading the code of autovect branch and curious about how to
> > > > > > deal with the dependencies of virtual defs/uses. In the function
> > > > > > vect_analyze_scalar_cycles( ), I found the statement "Skip virtual
> > > > > > phi's. The data dependences that are associated with virtual
> > > > > > defs/uses ( i.e., memory accesses) are analyzed elsewhere." But
> > > > > > where is the code? I tried to search for "vect_induction_def" and
> > > > > > "vect_reduction_def" and found that they are not used to assign
> > > > > > elsewhere. Is the analysis not implemented yet? Thanks in advance!
> > > > >
> > > > > They show up as data references because of tree-data-reference.c
> > > > > marking them.
> > > > > At least, that's how other linear loop transforms handles it.
> > > > > Not sure about how vectorizer deals with it specifically
> > > > >
> > > > > >
> > > > > > Jiahua
> > > > > >
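For reference, the two scan patterns mentioned in this thread look like this when written out in C. This is only a sketch of the source loops; the function names are made up for illustration, and nothing here calls into GCC's dependence analysis.

```c
#include <stddef.h>

/* In-place scan, a[i] += a[i-1]: iteration i reads the element that
   iteration i-1 has just written.  */
void prefix_sum_inplace(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* The variant a[i] = a[i-1] + b[i]; a[0] is left as supplied by the caller.  */
void scan_from(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

In both loops the write to a[i] is read again as a[i-1] exactly one iteration later, so a dependence-distance analysis would be expected to report a distance of 1 for that pair of references. That is the property the suggestion about DDR_DIST_VECTS relies on.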
new port to older gcc: Toshiba Media Processor (MeP)
On behalf of Red Hat I would like to publish patches to add support for the Toshiba Media Processor (MeP) to GCC 3.4. We don't expect this port to be accepted into the gcc source tree as-is, as the 3.4 branch is closed to new ports, and this port needs some core gcc changes. We don't yet have a port to the gcc 4.x family.

I have posted details, patches, and new files here:

http://people.redhat.com/dj/mep/

The target is mep-elf.

Thanks,
DJ
what is difference between gcc-ada and GNAT????
Hi,

can anyone tell me what the difference is between gcc-ada and other Ada 95 compilers such as GNAT GPL and GNAT Pro? Also, what is the procedure to build only the Ada language from the gcc-4.1 source code?
Makefile.def and fixincludes/Makefile.in inconsistency?
Why is it that Makefile.def includes:

// "missing" indicates that that module doesn't supply
// that recursive target in its Makefile.
[...]
host_modules= { module= fixincludes;
                missing= info;
                missing= dvi;
                missing= pdf;
                missing= TAGS;
                missing= install-info;
                missing= installcheck; };

when fixincludes/Makefile.in includes:

dvi :
pdf :
info :
html :
install-html :
installcheck :

Am I correct in guessing that the "missing" lines in Makefile.def are not currently needed? Or are they merely present in the GCC fixincludes but missing in the fixincludes directories in some other trees that share the top-level build files?

- Brooks