rcox2 added a comment.

Actually, the Intel compiler distinguishes between an optimization report (-qopt-report) and an annotated listing (-qopt-report-annotate). The optimization report lists the info for optimizations in a hierarchical fashion. To use your example,
icc -c -O3 -qopt-report=1 -qopt-report-file=stderr v.c

yields:

    Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 20
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000


Begin optimization report for: foo()

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (foo()) [1] v.c(2,12)


    Report from: Code generation optimizations [cg]

v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] v.c:2

    Hardware registers
        Reserved     :    1[ esp]
        Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
        Callee-save  :    4[ ebx ebp esi edi]
        Assigned     :    0[ reg_null]

    Routine temporaries
        Total         :       4
            Global    :       0
            Local     :       4
        Regenerable   :       0
        Spilled       :       0

    Routine stack
        Variables     :       0 bytes*
            Reads     :       0 [0.00e+00 ~ 0.0%]
            Writes    :       0 [0.00e+00 ~ 0.0%]
        Spills        :       0 bytes*
            Reads     :       0 [0.00e+00 ~ 0.0%]
            Writes    :       0 [0.00e+00 ~ 0.0%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.


Begin optimization report for: Test(int *, int *, int *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] v.c(4,52)
  -> INLINE: (16,3) foo()
  -> INLINE: (18,3) foo()
  -> INLINE: (18,17) foo()


    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]


LOOP BEGIN at v.c(8,8)
<Peeled loop for vectorization>
LOOP END

LOOP BEGIN at v.c(8,8)
   remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END

LOOP BEGIN at v.c(8,8)
<Alternate Alignment Vectorized Loop>
LOOP END

LOOP BEGIN at v.c(8,8)
<Remainder loop for vectorization>
   remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
LOOP END

LOOP BEGIN at v.c(12,3)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)
   remark #25436: completely unrolled by 16
LOOP END

    Report from: Code generation optimizations [cg]

v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] v.c:4

    Hardware registers
        Reserved     :    1[ esp]
        Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
        Callee-save  :    4[ ebx ebp esi edi]
        Assigned     :   15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]

    Routine temporaries
        Total         :     123
            Global    :      47
            Local     :      76
        Regenerable   :       5
        Spilled       :       6

    Routine stack
        Variables     :       0 bytes*
            Reads     :       0 [0.00e+00 ~ 0.0%]
            Writes    :       0 [0.00e+00 ~ 0.0%]
        Spills        :       8 bytes*
            Reads     :       5 [1.41e+01 ~ 1.4%]
            Writes    :       3 [3.00e+00 ~ 0.3%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

while the annotated listing looks like:

//
// ------- Annotated listing with optimization reports for "/export/iusers/rcox2/rgHF/v.c" -------
//
//INLINING OPTION VALUES:
//  -inline-factor: 100
//  -inline-min-size: 20
//  -inline-max-size: 230
//  -inline-max-total-size: 2000
//  -inline-max-per-routine: 10000
//  -inline-max-per-compile: 500000
//
1   void bar();
2   void foo() { bar(); }
//INLINE REPORT: (foo()) [1] /export/iusers/rcox2/rgHF/v.c(2,12)
//
///export/iusers/rcox2/rgHF/v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] /export/iusers/rcox2/rgHF/v.c:2
//
//    Hardware registers
//        Reserved     :    1[ esp]
//        Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
//        Callee-save  :    4[ ebx ebp esi edi]
//        Assigned     :    0[ reg_null]
//
//    Routine temporaries
//        Total         :       4
//            Global    :       0
//            Local     :       4
//        Regenerable   :       0
//        Spilled       :       0
//
//    Routine stack
//        Variables     :       0 bytes*
//            Reads     :       0 [0.00e+00 ~ 0.0%]
//            Writes    :       0 [0.00e+00 ~ 0.0%]
//        Spills        :       0 bytes*
//            Reads     :       0 [0.00e+00 ~ 0.0%]
//            Writes    :       0 [0.00e+00 ~ 0.0%]
//
//    Notes
//
//        *Non-overlapping variables and spills may share stack space,
//         so the total stack size might be less than this.
//
//
3
4   void Test(int *res, int *c, int *d, int *p, int n) {
//INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] /export/iusers/rcox2/rgHF/v.c(4,52)
//  -> INLINE: (16,3) foo()
//  -> INLINE: (18,3) foo()
//  -> INLINE: (18,17) foo()
//
///export/iusers/rcox2/rgHF/v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] /export/iusers/rcox2/rgHF/v.c:4
//
//    Hardware registers
//        Reserved     :    1[ esp]
//        Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
//        Callee-save  :    4[ ebx ebp esi edi]
//        Assigned     :   15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]
//
//    Routine temporaries
//        Total         :     123
//            Global    :      47
//            Local     :      76
//        Regenerable   :       5
//        Spilled       :       6
//
//    Routine stack
//        Variables     :       0 bytes*
//            Reads     :       0 [0.00e+00 ~ 0.0%]
//            Writes    :       0 [0.00e+00 ~ 0.0%]
//        Spills        :       8 bytes*
//            Reads     :       5 [1.41e+01 ~ 1.4%]
//            Writes    :       3 [3.00e+00 ~ 0.3%]
//
//    Notes
//
//        *Non-overlapping variables and spills may share stack space,
//         so the total stack size might be less than this.
//
//
5     int i;
6
7   #pragma simd
8     for (i = 0; i < 1600; i++) {
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//<Peeled loop for vectorization>
//LOOP END
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//   remark #15301: SIMD LOOP WAS VECTORIZED
//LOOP END
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//<Alternate Alignment Vectorized Loop>
//LOOP END
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//<Remainder loop for vectorization>
//   remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
//LOOP END
9       res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
10    }
11
12    for (i = 0; i < 16; i++) {
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(12,3)
//   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
//   remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)
//   remark #25436: completely unrolled by 16
//LOOP END
13      res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
14    }
15
16    foo();
17
18    foo(); bar(); foo();
19  }

Essentially, various parts of the optimization report are inserted into a listing at the appropriate line numbers. (Note that this is just the default level. More detail can be obtained with -qopt-report=X where X>1; levels up to 5 are supported.)

I believe what Hal is proposing in this patch is a very useful lightweight annotation of the source with key information. But I also believe that there is value in a stand-alone opt report with the kind of detailed information I presented in http://reviews.llvm.org/D19397 and the two follow-up patches. In general, while this info can be interspersed in the source listing, I believe that for most purposes it is a bit too "busy" in text form. (The Intel compiler also supports annotated HTML and functionality that feeds into Visual Studio that has received great reviews.)

http://reviews.llvm.org/D19678

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
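
To make the idea of an annotated listing concrete (report text inserted into a numbered source listing at the line it refers to), here is a rough Python sketch. It is not how the Intel compiler or the proposed patch implements this; the function name `annotate` and the `(line_number, text)` remark format are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def annotate(source_lines, remarks):
    """Interleave remarks into a numbered source listing.

    source_lines: list of source lines without trailing newlines.
    remarks: iterable of (line_number, text) pairs, 1-based line numbers
             (a hypothetical format, not the compiler's internal one).
    Returns the listing as a list of strings, with each remark emitted as a
    '//' comment directly after the source line it refers to.
    """
    by_line = defaultdict(list)
    for line_no, text in remarks:
        by_line[line_no].append(text)

    listing = []
    for line_no, src in enumerate(source_lines, start=1):
        # Left-align the line number in a 4-character column.
        listing.append(f"{line_no:<4}{src}".rstrip())
        # Remarks for the same line keep their original order.
        for text in by_line.get(line_no, []):
            listing.append(f"//{text}")
    return listing


if __name__ == "__main__":
    source = [
        "for (i = 0; i < 16; i++) {",
        "  res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];",
        "}",
    ]
    remarks = [
        (1, "LOOP BEGIN"),
        (1, "   remark #25436: completely unrolled by 16"),
        (1, "LOOP END"),
    ]
    print("\n".join(annotate(source, remarks)))
```

Even in this toy form, a handful of remarks on a single line already crowds the source, which is the "busy" effect mentioned above.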