Hi,

On 2022-08-07 14:47:36 +0700, John Naylor wrote:
> Playing around with the compiler flags on preproc.c, I get these
> compile times, gcc memory usage as reported by /usr/bin/time -v , and
> symbol sizes (non-debug build):
>
> -O2:
> time 8.0s
> Maximum resident set size (kbytes): 255884
>
> -O1:
> time 6.3s
> Maximum resident set size (kbytes): 170636
> 000000000004d8e2 yytable
> 000000000004d8e2 yycheck
> 00000000000292de base_yyparse
>
> -O0:
> time 2.9s
> Maximum resident set size (kbytes): 153148
> 000000000004d8e2 yytable
> 000000000004d8e2 yycheck
> 000000000003585e base_yyparse
>
> Note that -O0 bloats the binary probably because it's not using a jump
> table anymore. O1 might be worth it just to reduce build times for
> slower animals, even if Noah reported this didn't help the issue
> upthread. I suspect it wouldn't slow down production use much since
> the output needs to be compiled anyway.
FWIW, I noticed that the build was much slower on gcc 12 than 11, and
reported that as a bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106809

Which, impressively promptly, got a workaround in the development branch,
and will (based on past experience) likely be backported to the 12 branch
soon. So the next set of minor releases should have at least a workaround
for that slowdown.

It's less clear to me whether they're going to backport anything about the
-On regression that started in gcc 9. If I understand correctly, the
problem is caused by basic blocks that are reached from a lot of different
places, and it's not hard to see why preproc.c in particular runs into
that.

It's worth noting that clang is also very slow, starting at -O1, albeit in
a very different place:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 9.8708 seconds (9.8716 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
...
   7.1019 ( 72.7%)   0.0435 ( 40.8%)   7.1454 ( 72.4%)   7.1462 ( 72.4%)   Greedy Register Allocator

There's lots of code in ecpg like the following:

c_anything:  ecpg_ident               { $$ = $1; }
        | Iconst                      { $$ = $1; }
        | ecpg_fconst                 { $$ = $1; }
        | ecpg_sconst                 { $$ = $1; }
        | '*'                         { $$ = mm_strdup("*"); }
        | '+'                         { $$ = mm_strdup("+"); }
        | '-'                         { $$ = mm_strdup("-"); }
        | '/'                         { $$ = mm_strdup("/"); }
...
        | UNION                       { $$ = mm_strdup("union"); }
        | VARCHAR                     { $$ = mm_strdup("varchar"); }
        | '['                         { $$ = mm_strdup("["); }
        | ']'                         { $$ = mm_strdup("]"); }
        | '='                         { $$ = mm_strdup("="); }
        | ':'                         { $$ = mm_strdup(":"); }
        ;

I wonder if that couldn't be done smarter fairly easily. I'm not
immediately sure whether we can just get the string matching a keyword
from the lexer, but even if not, all those branches could be replaced by a
single keyword -> string lookup table (a rough sketch is at the end of
this mail). That could reduce the number of switch cases and parser states
by a decent amount.

I also wonder if we shouldn't just make ecpg optional at some point, or
even move it out of the tree.

Greetings,

Andres Freund
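PS: To make the lookup-table idea a bit more concrete, here is a minimal
sketch. This is not ecpg's actual code; the struct, the helper name and
the token values are all invented for illustration (in the real grammar
the token values would come from the bison-generated header).

/*
 * Sketch only: map a keyword token to the text that should be emitted,
 * replacing the per-keyword { $$ = mm_strdup("..."); } grammar actions
 * with one shared lookup.  All names and values here are hypothetical.
 */
#include <stddef.h>

/* stand-ins for the token values bison would assign */
enum { UNION_TOK = 1001, VARCHAR_TOK = 1002 /* , ... */ };

typedef struct KeywordText
{
    int         token;      /* bison token value */
    const char *text;       /* text to emit for that token */
} KeywordText;

/* one entry per keyword that c_anything currently spells out by hand */
static const KeywordText keyword_text[] = {
    {UNION_TOK, "union"},
    {VARCHAR_TOK, "varchar"},
    /* ... */
};

/* return the output text for a keyword token, or NULL if it isn't one */
static const char *
keyword_to_text(int token)
{
    for (size_t i = 0; i < sizeof(keyword_text) / sizeof(keyword_text[0]); i++)
    {
        if (keyword_text[i].token == token)
            return keyword_text[i].text;
    }
    return NULL;
}

Whether the long list of grammar branches can then be collapsed into a
single rule depends on being able to get the matched token (or its text)
out of the lexer, which is exactly the open question above.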