On Tue, Jan 3, 2017 at 9:20 PM, FX <fxcoud...@gmail.com> wrote: >> The regression with 8 bit boolean types surfaced with the z10 machine. The >> ABI is much older. Since >> we cannot change it anymore we should rather talk to the hardware guys to >> add the instruction we >> need. So for now we probably have to live with the regression in the Fortran >> cases since as I >> understand it your change fixes an actual problem. > > As far as I understand (and Jane will correct me if I am wrong),
Sure, allow me to correct you: It's Janne, with two n in the middle. :) > the patch does not fix anything in particular. The idea was that, by > transitioning from having all boolean expressions from “int” to “bool” (in C > terms), and thus from 32-bit to 8-bit on “typical” targets, the optimizer > might be able to emit more compact code. My motivations were: - As you say, hopefully more compact code. - Fixing the co-array intrinsics which pass arguments as boolean_type_node and on the library side the corresponding arguments are C _Bool's. I think this has worked previously accidentally, as arguments are passed in registers anyway, and stack slots are 8 bytes(?) on x86-64. But of course, we shouldn't let this excuse us from actually fixing it. - Redefining an ABI type specified by the middle end is bad form, IMHO. > I am not sure this was tested. > > So: maybe it is a case of "Profile. Don't speculate.” I don't have access to spec2k6, but I just checked the polyhedron 2011 benchmark suite. Target x86_64-pc-linux-gnu, compile options -O3 -funroll-loops -ffast-math -ftree-loop-linear -march=native (native == amdfam10 here). The line counts of the .s files (~number of instructions) are (working directory is current trunk, gf_trunk_boolean_type_node is trunk with my patch reverted): 6856 ac.s 6856 ../gf_trunk_boolean_type_node/ac.s 235206 aermod.s 235337 ../gf_trunk_boolean_type_node/aermod.s 22310 air.s 22477 ../gf_trunk_boolean_type_node/air.s 15221 capacita.s 15252 ../gf_trunk_boolean_type_node/capacita.s 4905 channel2.s 4859 ../gf_trunk_boolean_type_node/channel2.s 34269 doduc.s 33610 ../gf_trunk_boolean_type_node/doduc.s 11963 fatigue2.s 11959 ../gf_trunk_boolean_type_node/fatigue2.s 17369 gas_dyn2.s 17304 ../gf_trunk_boolean_type_node/gas_dyn2.s 31098 induct2.s 31098 ../gf_trunk_boolean_type_node/induct2.s 8366 linpk.s 8367 ../gf_trunk_boolean_type_node/linpk.s 15139 mdbx.s 15155 ../gf_trunk_boolean_type_node/mdbx.s 5302 mp_prop_design.s 5314 ../gf_trunk_boolean_type_node/mp_prop_design.s 17420 nf.s 17457 ../gf_trunk_boolean_type_node/nf.s 20804 protein.s 20713 ../gf_trunk_boolean_type_node/protein.s 52476 rnflow.s 52449 ../gf_trunk_boolean_type_node/rnflow.s 37339 test_fpu2.s 37377 ../gf_trunk_boolean_type_node/test_fpu2.s 8466 tfft2.s 8448 ../gf_trunk_boolean_type_node/tfft2.s So 7 of 17 benchmarks are slightly bigger after the patch (at least for channel2 it seems due to trunk unrolling a loop which didn't happen with the patch reverted, haven't checked others), 8 benchmarks are slightly smaller after the patch, and 2 are equally long. Runtime differences seem to be a wash, though I didn't test that very carefully. -- Janne Blomqvist