Yes, I totally agree that the numbers can only be accurate up to a point. Below are the corresponding bytes-allocated figures for the x86 target.

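For reference, totals of this kind can be derived from the valgrind logs with something like the sketch below. It is only an illustration: the function names are made up here, and it assumes each traced process writes one "total heap usage" summary line in the format quoted later in this thread.

```python
import re

# Matches valgrind summary lines such as:
#   ==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated
SUMMARY = re.compile(
    r"total heap usage: [\d,]+ allocs, [\d,]+ frees, ([\d,]+) bytes allocated"
)


def total_bytes_allocated(log_lines):
    """Sum the 'bytes allocated' figure over every summary line in a log."""
    total = 0
    for line in log_lines:
        match = SUMMARY.search(line)
        if match:
            # valgrind prints thousands separators; strip them before converting
            total += int(match.group(1).replace(",", ""))
    return total


def relative_change(upstream, patched):
    """Percentage change of the patched total relative to upstream."""
    return (patched - upstream) / upstream * 100.0
```

With --trace-children=yes every child process reports its own summary, so the per-benchmark figure is the sum over all of them. For example, relative_change(1429883731, 1391040027) gives about -2.7, matching the 401.bzip2 row of the O2 table.
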
Bytes allocated with O2:
-------------------------------------------------------------
Benchmark       | upstream      | with this PATCH
-------------------------------------------------------------
400.perlbench   | 25286185160   | 25176544846    ~0.0%
401.bzip2       | 1429883731    | 1391040027     -2.7%
403.gcc         | 55023568981   | 54798890746    ~0.0%
429.mcf         | 1360975660    | 1321537710     -2.9%
445.gobmk       | 12791636502   | 12666523431    -1.0%
456.hmmer       | 9354433652    | 9279189174     ~0.0%
458.sjeng       | 1991260562    | 1944031904     -2.4%
462.libquantum  | 1725112078    | 1684213981     -2.4%
464.h264ref     | 8597673515    | 8528855778     ~0.0%
471.omnetpp     | 37613034778   | 37432278047    ~0.0%
473.astar       | 3817295518    | 3772460508     -1.2%
483.xalancbmk   | 149418776991  | 148545162207   ~0.0%

Bytes allocated with Ofast + funroll-loops:
-------------------------------------------------------------
Benchmark       | upstream      | with this PATCH
-------------------------------------------------------------
400.perlbench   | 30438407499   | 30574152897    ~0.0%
401.bzip2       | 2277114519    | 2319432664     +1.9%
403.gcc         | 64499664264   | 64781232731    ~0.0%
429.mcf         | 1361486758    | 1399942116     +2.8%
445.gobmk       | 15258056111   | 15396801542    +1.0%
456.hmmer       | 10896615649   | 10936223486    ~0.0%
458.sjeng       | 2592620709    | 2641687496     +1.9%
462.libquantum  | 1814487525    | 1854518500     +2.2%
464.h264ref     | 13528736878   | 13614517066    ~0.0%
471.omnetpp     | 38721066702   | 38910524667    ~0.0%
473.astar       | 3924015756    | 3968057027     +1.1%
483.xalancbmk   | 165897692838  | 166843885880   ~0.0%

Pan

-----Original Message-----
From: Richard Biener <rguent...@suse.de>
Sent: Friday, May 5, 2023 2:25 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: 钟居哲 <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>; richard.sandiford <richard.sandif...@arm.com>; Jeff Law <jeffreya...@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <pal...@dabbelt.com>; jakub <ja...@redhat.com>
Subject: RE: Re: [PATCH]
machine_mode type size: Extend enum size from 8-bit to 16-bit

On Fri, 5 May 2023, Li, Pan2 wrote:

> I tried memory profiling with valgrind --tool=memcheck --trace-children=yes
> for this change, targeting the SPEC 2006 INT part with rv64gcv. Note we only
> count the bytes allocated from valgrind log lines like this: "==2832896== total
> heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
>
> Considering some variance in valgrind, it looks like the impact on bytes
> allocated may be limited. However, I am still running this for x86; it
> will take more than 30 hours for each iteration...

I'm not sure I'd call +- 7% on memory use "limited" - but I fear the
numbers are off.  Note that since various structures reside in GC memory
there are also changes to GC overhead and fragmentation, so precise
measurements are difficult.

Richard.

> RISC-V GCC Version:
> >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
> Copyright (C) 2023 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There
> is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
>
> Bytes allocated with O2:
> -------------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> -------------------------------------------------------------
> 400.perlbench   | 29699642875   | 29949876269    ~0.0%
> 401.bzip2       | 1641041659    | 1755563972     +6.95%
> 403.gcc         | 68447500516   | 68900883291    ~0.0%
> 429.mcf         | 1433156462    | 1433253373     ~0.0%
> 445.gobmk       | 14239225210   | 14463438465    ~0.0%
> 456.hmmer       | 9635955623    | 9808534948     +1.8%
> 458.sjeng       | 2419478204    | 2545478940     +5.4%
> 462.libquantum  | 1686404489    | 1800884197     +6.8%
> 464.h264ref     | 10190413900   | 10351134161    +1.6%
> 471.omnetpp     | 40814627684   | 41185864529    ~0.0%
> 473.astar       | 3807097529    | 3928428183     +3.2%
> 483.xalancbmk   | 152959418167  | 154201738843   ~0.0%
>
> Bytes allocated with Ofast + funroll-loops:
> -------------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> -------------------------------------------------------------
> 400.perlbench   | 39491184733   | 39223020267    ~0.0%
> 401.bzip2       | 2843871517    | 2730383463     -4.0%
> 403.gcc         | 84195991898   | 83730632955    ~0.0%
> 429.mcf         | 1481381164    | 1367309565     -7.7%
> 445.gobmk       | 20123943663   | 19886116394    -1.2%
> 456.hmmer       | 12302445139   | 12121745383    -1.5%
> 458.sjeng       | 3884712615    | 3755481930     -3.3%
> 462.libquantum  | 1966619940    | 1852274342     -5.8%
> 464.h264ref     | 19219365552   | 19050288201    ~0.0%
> 471.omnetpp     | 45701008325   | 45327805079    ~0.0%
> 473.astar       | 4118600354    | 3995943705     -3.0%
> 483.xalancbmk   | 179481305182  | 178160306301   ~0.0%
>
> Pan
>
>
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf Of 钟居哲
> Sent: Thursday, April 13, 2023 7:23 AM
> To: kito.cheng <kito.ch...@gmail.com>; rguenther <rguent...@suse.de>
> Cc: richard.sandiford <richard.sandif...@arm.com>; Jeff Law <jeffreya...@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <pal...@dabbelt.com>; jakub <ja...@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> Yeah, like Kito said.
> It turns out the tuple type model in ARM SVE is the optimal solution for RVV,
> and we like the ARM SVE style implementation.
>
> And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def
> overall within 64 bits.
> But it seems that there is still a problem in tree_type_common and
> tree_decl_common, is that right?
>
> After several tries (removing all redundant TI/TF vector modes and FP16 vector
> modes), there are now 252 modes in the RISC-V port. Basically, I can keep
> supporting the recent new RVV intrinsics features.
> However, we can't support more in the future, for example FP16 vector, BF16
> vector, matrix modes, VLS modes, etc.
>
> From the RVV side, I think extending the machine mode by 1 more bit should be
> enough for RVV (512 modes overall).
> Is it possible to make that happen in tree_type_common and tree_decl_common,
> Richards?
>
> Thank you so much for all comments.
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-04-12 17:31
> To: Richard Biener
> CC: juzhe.zh...@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> > > The concept of fractional LMUL is the same as the concept of
> > > AArch64's partial SVE vectors, so they can only access the lowest
> > > part, like SVE's partial vector.
> > >
> > > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > > 1/8), so adding dedicated modes for those partial vector modes
> > > should be unavoidable IMO.
> > >
> > > And even if we use sub-vector, we still need to define those
> > > partial vector types.
> >
> > Could you use integer modes for the fractional vectors?
>
> You mean using a scalar integer mode, like using (subreg:SI
> (reg:VNx4SI) 0) to represent LMUL=1/4?
> (Assume VNx4SI is the mode for M1)
>
> If so, I think it might not be able to model that right - it seems like we are
> using 32 bits but actually we are using poly_int16(1, 1) * 32 bits.
>
> > For computation you can always appropriately limit the LEN?
>
> RVV provides zvl*b extensions like zvl<N>b (e.g. zvl128b or zvl256b) to
> guarantee the vector length is at least N bits, but that only
> guarantees the minimal length, just as SVE guarantees the minimal
> vector length is 128 bits.
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
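
On the mode-count arithmetic quoted above (252 modes in the RISC-V port, an 8-bit mode field, and one extra bit giving 512), a small worked sketch may help. The helper names are hypothetical, and it assumes every bit pattern of the field is usable as a mode number, which may not hold exactly in practice.

```python
def mode_capacity(bits):
    """Number of distinct values an unsigned field of the given width can hold."""
    return 1 << bits


def headroom(bits, modes_in_use):
    """Encodings left over once the existing modes are assigned."""
    return mode_capacity(bits) - modes_in_use


# 252 is the RISC-V mode count quoted in the thread.
MODES_IN_USE = 252

for width in (8, 9, 16):
    print(f"{width:2d}-bit field: {mode_capacity(width)} encodings, "
          f"{headroom(width, MODES_IN_USE)} unused")
```

Under that assumption an 8-bit field leaves only 4 spare encodings, a 9-bit field leaves 260, and the 16-bit enum proposed in the patch subject leaves far more, at the cost of the structure-size growth that the measurements in this thread try to quantify.
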