RE: New x86-64 micro-architecture levels
[AMD Public Use]

Hi Florian,

> I'm including a proposal for the levels below.  I use single letters
> for them, but I expect that the concrete implementation of this
> proposal will use names like “x86-100”, “x86-101”, like in the glibc
> patch referenced above.  (But we can discuss other approaches.)

Personally I am not a big fan of this, for 2 reasons:

1. It uses just "x86" in the name on x86_64 as well.
2. 100/101 is not very intuitive.

> * Level A
...
> * Level B
> This step is so small that it probably can be dropped, unless the
> benefits from using VEX encoding are truly significant.

Yes, agreed; the delta is too small and can be folded into A or C.

> * Level C
> * Level D

The others are in line with what we expect as a logical grouping. As you
mentioned, this is not easy to tackle.

We would also like to have dynamic loader support for "zen" / "zen2" as
a variant of Level D that takes preference over Level D; these
directories may hold super-optimized libraries from AMD or other
vendors, optimized according to micro-architectural details and not just
the ISA.  (See the hypothetical search order sketched below.)  Probably
we can discuss this on the hwcaps thread.
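Purely for illustration, the search order I have in mind would be
something like the following; the directory names are placeholders, not
a concrete proposal:

  1. x86-zen2            (only on Zen 2 parts)
  2. x86-zen             (on Zen and Zen 2 parts)
  3. Level D directory
  4. Level C directory
  5. Level A directory
  6. default library path

-Prem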
Re: New x86-64 micro-architecture levels
* Premachandra Mallappa:

> Personally I am not a big fan of this, for 2 reasons:
>
> 1. It uses just "x86" in the name on x86_64 as well.

That's deliberate, so that we can use the same x86-* names for 32-bit
library selection (once we define matching micro-architecture levels
there).

GCC has -m32 -march=x86-64 for K8 without 3DNow! (essentially the shared
x86-64/EM64T baseline), but I find this a bit confusing.

> 2. 100/101 is not very intuitive.

Any suggestions?  The advantage of the numbers is that they show a
strong preference ordering, and they do not make false suggestions about
feature sets: if we named Level C "x86-avx2", it would still be wrong
for glibc to load libraries found in that directory just because a
system has AVX2 support, because the libraries might also need FMA,
based on the Level C definition.  On the GCC side, it avoids confusion
between -mavx2 and -march=x86-avx2.

If numbers are out, what should we use instead?  x86-sse4, x86-avx2,
x86-avx512?  Would that work?

> Yes, agreed; the delta is too small and can be folded into A or C.

Let's merge Level B into Level C then?

> The others are in line with what we expect as a logical grouping.

Thanks.

> We would also like to have dynamic loader support for "zen" / "zen2"
> as a variant of Level D that takes preference over Level D; these
> directories may hold super-optimized libraries from AMD or other
> vendors,

*That* shouldn't be too hard to implement if we can nail down the
selection criteria.  Let's call this Zen-specific Level C "x86-zen-avx2"
for the sake of exposition.

What's going to be difficult is the choice for a hypothetical Zen
successor that's compatible feature-flag-wise with Level D.  Basically,
there are two choices here:

* Level D wins because it's the more powerful ISA.
* x86-zen-avx2 wins because it has the Zen architecture optimizations.

There's also a related issue with Level C vs. x86-zen-avx2, depending on
how we implement the Zen detection for AMD family numbers in the glibc
dynamic linker.  What do I mean by this?  glibc detects that this is a
Level C-capable Zen-type CPU, but it's not one of the family/model
numbers that were hard-coded into the glibc sources.  What should we do
then?  Should we still prefer the x86-zen-avx2 library over the Level C
library?  (A sketch of such a family-based check follows below.)

> optimized according to micro-architectural details and not just the
> ISA.

If this is supposed to be generally useful, we really need to document
the selection criteria for the subdirectory and make sure that they
match what these libraries actually require at run time in terms of ISA.

I want to avoid two things here specifically: a hardware upgrade that
results in crashes because we incorrectly load an incompatible library;
and, if possible, a hardware upgrade (or a kernel/hypervisor upgrade
that exposes more of the actual hardware) that causes us to drop
optimizations, so that users experience a performance regression.

With the levels I proposed, these aspects are covered.  But if we start
to create vendor-specific forks in the feature progression, things get
complicated.
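To make the detection question concrete, here is a minimal sketch of
such a family-based check.  The function name and the "family 0x17 and
newer counts as Zen" cutoff are placeholders of mine for illustration,
not actual glibc code:

#include <cpuid.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: is this an AMD CPU of family 0x17 (Zen) or
   later?  Real logic would live in glibc's CPU feature detection and
   would likely enumerate specific family/model numbers.  */
static bool
is_zen_or_later (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 0: the vendor string is returned in EBX:EDX:ECX.  */
  if (!__get_cpuid (0, &eax, &ebx, &ecx, &edx))
    return false;
  unsigned int vendor[3] = { ebx, edx, ecx };
  if (memcmp (vendor, "AuthenticAMD", 12) != 0)
    return false;

  /* Leaf 1: base family in bits 8 to 11, extended family in bits 20
     to 27.  Zen reports base family 0xf plus extended family 0x8,
     i.e. family 0x17.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;
  unsigned int family = (eax >> 8) & 0xf;
  if (family == 0xf)
    family += (eax >> 20) & 0xff;
  return family >= 0x17;
}

The hard-coding question above is exactly about the tail of this
function: a strict version would compare against known family/model
pairs instead of using ">= 0x17".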
Do you think we need to figure this out in this iteration?  If yes, then
I really need a semi-formal description of the selection criteria for
this x86-zen-avx2 directory, so that I can pass it along with my psABI
proposal.

Thanks,
Florian
Three issues
Some background: this is in the dreaded structure reorganization
optimization that I'm working on.  It's running at LTRANS time with
-flto-partition=one.  My issues, in order of importance, are:

1) In gimple-ssa.h, the equal method for ssa_name_hasher segfaults
because the "var" field of "a" is (nil):

struct ssa_name_hasher : ggc_ptr_hash<tree_node>
{
  /* Hash a tree in a uid_decl_map.  */
  static hashval_t
  hash (tree item)
  {
    return item->ssa_name.var->decl_minimal.uid;
  }

  /* Return true if the DECL_UID in both trees are equal.  */
  static bool
  equal (tree a, tree b)
  {
    return (a->ssa_name.var->decl_minimal.uid
            == b->ssa_name.var->decl_minimal.uid);
  }
};

The parameter "a" is associated with "*entry" on the second-to-last line
shown below (the rest of the function is trimmed off).  This is from
hash-table.h:

template<typename Descriptor, bool Lazy,
         template<typename Type> class Allocator>
typename hash_table<Descriptor, Lazy, Allocator>::value_type &
hash_table<Descriptor, Lazy, Allocator>
::find_with_hash (const compare_type &comparable, hashval_t hash)
{
  m_searches++;
  size_t size = m_size;
  hashval_t index = hash_table_mod1 (hash, m_size_prime_index);

  if (Lazy && m_entries == NULL)
    m_entries = alloc_entries (size);

#if CHECKING_P
  if (m_sanitize_eq_and_hash)
    verify (comparable, hash);
#endif

  value_type *entry = &m_entries[index];
  if (is_empty (*entry)
      || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
    return *entry;
  .
  .

Is there any way this could happen other than by a memory corruption of
some kind?  This is a show stopper for me and I really need some help on
this issue.

2) I tried to dump out all the gimple in the following way at the very
beginning of my program:

void
print_program ( FILE *file, int leading_space )
{
  struct cgraph_node *node;
  fprintf ( file, "%*sProgram:\n", leading_space, "");

  // Print global decls
  varpool_node *var;
  FOR_EACH_VARIABLE ( var)
    {
      tree decl = var->decl;
      fprintf ( file, "%*s", leading_space, "");
      print_generic_decl ( file, decl, (dump_flags_t)0);
      fprintf ( file, "\n");
    }

  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
    {
      struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
      dump_function_header ( file, func->decl, (dump_flags_t)0);
      dump_function_to_file ( func->decl, file, (dump_flags_t)0);
    }
}

When I run this, the first two (out of three) functions print just fine.
However, for the third, func->decl is (nil) and it segfaults.  Now the
really odd thing is that this works perfectly at the end or in the
middle of my optimization.  What gives?
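For what it's worth, I can paper over the crash with a guard like the
one below, but that presumably just hides whatever is leaving the decl
null this early (a sketch only; the NULL_TREE test is my guess at the
right check):

  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
    {
      struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
      /* Guard, not a fix: skip any node whose struct function or
         decl hasn't materialized at this point.  */
      if ( func == NULL || func->decl == NULL_TREE)
        continue;
      dump_function_header ( file, func->decl, (dump_flags_t)0);
      dump_function_to_file ( func->decl, file, (dump_flags_t)0);
    }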
3) For my bug in (1), I got so distraught that I ran valgrind, which in
my experience is an act of desperation for compilers.  None of the
errors it spotted are associated with my optimization (although it oh so
cleverly pointed out the segfault); however, it showed the following:

==18572== Invalid read of size 8
==18572==    at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
==18572==    by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
==18572==    by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
==18572==    by 0x9915A9: lto_main() (lto.c:653)
==18572==    by 0x11EE4A0: compile_file() (toplev.c:458)
==18572==    by 0x11F1888: do_compile() (toplev.c:2302)
==18572==    by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
==18572==    by 0x23C021E: main (main.c:39)
==18572==  Address 0x5842880 is 16 bytes before a block of size 88 alloc'd
==18572==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18572==    by 0x21E00B7: make_pass_ipa_prototype(gcc::context*) (ipa-prototype.c:329)
==18572==    by 0x106E987: gcc::pass_manager::pass_manager(gcc::context*) (pass-instances.def:178)
==18572==    by 0x11EFCE8: general_init(char const*, bool) (toplev.c:1250)
==18572==    by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
==18572==    by 0x23C021E: main (main.c:39)
==18572==

Are these known issues with LTO, or is this a valgrind issue?

Thanks,

Gary
Re: New x86-64 micro-architecture levels
I fully agree that these names (100/101, A/B/C/D) are not very
intuitive.  I recommend using ISA tags by year (e.g. x64_2010,
x64_2014), like Python's platform tags (e.g. manylinux2010,
manylinux2014).
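To make that concrete, a rough pairing with the proposed levels might
look like the following; the year-to-level mapping here is only my
guess, based on roughly when the corresponding hardware became common:

  x64_2010  ~  Level A/B  (SSE4.x / POPCNT era)
  x64_2014  ~  Level C    (AVX2 / FMA / BMI era)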
Loading plugins with arm-none-eabi-gcc
Hello,

I am currently trying to migrate a gcc plugin that was developed for x86
to the ARM platform (arm-none-eabi-gcc).  So far I did the following
steps:

1. Write a hello world program t.c.

2. Compile with the following commands:

➜ arm-none-eabi-gcc -v
..
gcc version 9.3.1 20200408 (release) (GNU Arm Embedded Toolchain 9-2020-q2-update)

➜ arm-none-eabi-gcc -S -mcpu=cortex-m3 -mthumb -fdump-tree-all t.c

It works fine and smoothly prints out all the gimple code at the
different stages.

3. Load my plugin (the plugin is compiled by an x64 gcc, version 10.0):

➜ file instrument_san_cov.so
instrument_san_cov.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped

➜ file arm-none-eabi-gcc
arm-none-eabi-gcc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.24, BuildID[sha1]=fbadd6adc8607f595caeccae919f3bab9df2d7a6, stripped

➜ arm-none-eabi-gcc -fplugin=./instrument_cov.so -S -mcpu=cortex-m3 -mthumb -fdump-tree-all t.c
cc1: error: cannot load plugin ./instrument_cov.so
./instrument_cov.so: undefined symbol: _Z20build_string_literaliPKcP9tree_nodem

➜ c++filt -n _Z20build_string_literaliPKcP9tree_nodem
build_string_literal(int, char const*, tree_node*, unsigned long)

It seems that a function named `build_string_literal` cannot be found.
Why is that?  I have no idea how to proceed on this matter and cannot
find any proper documentation.  Any suggestion would be appreciated.
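One guess on my side: since the plugin was built against the headers of
my x64 gcc 10, do I need to rebuild it against the plugin headers
shipped with this arm-none-eabi toolchain instead?  Something like the
following, assuming the toolchain was built with plugin support at all:

➜ arm-none-eabi-gcc -print-file-name=plugin
➜ g++ -shared -fPIC -fno-rtti \
      -I"$(arm-none-eabi-gcc -print-file-name=plugin)/include" \
      -o instrument_cov.so instrument_cov.cc

Thank you!

Best,
Shuai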