On 1/21/21 2:46 PM, Jan Hubicka wrote:
I think easy way to get users of this option is to make profile not reproducible by default and modify packages to use right reproducibility option when reproducible builds are intended. It is not feature that comes for free and I think most users of PGO does not care, so I think it should be opt in.
I agree that most users really don't care.
In general getting profile reroducible one needs to make train reproducible that is hard when you look at details (such as/tmp/ file name generation issue in gcc) and may lead to need for user to annotate such code.
Yes, right now I'm testing both patches and I still see difference in GCC PGO bootstrap (with -fprofile-reproducible=parallel-runs): $ objfolderdiff.py /dev/shm/objdir2/gcc /dev/shm/objdir3/gcc 138/ 649: cgraphunit.o: different 230/ 649: dwarf2out.o: different 356/ 649: ipa-cp.o: different 357/ 649: ipa-devirt.o: different 360/ 649: ipa-icf.o: different 533/ 649: tree-affine.o: different 574/ 649: tree-ssa-loop-im.o: different 632/ 649: var-tracking.o: different Most of the changes are in known contexts: ;; Function hash_table<hash_map<im_mem_ref*, sm_aux*>::hash_entry, false, xcallocator>::find_with_hash (_ZN10hash_tableIN8hash_mapIP10im_mem_refP6sm_aux21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE14find_with_hashERKS2_j, funcdef_no=3431, decl_uid=106116, cgraph_uid=2554, symbol_order=2712) ;; Function hash_table<default_hash_traits<void*>, false, xcallocator>::find_empty_slot_for_expand (_ZN10hash_tableI19default_hash_traitsIPvELb0E11xcallocatorE26find_empty_slot_for_expandEj, funcdef_no=4875, decl_uid=134656, cgraph_uid=3901, symbol_order=4074) ;; Function hash_table_mod2 (_Z15hash_table_mod2jj, funcdef_no=1047, decl_uid=32691, cgraph_uid=345, symbol_order=356) ;; Function hash_table<hash_map<tree_node*, name_expansion*>::hash_entry, false, xcallocator>::find_empty_slot_for_expand (_ZN10hash_tableIN8hash_mapIP9tree_nodeP14name_expansion21simple_hashmap_traitsI19default_hash_traitsIS2_ES4_EE10hash_entryELb0E11xcallocatorE26find_empty_slot_for_expandEj, funcdef_no=3639, decl_uid=102478, cgraph_uid=2760, symbol_order=2916) So likely hashing related functions where we hash some pointers :/ Unfortunately, that's enough for final binary divergence.
This will become more common problem for multithreaded profiles where one needs to annotate locking and busy waiting loops in them for example (or the scheduler responsible for executing paralle tasks). I can see this to be practically achievable but we probably want to produce some guidelines for doing that and probably teach gcov-tool to compare profiles and say to which degree they match (i.e. which functions match for each of levels of reproducibility). The problem is that profiles are continuous and the errors too, but optimizaitons looks for certain thresholds, so small errors may lead to code changes, so I think our current method of looking at relatively few packages and patching errors when they appear is not very good long term strategy... Especially if it makes us to drop useful transformations by default with -fprofile-use and no additional option.
To be honest we have very few packages that use PGO in openSUSE:Factory. Anyway, are you fine with the suggested? Thanks for discussion, Martin