Hi, sorry for the delay. Couple of weeks ago, I have cloned the github repo, lookaed at the diff against the trunk and reviewed that, which resulted in the following pile of notes.
I hope they will be at least mostly understandable, if a little bit disorganized and perhaps sometimes inconsistent. I have been interrupted by a surprisingly high number of issues which required my quick attention and a disk failure during that period. - Still quite few things need to be documented better, e.g.: + brig_to_generic::get_mangled_name_tmpl and to a lesser extent brig_to_generic::get_mangled_name. It should be clear what is the intended difference in usage of the two (specially since the former is a template and so the parameter does not give that much of a hint to the reader) + the visitor classes need some description so that the first time readers see them, they understand what they are for and what they visit (i.e. what "visiting" even means). - I know it was me who told you to use gcc_assert and gcc_unreachable instead of internal_error. The thing is, often the error is clearly not an internal error but an error in input. I think that we should plan to handle these cases differently and report the issues better to the user, give a meaningful error message together width the section and the offset there when it was encountered. I am not asking you to audit all asserts now and convert those in this category but it would be nice to have a mechanism to easily do so (and convert a few obvious places), so that we can convert these as we bump into them. For asserts that do check for compiler errors in the sense that no matter what the input, a condition should never occur, please note that gcc_assert (condition) is preferred over if (!condition) gcc_unreachable (); and that gcc_unreachable is a noreturn call and thus there is no need to put any return statements after them. I'm not pointing occurrences of this in the review because many of them are actually handling input issues and those should probably eventually be calls to error() or some encapsulation of it and will probably need a bail-out followup. - A very minor suggestion: In GCC it is customary to write TODO as one word. We generally do not use "TO OPTIMIZE", that is just a TODO (as opposed to a FIXME, which hints that something is at least a bit wrong here). I think you can keep your way if you want but for example I do have emacs highlighting set up for the traditional formats. - You do not seem to handle BRIG_OPCODE_ICALL, or have I missed it? That is fine, I don't think anybody else does that now anyway, I just got curious reading the ipa analysis. - brig-lang.c: + x_flag_whole_program = 0; - talk about this with Honza + brig_langhook_type_for_size: has several issues. Either always return a type like go or return error_mark_node instead of NULL. Also, do not count on long being 64-bit. I would just copy what go does. Or lto. + brig_langhook_type_for_mode: Also please do not depend on knowing that long is 8 bytes, or int being 4 bytes long. For complex modes, the correct thing seems to be to return NULL instead of void_type_node. In any case, it would be better to return error_mark_node rather than void_type_node. + convert: did you avoid using convert_to_vector deliberately? The size check seems genuinely useful. - brigfrontend/brig-to-generic.cc: + brig_to_generic::parse: You seem to be handling LDA instruction (or more generally, BRIG_KIND_INST_ADDR instructions, but there is just the one) with copy_move_inst_handler, which seems just wrong. Its operator() blindly casts the instruction to BrigInstSourceType, interpreting the segment as if it was a source type... am I missing something? + build_reinterpret_cast: gcc_unreachable followed by return NULL_TREE does not make sense, please convert the whole thing to an assert. More importantly, I think that you really need to solve the case of mismatching sizes differently, and not create a (single) V_C_E for it. For register types, you can V_C_E them to an unsigned int of the same size, then do a widening NOP_EXPR to unsigned type with the same size as the destination and then do a V_C_E to whatever you need. But perhaps this can be solved better at the caller side somewhere? + brig_to_generic::finish_function: the ifdef'ed debug_functions should not be a part of the final submission. Checking m_cf->m_is_kernel twice looks ugly. + brig_to_generic::append_group_variable: Don't put a statement guarded with an if statement on the same line as the condition. I also believe align_padding, when the offset is not already aligned, should be: alignment - m_next_group_offset % alignment + brig_to_generic::append_private_variable: The same as above. + I would suggest that you change brig_to_generic::dump_function to a standalone function taking a file parameter. It is then much easier to call it for example from debugger. - brig-code-entry-handler.cc + brig_code_entry_handler::build_tree_operand: The unreachable return at the end looks misleading, just removing it should be fine. Ditto for the breaks after returns, the upcoming fall-through warning will notify us if we ever get a potentially wrong fall-through. + brig_code_entry_handler::get_tree_cst_for_hsa_operand: GCC coding standard mandates that if a branch has only one statement in it, it should not be encapsulated in braces. If, in the condition if (type != NULL && TREE_CODE (type) == ARRAY_TYPE) type can ever actually be NULL, then the function will segfault just before ending, when again checking whether type is an array. If it cannot be NULL, then please gcc_checking_assert it instead. + brig_code_entry_handler::build_address_operand: I must say I really dislike how the end of the function is structured, it is terrible difficult to read given that it is not doing anything that complicated and I think it does not handle correctly an (LDA of a) NULL address, which unfortunately I believe is valid for private and group segment addresses. Dealing with the most complex case by converting symbol to size_type looks exactly backwards, especially given that you have converted the base of the POINTER_PLUS_EXPR only few lines before. I think the code would be a lot nicer and easier to comprehend if you clearly distinguished the various cases (symbol_base != NULL (and sub cases when ptr_base is or is not NULL), ptrbase != NULL and simple constant, even NULL constant, which you do not handle but fail an assert, I think) and handled them separately, including all type conversions. ptr_base is an unfortunate name, IMHO, in many cases it has the role of a variable offset rather than a base. Similarly, ptr_ofset is really a constant_offset. + brig_code_entry_handler::get_tree_type_for_hsa_type: I believe it is better to and the type with BRIG_TYPE_PACK_MASK if you want to determine whether you are looking at a vector (packed) HSAIL instruction. I also think that putting that test into a separate function and calling it from all places when you do this would be more future-proof (something like hsa_type_packed_p in gcc/hsa.c). For the (inner_brig_type == BRIG_TYPE_F16) vector case, you do not end up calling build_type_variant (why is it necessary in any case?) but for other vector types you do. Is that intentional? + brig_code_entry_handler::expand_or_call_builtin: If I am correct that the operands parameter contains only input operands at this point, please state so in the function comment. Since we only allow ourselves 80 characters wide code, it is customary not to put code into else branches if the if branch returns (or breaks/continues) anyway. Given that this function builds vectors only to pile them element-wise to arguments of variable-argument-length function call_builtin, which then builds vectors out if its elements... have you considered having an overloaded implementation of call_builtin that would not do this? It seems particularly wasteful. + brig_code_entry_handler::build_operands: Please make the return type consistent with the one in class definition (tree_stl_vec instead of explicit std::vector<tree>, assuming you prefer the former). GCC coding standard mandates that if a branch has only one statement in it, it should not be encapsulated in braces. Please replace if (operand == NULL_TREE) gcc_unreachable (); with gcc_assert (operand); Please rewrite conditions like !(TREE_CODE (operand) == TREE_VEC) as (TREE_CODE (operand) != TREE_VEC) Again you are creating VIEW_CONVERT_EXPR for an operand that is of a different size than the result type. For scalar types, this is bound to cause trouble sooner or later and I really think you need to avoid it. - brig-basic-inst-handler.cc: + scalarized_sat_arithmetics::scalarized_sat_arithmetics: Do not undef macros preemptively. Instead, undef them right after using them, after the include of a .def file. + brig_basic_inst_handler::must_be_scalarized: I am intrigued by the (elements < 16) part of the condition. This function would also benefit from a comment. + scalarized_sat_arithmetics::builtin - please prefix with m_, ditto for brig_inst_ (why the trailing underscore?) + brig_basic_inst_handler::must_be_scalarized needs a comment explaining what it is for. + brig_basic_inst_handler::get_raw_type: Unless "raw" is some HSA term I have missed, I would strongly suggest that you rename the function to something more immediately obvious, like uint_for_type or something like that. Also, don't use literal 8 but BITS_PER_UNIT. Also, do we really need both this function and brig_code_entry_handler::get_raw_tree_type ? + build_shuffle, build_unpack, build_pack, build_unpack_lo_or_hi and build_lower_element_broadcast: I admit that so far I have only very briefly skimmed through these functions. In any way, use BITS_PER_UNIT instead of 8. + brig_basic_inst_handler::build_instr_expr: Please remove the 'r' and make it build_inst_expr for the sake of consistency. If I understand the code correctly, the operands parameter contains only input operands. In that case, please state so in the function comment and remove the local variable first_input, it has no purpose but to confuse. Also, please move the definition and assignments to local variable input_count (and possibly also output_count) down to where it is used. + brig_basic_inst_handler::operator (): It seems that the opcode local variable is only used to identify the return brig instructions which seems wasteful. Generally, it would be nice to clean this function up a little by moving assignments to some of the very many local variables down, as close to their first use as reasonable. Surprisingly often, you'd remove the need to compute them in many cases at all, e.g. look at element_count and element_size_bits. Extra points for a function comment explaining how work is divided in between operator() itself and its main helpers such as build_instr_expr. + brig_basic_inst_handler::get_tree_code_for_hsa_opcode: The comment says the special value returned when it is necessary to use a chain of tree expressions or an builtin is NULL_TREE, but the function itself returns TREE_LIST or CALL_EXPR. - brig-cmp-inst-handler.cc: + brig_cmp_inst_handler::operator (): the neg_expr seems to be something left from earlier times? Use BITS_PER_UNIT instead of 8, having both result_width and element_width seems unnecessary (and speaking of elements, is that actually even a vector case?), and should be initialized only in the case when it is used. In case of vector results, please build either all_ones or all_zeros, it is wasteful to allocate both. - brig-mem-inst-handler.cc: I believe that using the alignment modifier is something that we should try to get done as soon as possible. - brig-inst-mod-handler.cc: This seems like something that we should at least warn about (in case when effectively an unsupported operation is requested). - brig-seg-inst-handler.cc: At this point I'm trying to read quickly but it seems to me you do not support conversion between flat and global segment... how come? - brig-copy-move-inst-handler.cc: + brig_copy_move_inst_handler::operator (): The function definitely should not cast LDA instructions to BrigInstSourceType*! - brig-atomic-inst-handler.cc: I only skimmed through this, looks good, although I should probably come back to take a closer look later. - brig-branch-inst-handler.cc: I believe that as long as the builtins representing barriers are not pure, they will not be hoisted out of a loop. Nonduplication might indeed be a problem, although short of whole function cloning, I could not think of a transformation gcc performs that might pause a problem. Nevertheless, we probably should introduce an attribute for it and look for it in gimple_can_duplicate_bb_p (and in cfg_layout_can_duplicate_bb_p?). An important issue, but hopefully for later. - brig-variable-handler.cc: + brig_directive_variable_handler::operator (): Please use BITS_PER_UNIT instead of 8. + build_variable: Likewise. I am a bit concerned that unlike in operator(), you do not make the alignment at least as big as natural one, which means that in theory (and probably only on malformed BRIG, I suppose), the two functions might disagree about alignment? I think it would be nice to outline the extraction of alignment to an independent function and use that from both places. - brig-function-handler.cc: + brig_directive_function_handler::operator: Please use gcc_assert instead of assert. (Well, in this case it is clearly input error which, eventually, we will want to give nice errors about. But at least do not use assert.) - brig-function.cc: + brig_function::analyze_calls: The first if condition should be terminated by a newline + brig_function::add_wi_loop: Is the second TODO now obsolete? + brig_function::build_launcher_and_metadata: The ASM directive is really an ugly hack. It is isolated so I am not that much concerned, but building a structure and filling it with data (like we do for example in hsa_output_libgomp_mapping) seems cleaner. - brig-util.cc: + gccbrig_is_raw_operation: I think that calling the operations "bit" operations instead of "raw" would make life of readers of the code slightly but noticeably easier. + gccbrig_hsa_type_bit_size: If possible, please make the default case be gcc_unreachable(). (If zero is expected in some cases, then all callers should check for it, so that we for example do not divide by zero in brig_code_entry_handler::get_tree_type_for_hsa_type.) + might_be_host_defined_var: there is no need for the returned expression to start on a new line. - gcc/builtins.def: In the added DEF_HSAIL_*_BUILTIN macros, please arrange it so that they only pass true in the last argument of DEF_BUILTIN when gcc is configured with BRIG FE. Builtins are not free and should not be added needlessly. Overall, the code has improved significantly. As far as I am concerned, the only real issue I see are the VIEW_CONVERT_EXPRs with mismatched operands. They are asking for trouble, only Ada produces those (although it is acknowledged it should not) and Ada only does it for aggregates. If I understood you correctly, both you and your sponsor have already signed the Copyright assignment, right? If that is so, I'll ask the steering committee to approve the intention and then ask a global reviewer to also peek at it. Meanwhile, I'll look very quickly at the run-time library and see if there is any integration with my work to be done. Thanks for your patience, Martin