Hi! The following patch is infrastructure support for at least 3 different options that need changes: 1) PR104158 talks about a regression with the -fsanitizer-coverage= option; in GCC 11 and older and on trunk prior to r12-1177, this option behaved similarly to -f{,no-}sanitizer{,-recover}= options, namely that the option allows negative and argument of the option is a list of strings, each of them has some enumerator and -fsanitize-coverage= enabled those bits in the underlying flag_sanitize_coverage, while -fno-sanitize-coverage= disabled them. So, -fsanitize-coverage=trace-pc,trace-cmp was equivalent to -fsanitize-coverage=trace-pc -fsanitize-coverage=trace-cmp and both set flag_sanitize_coverage to (SANITIZE_COV_TRACE_PC | SANITIZE_COV_TRACE_CMP) Also, e.g. -fsanitize-coverage=trace-pc,trace-cmp -fno-sanitize-coverage=trace-pc would in the end set flag_sanitize_coverage to SANITIZE_COV_TRACE_CMP (first set both bits, then subtract one) The r12-1177 change, I think done to improve argument misspelling diagnostic, changed the option incompatibly in multiple ways, -fno-sanitize-coverage= is now rejected, only a single argument is allowed, not multiple and -fsanitize-coverage=trace-pc -fsanitize-coverage=trace-cmp enables just SANITIZE_COV_TRACE_CMP and not both (each option overrides the previous value) 2) Thomas Koenig wants to extend Fortran -fconvert= option for the ppc64le real(kind=16) swapping support; currently the option accepts -fconvert={native,swap,big-endian,little-endian} and the intent is to add support for -fconvert=r16_ibm and -fconvert=r16_ieee (that alone is just normal Enum), but also to handle -fconvert=swap,r16_ieee or -fconvert=r16_ieee,big-endian but not -fconvert=big-endian,little-endian - the native/swap/big-endian/little-endian are one mutually exclusive set and r16_ieee/r16_ibm another one. See https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587943.html and thread around that. 3) Similarly Marek Polacek wants to extend the -Wbidi-chars= option, such that it will handle not just the current -Wbidi-chars={none,bidirectional,any}, but also -Wbidi-chars=ucn and bidirectional,ucn and ucn,any etc. Again two separate sets, one none/bidirectional/any and another one ucn. See https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588960.html
The following patch adds framework for this and I'll post incremental patches for 1) and 2). As I've tried to document, such options are marked by additional EnumSet property on the option and in that case all the EnumValues in the Enum referenced from it must use a new Set property with set number (initially I wanted just mark last enumerator in each mutually exclusive set, but optionlist is sorted and so it doesn't really work well). So e.g. for the Fortran -fconvert=, one specifies: fconvert= Fortran RejectNegative Joined Enum(gfc_convert) EnumSet Var(flag_convert) Init(GFC_FLAG_CONVERT_NATIVE) -fconvert=<big-endian|little-endian|native|swap|r16_ieee|r16_ibm> The endianness used for unformatted files. Enum Name(gfc_convert) Type(enum gfc_convert) UnknownError(Unrecognized option to endianness value: %qs) EnumValue Enum(gfc_convert) String(big-endian) Value(GFC_FLAG_CONVERT_BIG) Set(1) EnumValue Enum(gfc_convert) String(little-endian) Value(GFC_FLAG_CONVERT_LITTLE) Set(1) EnumValue Enum(gfc_convert) String(native) Value(GFC_FLAG_CONVERT_NATIVE) Set(1) EnumValue Enum(gfc_convert) String(swap) Value(GFC_FLAG_CONVERT_SWAP) Set(1) EnumValue Enum(gfc_convert) String(r16_ieee) Value(GFC_FLAG_CONVERT_R16_IEEE) Set(2) EnumValue Enum(gfc_convert) String(r16_ibm) Value(GFC_FLAG_CONVERT_R16_IBM) Set(2) and this says to the option handling code that 1) if only one arg is specified to one instance of the option, it can be any of those 6 2) if two args are specified, one has to be from the first 4 and another from the last 2, in any order 3) at most 2 args may be specified (there are just 2 sets) There is a requirement on the Value values checked in self-test, the values from one set ored together must be disjunct from values from another set ored together. In the Fortran case, the first 4 are 0-3 so mask is 3, and the last 2 are 4 and 8, so mask is 12. When say -fconvert=big-endian is specified, it sets the first set to GFC_FLAG_CONVERT_BIG (2) but doesn't modify whatever value the other set had, so e.g. -fconvert=big-endian -fconvert=r16_ieee -fconvert=r16_ieee -fconvert=big-endian -fconvert=r16_ieee,big_endian -fconvert=big_endian,r16_ieee all behave the same. Also, with the EnumSet support, it is now possible to allow not specifying RejectNegative - we can set some set's value and then clear it and set it again to some other value etc. I think with the 2) patch I achieve what we want for Fortran, for 1) the only behavior from gcc 11 is that -fsanitize-coverage=trace-cmp,trace-cmp is now rejected. This is mainly from the desire to disallow -fconvert=big-endian,little-endian or -Wbidi-chars=bidirectional,any etc. where it would be confusing to users what exactly it means. But it is the only from these options that actually acts as an Enum bit set, each enumerator can be specified with all the others. So one option would be stop requiring the EnumSet implies Set properties must be specified and just require that either they are specified on all EnumValues, or on none of them; the latter case would be for -fsanitize-coverage= and the non-Set case would mean that all the EnumValues need to have disjoint Value bitmasks and that they can be all specified and unlike the Set case also repeated. Thoughts on this? Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (or with changes discussed above)? 2022-01-21 Jakub Jelinek <ja...@redhat.com> PR sanitizer/104158 * opt-functions.awk (var_set): Handle EnumSet property. * optc-gen.awk: Don't disallow RejectNegative if EnumSet is specified. * opt-read.awk: Handle Set property. * opts.h (CL_ENUM_SET_SHIFT, CL_ERR_ENUM_SET_ARG): Define. (struct cl_decoded_option): Mention enum in value description. Add mask member. (set_option): Add mask argument defaulted to 0. * opts.cc (test_enum_sets): New function. (opts_cc_tests): Call it. * opts-common.cc (enum_arg_to_value): Change return argument from bool to int, on success return index into the cl_enum_arg array, on failure -1. Add len argument, if non-0, use strncmp instead of strcmp. (opt_enum_arg_to_value): Adjust caller. (decode_cmdline_option): Handle EnumSet represented as CLVC_ENUM with non-zero var_value. Initialize decoded->mask. (decode_cmdline_options_to_array): CLear opt_array[0].mask. (handle_option): Pass decoded->mask to set_options last argument. (generate_option): Clear decoded->mask. (generate_option_input_file): Likewise. (cmdline_handle_error): Handle CL_ERR_ENUM_SET_ARG. (set_option): Add mask argument, use it for CLVC_ENUM. (control_warning_option): Adjust enum_arg_to_value caller. * doc/options.texi: Document Set and EnumSet properties. --- gcc/opt-functions.awk.jj 2022-01-11 23:11:22.797284435 +0100 +++ gcc/opt-functions.awk 2022-01-21 13:21:11.566273837 +0100 @@ -297,7 +297,10 @@ function var_set(flags) } if (flag_set_p("Enum.*", flags)) { en = opt_args("Enum", flags); - return enum_index[en] ", CLVC_ENUM, 0" + if (flag_set_p("EnumSet", flags)) + return enum_index[en] ", CLVC_ENUM, 1" + else + return enum_index[en] ", CLVC_ENUM, 0" } if (var_type(flags) == "const char *") return "0, CLVC_STRING, 0" --- gcc/optc-gen.awk.jj 2022-01-18 11:58:59.741979785 +0100 +++ gcc/optc-gen.awk 2022-01-21 16:54:17.380312546 +0100 @@ -348,6 +348,7 @@ for (i = 0; i < n_opts; i++) { alias_data = "NULL, NULL, N_OPTS" if (flag_set_p("Enum.*", flags[i])) { if (!flag_set_p("RejectNegative", flags[i]) \ + && !flag_set_p("EnumSet", flags[i]) \ && opts[i] ~ "^[Wfgm]") print "#error Enum allowing negative form" } --- gcc/opt-read.awk.jj 2022-01-11 23:11:22.798284420 +0100 +++ gcc/opt-read.awk 2022-01-21 15:38:42.804583033 +0100 @@ -97,10 +97,14 @@ BEGIN { enum_name = opt_args_non_empty("Enum", props) string = opt_args_non_empty("String", props) value = opt_args_non_empty("Value", props) + set = opt_args("Set", props) val_flags = "0" val_flags = val_flags \ test_flag("Canonical", props, "| CL_ENUM_CANONICAL") \ test_flag("DriverOnly", props, "| CL_ENUM_DRIVER_ONLY") + if (set != "") + val_flags = val_flags "| ((" set \ + ") << CL_ENUM_SET_SHIFT)" enum_data[enum_name] = enum_data[enum_name] \ " { " quote string quote ", " value ", " val_flags \ " },\n" --- gcc/opts.h.jj 2022-01-11 23:11:22.801284378 +0100 +++ gcc/opts.h 2022-01-21 17:32:29.007277641 +0100 @@ -170,6 +170,7 @@ extern const unsigned int cl_lang_count; /* Flags for an enumerated option argument. */ #define CL_ENUM_CANONICAL (1 << 0) /* Canonical for this value. */ #define CL_ENUM_DRIVER_ONLY (1 << 1) /* Only accepted in the driver. */ +#define CL_ENUM_SET_SHIFT 2 /* Shift for enum set. */ /* Structure describing an enumerated option argument. */ @@ -226,6 +227,7 @@ extern const unsigned int cl_enums_count #define CL_ERR_NEGATIVE (1 << 6) /* Negative form of option not permitted (together with OPT_SPECIAL_unknown). */ +#define CL_ERR_ENUM_SET_ARG (1 << 7) /* Bad argument of enumerated set. */ /* Structure describing the result of decoding an option. */ @@ -260,9 +262,14 @@ struct cl_decoded_option /* For a boolean option, 1 for the true case and 0 for the "no-" case. For an unsigned integer option, the value of the - argument. 1 in all other cases. */ + argument. For enum the value of the enumerator corresponding + to argument string. 1 in all other cases. */ HOST_WIDE_INT value; + /* For EnumSet the value mask. Variable should be changed to + value | (prev_value & ~mask). */ + HOST_WIDE_INT mask; + /* Any flags describing errors detected in this option. */ int errors; }; @@ -374,7 +381,8 @@ extern bool get_option_state (struct gcc extern void set_option (struct gcc_options *opts, struct gcc_options *opts_set, int opt_index, HOST_WIDE_INT value, const char *arg, - int kind, location_t loc, diagnostic_context *dc); + int kind, location_t loc, diagnostic_context *dc, + HOST_WIDE_INT = 0); extern void *option_flag_var (int opt_index, struct gcc_options *opts); bool handle_generated_option (struct gcc_options *opts, struct gcc_options *opts_set, --- gcc/opts.cc.jj 2022-01-20 11:30:45.595577895 +0100 +++ gcc/opts.cc 2022-01-21 19:44:56.770505499 +0100 @@ -3709,12 +3709,56 @@ test_get_option_html_page () #endif } +/* Verify EnumSet requirements. */ + +static void +test_enum_sets () +{ + for (unsigned i = 0; i < cl_options_count; ++i) + if (cl_options[i].var_type == CLVC_ENUM && cl_options[i].var_value) + { + const struct cl_enum *e = &cl_enums[cl_options[i].var_enum]; + unsigned HOST_WIDE_INT used_sets = 0; + unsigned HOST_WIDE_INT mask = 0; + unsigned highest_set = 0; + for (unsigned j = 0; e->values[j].arg; ++j) + { + unsigned set = e->values[j].flags >> CL_ENUM_SET_SHIFT; + /* Test that enumerators referenced in EnumSet have all + Set(n) on them within the valid range. */ + ASSERT_TRUE (set >= 1 && set <= HOST_BITS_PER_WIDE_INT); + highest_set = MAX (set, highest_set); + used_sets |= HOST_WIDE_INT_1U << (set - 1); + } + /* If there is just one set, no point to using EnumSet. */ + ASSERT_TRUE (highest_set >= 2); + /* Test that there are no gaps in between the sets. */ + if (highest_set == HOST_BITS_PER_WIDE_INT) + ASSERT_TRUE (used_sets == HOST_WIDE_INT_M1U); + else + ASSERT_TRUE (used_sets == (HOST_WIDE_INT_1U << highest_set) - 1); + for (unsigned int j = 1; j <= highest_set; ++j) + { + unsigned HOST_WIDE_INT this_mask = 0; + for (unsigned k = 0; e->values[k].arg; ++k) + { + unsigned set = e->values[j].flags >> CL_ENUM_SET_SHIFT; + if (set == j) + this_mask |= e->values[j].value; + } + ASSERT_TRUE ((mask & this_mask) == 0); + mask |= this_mask; + } + } +} + /* Run all of the selftests within this file. */ void opts_cc_tests () { test_get_option_html_page (); + test_enum_sets (); } } // namespace selftest --- gcc/opts-common.cc.jj 2022-01-18 11:58:59.741979785 +0100 +++ gcc/opts-common.cc 2022-01-21 19:24:25.498717656 +0100 @@ -289,26 +289,29 @@ enum_arg_ok_for_language (const struct c return (lang_mask & CL_DRIVER) || !(enum_arg->flags & CL_ENUM_DRIVER_ONLY); } -/* Look up ARG in ENUM_ARGS for language LANG_MASK, returning true and - storing the value in *VALUE if found, and returning false without +/* Look up ARG in ENUM_ARGS for language LANG_MASK, returning the cl_enum_arg + index and storing the value in *VALUE if found, and returning -1 without modifying *VALUE if not found. */ -static bool +static int enum_arg_to_value (const struct cl_enum_arg *enum_args, - const char *arg, HOST_WIDE_INT *value, + const char *arg, size_t len, HOST_WIDE_INT *value, unsigned int lang_mask) { unsigned int i; for (i = 0; enum_args[i].arg != NULL; i++) - if (strcmp (arg, enum_args[i].arg) == 0 + if ((len + ? (strncmp (arg, enum_args[i].arg, len) == 0 + && enum_args[i].arg[len] == '\0') + : strcmp (arg, enum_args[i].arg) == 0) && enum_arg_ok_for_language (&enum_args[i], lang_mask)) { *value = enum_args[i].value; - return true; + return i; } - return false; + return -1; } /* Look up ARG in the enum used by option OPT_INDEX for language @@ -324,8 +327,8 @@ opt_enum_arg_to_value (size_t opt_index, gcc_assert (option->var_type == CLVC_ENUM); HOST_WIDE_INT wideval; - if (enum_arg_to_value (cl_enums[option->var_enum].values, arg, - &wideval, lang_mask)) + if (enum_arg_to_value (cl_enums[option->var_enum].values, arg, 0, + &wideval, lang_mask) >= 0) { *value = wideval; return true; @@ -534,7 +537,7 @@ decode_cmdline_option (const char *const { size_t opt_index; const char *arg = 0; - HOST_WIDE_INT value = 1; + HOST_WIDE_INT value = 1, mask = 0; unsigned int result = 1, i, extra_args, separate_args = 0; int adjust_len = 0; size_t total_len; @@ -808,8 +811,56 @@ decode_cmdline_option (const char *const { const struct cl_enum *e = &cl_enums[option->var_enum]; - gcc_assert (value == 1); - if (enum_arg_to_value (e->values, arg, &value, lang_mask)) + gcc_assert (option->var_value || value == 1); + if (option->var_value) + { + const char *p = arg; + HOST_WIDE_INT sum_value = 0; + unsigned HOST_WIDE_INT used_sets = 0; + do + { + const char *q = strchr (p, ','); + HOST_WIDE_INT this_value = 0; + if (q && q == p) + { + errors |= CL_ERR_ENUM_SET_ARG; + break; + } + int idx = enum_arg_to_value (e->values, p, q ? q - p : 0, + &this_value, lang_mask); + if (idx < 0) + { + errors |= CL_ERR_ENUM_SET_ARG; + break; + } + + unsigned set = e->values[idx].flags >> CL_ENUM_SET_SHIFT; + gcc_checking_assert (set >= 1 && set <= HOST_BITS_PER_WIDE_INT); + if ((used_sets & (HOST_WIDE_INT_1U << (set - 1))) != 0) + { + errors |= CL_ERR_ENUM_SET_ARG; + break; + } + used_sets |= HOST_WIDE_INT_1U << (set - 1); + + HOST_WIDE_INT this_mask = 0; + for (int i = 0; e->values[i].arg != NULL; i++) + if (set == (e->values[i].flags >> CL_ENUM_SET_SHIFT)) + this_mask |= e->values[i].value; + + sum_value |= this_value; + mask |= this_mask; + if (q == NULL) + break; + p = q + 1; + } + while (1); + if (value == 1) + value = sum_value; + else + gcc_checking_assert (value == 0); + } + else if (enum_arg_to_value (e->values, arg, 0, &value, lang_mask) >= 0) { const char *carg = NULL; @@ -825,6 +876,7 @@ decode_cmdline_option (const char *const decoded->opt_index = opt_index; decoded->arg = arg; decoded->value = value; + decoded->mask = mask; decoded->errors = errors; decoded->warn_message = warn_message; @@ -958,6 +1010,7 @@ decode_cmdline_options_to_array (unsigne opt_array[0].canonical_option[2] = NULL; opt_array[0].canonical_option[3] = NULL; opt_array[0].value = 1; + opt_array[0].mask = 0; opt_array[0].errors = 0; num_decoded_options = 1; @@ -1167,13 +1220,14 @@ handle_option (struct gcc_options *opts, size_t opt_index = decoded->opt_index; const char *arg = decoded->arg; HOST_WIDE_INT value = decoded->value; + HOST_WIDE_INT mask = decoded->mask; const struct cl_option *option = &cl_options[opt_index]; void *flag_var = option_flag_var (opt_index, opts); size_t i; if (flag_var) set_option (opts, (generated_p ? NULL : opts_set), - opt_index, value, arg, kind, loc, dc); + opt_index, value, arg, kind, loc, dc, mask); for (i = 0; i < handlers->num_handlers; i++) if (option->flags & handlers->handlers[i].mask) @@ -1222,6 +1276,7 @@ generate_option (size_t opt_index, const decoded->warn_message = NULL; decoded->arg = arg; decoded->value = value; + decoded->mask = 0; decoded->errors = (option_ok_for_language (option, lang_mask) ? 0 : CL_ERR_WRONG_LANG); @@ -1260,6 +1315,7 @@ generate_option_input_file (const char * decoded->canonical_option[2] = NULL; decoded->canonical_option[3] = NULL; decoded->value = 1; + decoded->mask = 0; decoded->errors = 0; } @@ -1342,6 +1398,74 @@ cmdline_handle_error (location_t loc, co return true; } + if (errors & CL_ERR_ENUM_SET_ARG) + { + const struct cl_enum *e = &cl_enums[option->var_enum]; + const char *p = arg; + unsigned HOST_WIDE_INT used_sets = 0; + const char *second_opt = NULL; + size_t second_opt_len = 0; + errors = 0; + do + { + const char *q = strchr (p, ','); + HOST_WIDE_INT this_value = 0; + if (q && q == p) + { + arg = ""; + errors = CL_ERR_ENUM_ARG; + break; + } + int idx = enum_arg_to_value (e->values, p, q ? q - p : 0, + &this_value, lang_mask); + if (idx < 0) + { + if (q == NULL) + q = strchr (p, '\0'); + char *narg = XALLOCAVEC (char, (q - p) + 1); + memcpy (narg, p, q - p); + narg[q - p] = '\0'; + arg = narg; + errors = CL_ERR_ENUM_ARG; + break; + } + + unsigned set = e->values[idx].flags >> CL_ENUM_SET_SHIFT; + gcc_checking_assert (set >= 1 && set <= HOST_BITS_PER_WIDE_INT); + if ((used_sets & (HOST_WIDE_INT_1U << (set - 1))) != 0) + { + if (q == NULL) + q = strchr (p, '\0'); + if (second_opt == NULL) + { + used_sets = HOST_WIDE_INT_1U << (set - 1); + second_opt = p; + second_opt_len = q - p; + p = arg; + continue; + } + char *args = XALLOCAVEC (char, (q - p) + 1 + second_opt_len + 1); + memcpy (args, p, q - p); + args[q - p] = '\0'; + memcpy (args + (q - p) + 1, second_opt, second_opt_len); + args[(q - p) + 1 + second_opt_len] = '\0'; + error_at (loc, "invalid argument in option %qs", opt); + if (strcmp (args, args + (q - p) + 1) == 0) + inform (loc, "%qs specified multiple times in the same option", + args); + else + inform (loc, "%qs is mutually exclusive with %qs and cannot be" + " specified together", args, args + (q - p) + 1); + return true; + } + used_sets |= HOST_WIDE_INT_1U << (set - 1); + if (q == NULL) + break; + p = q + 1; + } + while (1); + } + if (errors & CL_ERR_ENUM_ARG) { const struct cl_enum *e = &cl_enums[option->var_enum]; @@ -1441,7 +1565,8 @@ read_cmdline_option (struct gcc_options void set_option (struct gcc_options *opts, struct gcc_options *opts_set, int opt_index, HOST_WIDE_INT value, const char *arg, int kind, - location_t loc, diagnostic_context *dc) + location_t loc, diagnostic_context *dc, + HOST_WIDE_INT mask /* = 0 */) { const struct cl_option *option = &cl_options[opt_index]; void *flag_var = option_flag_var (opt_index, opts); @@ -1550,7 +1675,10 @@ set_option (struct gcc_options *opts, st { const struct cl_enum *e = &cl_enums[option->var_enum]; - e->set (flag_var, value); + if (mask) + e->set (flag_var, value | (e->get (flag_var) & ~mask)); + else + e->set (flag_var, value); if (set_flag_var) e->set (set_flag_var, 1); } @@ -1767,7 +1895,8 @@ control_warning_option (unsigned int opt { const struct cl_enum *e = &cl_enums[option->var_enum]; - if (enum_arg_to_value (e->values, arg, &value, lang_mask)) + if (enum_arg_to_value (e->values, arg, 0, &value, + lang_mask) >= 0) { const char *carg = NULL; --- gcc/doc/options.texi.jj 2022-01-18 11:58:59.472983628 +0100 +++ gcc/doc/options.texi 2022-01-21 19:08:15.162270676 +0100 @@ -132,6 +132,21 @@ be accepted by the driver. This is used @option{-march=native} that are processed by the driver so that @samp{gcc -v} shows how the options chosen depended on the system on which the compiler was run. + +@item Set(@var{number}) +This property is optional, required for enumerations used in +@code{EnumSet} options. @var{number} should be decimal number between +1 and 64 inclusive and divides the enumeration into a set of +sets of mutually exclusive arguments. Arguments with the same +@var{number} can't be specified together in the same option, but +arguments with different @var{number} can. @var{value} needs to be +chosen such that a mask of all @var{value} values from the same set +@var{number} bitwise ored doesn't overlap with masks for other sets. +When @code{-foption=arg_from_set1,arg_from_set4} and +@code{-fno-option=arg_from_set3} are used, the effect is that previous +value of the @code{Var} will get bits from set 1 and 4 masks cleared, +ored @code{Value} of @code{arg_from_set1} and @code{arg_from_set4} +and then will get bits from set 3 mask cleared. @end table @item @@ -396,6 +411,16 @@ with the corresponding @samp{Enum} recor converted to the integer specified in the corresponding @samp{EnumValue} record before being passed to option handlers. +@item EnumSet +Must be used together with the @code{Enum(@var{name})} property. +Corresponding @samp{Enum} record must use @code{Set} properties. +The option's argument is either a string from the set like for +@code{Enum(@var{name})}, but with a slightly different behavior that +the whole @code{Var} isn't overwritten, but only the bits in all the +enumeration values with the same set bitwise ored together. +Or option's argument can be a comma separated list of strings where +each string is from a different @code{Set(@var{number})}. + @item Defer The option should be stored in a vector, specified with @code{Var}, for later processing. Jakub