https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80437

David Malcolm <dmalcolm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org

--- Comment #1 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Martin Sebor from comment #0)
[...snip...]
> bug.c:11:5: warning: 'memset': specified size 0xfffffffffffffffb exceeds
> maximum object size 0xffffffffffffffff [-Wstringop-overflow=]
>
> I'm not sure that this a significant improvement.  Those already familiar
> with the -Wstringop-overflow warning will likely understand what
> 0xffffffffffffffff in this context means but only because we know the
> maximum object size limit (i.e., PTRDIFF_MAX) and realize that all printed
> values are in the [PTRDIFF_MAX + 1, SIZE_MAX] range and thus always consist
> of 16 hex digits.  Someone who's seen the warning for the first time will
> either have to guess or count the f's.  This is even more likely for the
> specified size (such as 0xfffffffffffffffb).  In cases where a much lower
> limit is specified by the user (e.g., via -Walloca-larger-than) it's even
> less clear how to interpret a number in any base.
>
> I think it's possible to do better.  One approach is to print very large
> values in terms of well-known constants such as SIZE_MAX or PTRDIFF_MAX.
> For instance, instead of printing 18446744073709551611 (i.e., -5) print
> SIZE_MAX - 4.  Another solution might be to print sizes as signed (though
> that won't help in the case of the user-specified limit).

How about printing *both*, i.e.:

bug.c:11:5: warning: 'memset': specified size 0xfffffffffffffffb (SIZE_MAX - 4)
exceeds maximum object size 0xffffffffffffffff (PTRDIFF_MAX)
[-Wstringop-overflow=]

(I may have got the expressions wrong, but hopefully the meaning is clear)

> Since the problem of how best to present large decimal numbers is general
> and applies to all diagnostics, including warnings, errors, and notes, a
> change to how these numbers are presented should be brought up for a wider
> discussion before it's implemented consistently, for all diagnostics.

I find large decimal numbers intimidating, and find hexadecimals easier for
values close to large powers of two.

Suggestion: choose base based on a "mental effort cost":

Example 1
*********
For example, if we have an overflow that occurs when x >= 2^31, which is easier
to read:

DECIMAL:
  warning: buffer overflow occurs when x >= 2147483648

HEX:
  warning: buffer overflow occurs when x >= 0x80000000

FORMULA:
  warning: buffer overflow occurs when x >= 2^31

FORMULA and HEX:
  warning: buffer overflow occurs when x >= 2^31 (0x80000000)

Example 2
*********
an overflow that occurs when x >= 100

DECIMAL:
  warning: buffer overflow occurs when x >= 100

HEX:
  warning: buffer overflow occurs when x >= 0x64

In the above case, decimal is the easier-to-read format.

Example 3
*********
an overflow that occurs when x >= 0x7fff0000

DECIMAL:
  warning: buffer overflow occurs when x >= 2147418112

HEX:
  warning: buffer overflow occurs when x >= 0x7fff0000

In this case, hexadecimal is the easier-to-read format.

Example 4
*********
an overflow that occurs when x <= -8000

DECIMAL:
  warning: buffer overflow occurs when x <= -8000

HEX:
  warning: buffer overflow occurs when x <= -0x1f40

The idea
********
The idea is a way to choose the printed representation based on the value,
based on the number of "awkward" digits.

One implementation is to assign a cost to a digit based on closeness to zero.
For example, in decimal,
  '0'     : low cost
  '1', '9': medium cost
  '2'..'8': high cost

in hexadecimal:
  '0'     : low cost
  '1', 'f': medium cost
  '2'..'e': high cost

We can weight these, say cost 10 for "high", cost 1 for "medium", cost 0 for
"low".

"Cheaper" in this sense should mean "easier for a human to understand"; a rough
measure of the amount of mental effort required by a human reader.

Hence:

example 1:
  decimal: 2147483648
    10 digits, 9 high cost, 1 medium cost: cost = 91
  hexadecimal: 0x80000000
    8 digits; 1 high cost, 7 low cost: cost = 10
  hence hexadecimal is "cheaper", and we use it

example 2:
  decimal: 100
    3 digits, 1 medium cost, 2 low cost: cost = 1
  hexadecimal: 0x64
    2 high cost digits: cost = 20
  hence decimal is "cheaper", and we use it

example 3:
  decimal: 2147418112
    10 digits: 4 medium cost, 6 high cost: cost = 64
  hexadecimal: 0x7fff0000
    8 digits: 1 high cost, 3 medium cost, 4 low cost: cost = 13
  hence hexadecimal is "cheaper", and we use it

example 4:
  decimal: -8000
    3 low cost digits, 1 high cost: cost = 10
  hexadecimal: -0x1f40
    1 low cost, 2 medium cost, 1 high cost: cost = 12
  hence decimal is "cheaper", and we use it

I guessed at these weightings; there may be better ones.
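
To make the heuristic concrete, here is a minimal sketch in C. It assumes the guessed weights above (0 for low, 1 for medium, 10 for high), scores only the digits of the magnitude (the sign and the "0x" prefix cost nothing), and breaks ties in favor of decimal. The names digit_cost, mental_cost and format_cheapest are made up for illustration; none of this is existing GCC code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Guessed weights from the comment above; all three are assumptions. */
enum { COST_LOW = 0, COST_MEDIUM = 1, COST_HIGH = 10 };

/* Cost of a single digit value in the given base: 0 is low cost,
   1 and (base - 1) are medium cost, everything else is high cost. */
static int digit_cost(unsigned digit, unsigned base)
{
    if (digit == 0)
        return COST_LOW;
    if (digit == 1 || digit == base - 1)
        return COST_MEDIUM;
    return COST_HIGH;
}

/* Sum of the digit costs of MAGNITUDE when written in BASE. */
static int mental_cost(uint64_t magnitude, unsigned base)
{
    assert(base == 10 || base == 16);
    int cost = 0;
    while (magnitude != 0) {
        cost += digit_cost((unsigned)(magnitude % base), base);
        magnitude /= base;
    }
    return cost;
}

/* Write VALUE into BUF in whichever base is cheaper; a tie goes to
   decimal (an arbitrary choice made for this sketch).  Returns BUF. */
static const char *format_cheapest(int64_t value, char *buf, size_t size)
{
    /* Negate in unsigned arithmetic so that INT64_MIN does not overflow. */
    uint64_t mag = value < 0 ? 0 - (uint64_t)value : (uint64_t)value;
    const char *sign = value < 0 ? "-" : "";

    if (mental_cost(mag, 16) < mental_cost(mag, 10))
        snprintf(buf, size, "%s0x%" PRIx64, sign, mag);
    else
        snprintf(buf, size, "%s%" PRIu64, sign, mag);
    return buf;
}
```

With a 32-byte buffer, format_cheapest picks "0x80000000" and "0x7fff0000" for examples 1 and 3, and "100" and "-8000" for examples 2 and 4, matching the choices worked out by hand above.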