On IPA PTA field sensitivity and pointer expression (part 2)
Hi, previously I sent an e-mail inquiring about the state of points-to information of structure variables allocated in the heap. It was brought to my attention that heap variables do not have a size to model and therefore IPA-PTA is not able to provide field sensitivity. I now understand better how field sensitivity is modeled in IPA-PTA and the way size is needed in order to compute the correct solution. However, I am now trying to compute the points-to analysis for pointer expressions for stack allocated struct variables. I am trying to answer the question: What does `temp->f1` points to? For the following simple example without heap allocated memory. ```c struct A { char* f0; char *f1; struct A *f2;}; int __GIMPLE(startwith("ipa-pta")) main (int argc, char * * argv) { struct A p1; char * pc; char c; char *cast; struct A*temp; char *temp2; int i; int _27; i_15 = 1; pc = &c; p1.f1 = pc; p1.f2 = &p1; _27 = 0; cast = pc; temp = p1.f2; temp2 = temp->f1; return _27; } ``` There are two question I have regarding this example. The first one is that IPA-PTA will determine that temp2 points to { c p1 } while I think it should only point to { c } and I'm trying to understand why. The second thing is that, I am still unsure how to get points-to information for pointer expressions like temp->f1. Details: IPA-PTA correctly points out that the structure p1 and structure pointer temp can point to both { c and p1 } ``` c = { } p1 = { c p1 } same as temp_33 temp_33 = { c p1 } ``` I believe this is because p1 is a the whole struct variable, and temp_33 is also modeling the whole struct variable. (in other words *temp_33+64 points-to c, *temp_33+128 points-to p1. Note that nothing is in field f0) However, in the case of temp2, we have the following points-to information: ``` temp2_34 = { c p1 } ``` which I believe is an over approximation. Looking at the constraints generated, we see that temp2_34 was assigned the following constraint temp2_34 = *temp_33 + 64 And that means that the method do_sd_constraint should have been used to compute the correct points to information. Looking at the the method, and adding some print statements, it is clear to me that the problem with this imprecision is that temp_33 may point to { c } in its second field. However, isn't GCC supposed to take into account field information in this case? I believe that in order to make this more precise we need a change in the get_varinfo API to something that takes into account offsets and gets the solution for pointer expressions. Instead of this line else if (v->may_have_pointers && add_graph_edge (graph, lhs, t)) flag |= bitmap_ior_into (sol, get_varinfo (t)->solution); something like: else if (v->may_have_pointers && add_graph_edge (graph, lhs, t)) flag |= bitmap_ior_into (sol, get_varinfo (t, roffset)->solution); This seems to me that it is already a known issue and it might be described accurately by this comment. TODO: Adding offsets to pointer-to-structures can be handled (IE not punted on and turned into anything), but isn't. You can just see what offset inside the pointed-to struct it's going to access. So, I just want to confirm, does this comment refer concretely to what I'm trying to do? And does this mean that in order to accomplish an API similar to what I described, would I need to create new constraint variables? (One new constraint variable for each field in all pointer to struct variables) Thanks!
Re: On IPA PTA field sensitivity and pointer expression (part 2)
On 28/09/2020 14:25, Erick Ochoa wrote: Hi, previously I sent an e-mail inquiring about the state of points-to information of structure variables allocated in the heap. It was brought to my attention that heap variables do not have a size to model and therefore IPA-PTA is not able to provide field sensitivity. I now understand better how field sensitivity is modeled in IPA-PTA and the way size is needed in order to compute the correct solution. However, I am now trying to compute the points-to analysis for pointer expressions for stack allocated struct variables. I am trying to answer the question: What does `temp->f1` points to? For the following simple example without heap allocated memory. ```c struct A { char* f0; char *f1; struct A *f2;}; int __GIMPLE(startwith("ipa-pta")) main (int argc, char * * argv) { struct A p1; char * pc; char c; char *cast; struct A*temp; char *temp2; int i; int _27; i_15 = 1; pc = &c; p1.f1 = pc; p1.f2 = &p1; _27 = 0; cast = pc; temp = p1.f2; temp2 = temp->f1; return _27; } ``` There are two question I have regarding this example. The first one is that IPA-PTA will determine that temp2 points to { c p1 } while I think it should only point to { c } and I'm trying to understand why. The second thing is that, I am still unsure how to get points-to information for pointer expressions like temp->f1. Details: IPA-PTA correctly points out that the structure p1 and structure pointer temp can point to both { c and p1 } ``` c = { } p1 = { c p1 } same as temp_33 temp_33 = { c p1 } ``` I believe this is because p1 is a the whole struct variable, and temp_33 is also modeling the whole struct variable. (in other words *temp_33+64 points-to c, *temp_33+128 points-to p1. Note that nothing is in field f0) However, in the case of temp2, we have the following points-to information: ``` temp2_34 = { c p1 } ``` which I believe is an over approximation. Looking at the constraints generated, we see that temp2_34 was assigned the following constraint temp2_34 = *temp_33 + 64 And that means that the method do_sd_constraint should have been used to compute the correct points to information. Looking at the the method, and adding some print statements, it is clear to me that the problem with this imprecision is that temp_33 may point to { c } in its second field. Small correction: temp_33 may point to p1 in its third field. However, isn't GCC supposed to take into account field information in this case? I believe that in order to make this more precise we need a change in the get_varinfo API to something that takes into account offsets and gets the solution for pointer expressions. Instead of this line else if (v->may_have_pointers && add_graph_edge (graph, lhs, t)) flag |= bitmap_ior_into (sol, get_varinfo (t)->solution); something like: else if (v->may_have_pointers && add_graph_edge (graph, lhs, t)) flag |= bitmap_ior_into (sol, get_varinfo (t, roffset)->solution); This seems to me that it is already a known issue and it might be described accurately by this comment. TODO: Adding offsets to pointer-to-structures can be handled (IE not punted on and turned into anything), but isn't. You can just see what offset inside the pointed-to struct it's going to access. So, I just want to confirm, does this comment refer concretely to what I'm trying to do? And does this mean that in order to accomplish an API similar to what I described, would I need to create new constraint variables? (One new constraint variable for each field in all pointer to struct variables) Thanks!
Re: First set of patches to allow changing the long double default were posted:
On Thu, 24 Sep 2020, Michael Meissner wrote: > As per the discussion in this thread, I did decide to keep things to two types > within the compiler. This means that an explicit __float128 or _Float128 will > use the same type node and TFmode as long double uses if the default for long > double is IEEE 128-bit. I'm not sure which patch in the series is supposed to be implementing that, but C requires that long double and _Float128 are distinct types (so you can use _Generic to choose between them, for example) even if they have the same ABI. -- Joseph S. Myers jos...@codesourcery.com
[PATCH v4] : Add nitems()
'nitems()' calculates the length of an array in number of items. It is safe: if a pointer is passed to the macro (or function, in C++), the compilation is broken due to: - In >= C11: _Static_assert() - In C89, C99: Negative anonymous bitfield - In C++: The template requires an array Some BSDs already provide a macro nitems() in , although it usually doesn't provide safety against pointers. This patch uses the same name for compatibility reasons, and to be the least disruptive with existing code. This patch also adds some other macros, which are required by 'nitems()': __is_same_type(a, b): Returns non-zero if the two input arguments are of the same type. __is_array(arr): Returns non-zero if the input argument is of an array type. __must_be(expr, msg): Allows using _Static_assert() everywhere an expression can be used. It evaluates '(int)0' or breaks the compilation. __must_be_array(arr): It evaluates to '(int)0' if the argument is of an array type. Else, it breaks compilation. __nitems(arr): It implements the basic sizeof division needed to calculate the array length. P.S.: I'd like to put this patch in the public domain. Signed-off-by: Alejandro Colomar --- A few changes since v3: - Macros don't need reserved names in their parameters, so I simplified those names. - I fixed some wrong indentation levels. - Renamed __array_len() to __nitems() for consistency. misc/sys/param.h | 47 +++ 1 file changed, 47 insertions(+) diff --git a/misc/sys/param.h b/misc/sys/param.h index d7c319b157..08d4093961 100644 --- a/misc/sys/param.h +++ b/misc/sys/param.h @@ -102,5 +102,52 @@ #define MIN(a,b) (((a)<(b))?(a):(b)) #define MAX(a,b) (((a)>(b))?(a):(b)) +/* Macros related to the types of variables */ +#define __is_same_type(a, b) \ + __builtin_types_compatible_p(__typeof__(a), __typeof__(b)) +#define __is_array(arr)(!__is_same_type((arr), &(arr)[0])) + +/* Macros for embedding _Static_assert() in expressions */ +#if __STDC_VERSION__ >= 201112L +# define __must_be(expr, msg) ( \ +0 * (int)sizeof( \ + struct {\ +_Static_assert((expr), msg); \ +char _ISO_C_forbids_a_struct_with_no_members; \ + } \ +) \ +) +#else +# define __must_be(expr, msg) ( \ +0 * (int)sizeof( \ + struct {\ +int : (-!(expr));\ +char _ISO_C_forbids_a_struct_with_no_members; \ + } \ +) \ +) +#endif +#define __must_be_array(arr) __must_be(__is_array(arr), "Must be an array!") + +/* Macros for array sizes */ +#if defined(__cplusplus) +# if __cplusplus >= 201103L +template + constexpr inline std::size_t + nitems(const _Tp(&)[_Len]) __THROW + { +return _Len; + } +# else /* __cplusplus < 201103L */ +template + char + (&__nitems_chararr(const _Tp(&)[_Len]))[_Len]; +# define nitems(arr) (sizeof(__nitems_chararr(arr))) +# endif /* __cplusplus < 201103L */ +#else /* !defined(__cplusplus) */ +# define __nitems(arr) (sizeof((arr)) / sizeof((arr)[0])) +# define nitems(arr) (__nitems(arr) + __must_be_array(arr)) +#endif /* !defined(__cplusplus) */ + #endif /* sys/param.h */ -- 2.28.0
Re: First set of patches to allow changing the long double default were posted:
On Mon, Sep 28, 2020 at 04:38:51PM +, Joseph Myers wrote: > On Thu, 24 Sep 2020, Michael Meissner wrote: > > > As per the discussion in this thread, I did decide to keep things to two > > types > > within the compiler. This means that an explicit __float128 or _Float128 > > will > > use the same type node and TFmode as long double uses if the default for > > long > > double is IEEE 128-bit. > > I'm not sure which patch in the series is supposed to be implementing > that, but C requires that long double and _Float128 are distinct types (so > you can use _Generic to choose between them, for example) even if they > have the same ABI. Hmmm, but others said that it would complicate things if __float128 were different than long double. I've implemented it both ways (I would have to dig up the patches to have separate types). And doing so might break all of the glibc efforts to provide dual symbols. I could obviously use __float128 to be the same as long double, and _Float128 being a unique type if long double is IEEE. And since C++ doesn't have _Float128, it would 'work' without introducing a new mangling name. I don't think there is a solution that doesn't break something. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: First set of patches to allow changing the long double default were posted:
On Mon, Sep 28, 2020 at 05:07:55PM -0400, Michael Meissner wrote: > On Mon, Sep 28, 2020 at 04:38:51PM +, Joseph Myers wrote: > > On Thu, 24 Sep 2020, Michael Meissner wrote: > > > > > As per the discussion in this thread, I did decide to keep things to two > > > types > > > within the compiler. This means that an explicit __float128 or _Float128 > > > will > > > use the same type node and TFmode as long double uses if the default for > > > long > > > double is IEEE 128-bit. > > > > I'm not sure which patch in the series is supposed to be implementing > > that, but C requires that long double and _Float128 are distinct types (so > > you can use _Generic to choose between them, for example) even if they > > have the same ABI. > > Hmmm, but others said that it would complicate things if __float128 were > different than long double. I've implemented it both ways (I would have to > dig > up the patches to have separate types). > > And doing so might break all of the glibc efforts to provide dual symbols. > > I could obviously use __float128 to be the same as long double, and _Float128 > being a unique type if long double is IEEE. And since C++ doesn't have > _Float128, it would 'work' without introducing a new mangling name. That seems preferrable to me. Jakub
Re: {standard input}:1174: Error: inappropriate arguments for opcode 'mpydu'
On Sun, 27 Sep 2020, Rong Chen wrote: > Hi Nicolas, > > Thanks for the feedback, the error still remains with gcc 10.2.0: I've created the simplest test case that can be. You won't believe it. Test case: $ cat test.c unsigned int test(unsigned int x, unsigned long long y) { y /= 0x2000; if (x > 1) y *= x; return y; } $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/0day/gcc-9.3.0-nolibc/arc-elf/libexec/gcc/arc-elf/9.3.0 $ ~/0day/gcc-9.3.0-nolibc/arc-elf/bin/arc-elf-gcc -mcpu=hs38 -mbig-endian -O2 -c test.c /tmp/cc0GAomh.s: Assembler messages: /tmp/cc0GAomh.s:21: Error: inappropriate arguments for opcode 'mpydu' I know nothing about ARC. Please anyone take it over from here. Nicolas
Re: First set of patches to allow changing the long double default were posted:
On Mon, 28 Sep 2020, Michael Meissner via Gcc wrote: > > I'm not sure which patch in the series is supposed to be implementing > > that, but C requires that long double and _Float128 are distinct types (so > > you can use _Generic to choose between them, for example) even if they > > have the same ABI. > > Hmmm, but others said that it would complicate things if __float128 were > different than long double. I've implemented it both ways (I would have to > dig > up the patches to have separate types). > > And doing so might break all of the glibc efforts to provide dual symbols. > > I could obviously use __float128 to be the same as long double, and _Float128 > being a unique type if long double is IEEE. And since C++ doesn't have > _Float128, it would 'work' without introducing a new mangling name. My comment is specifically about _Float128, in C; it doesn't say anything about what should happen for C++ (though it would be nice to resolve the ICEs that occur when _FloatN / _FloatNx types leak into C++ code via built-in functions, bug 85518). Maybe it makes sense for the type (or at least the type used by the built-in functions and __float128, since _Float128 itself can't be accessed directly in C++) to be the same in the C++ case. The existing code in build_common_tree_nodes that creates all the _FloatN / _FloatNx type nodes always creates a new node with make_node. And since I didn't spot anything in your patch series that changes that, I couldn't see how that patch series actually does make them the same type. -- Joseph S. Myers jos...@codesourcery.com