The function rebuild_location_adhoc_htab() was meant to reconstruct the
adhoc location hash map after restoring a line_maps instance from a
PCH. However, the function has never performed as intended because it
missed the last step of adding the data into the newly reconstructed hash
map. This patch fixes that.

It does not seem possible to construct a test case such that the current
incorrect behavior is observable as a compiler issue. It would be
observable, if it were possible for a precompiled header to contain an
adhoc location with a non-zero custom data pointer. But currently, such
data pointers are used only by the middle end to track inlining
information, and this happens later, too late to show up in a PCH.

I also noted that location_adhoc_data_update, which updates the hash map
pointers in a different scenario, was relying on undefined pointer
arithmetic behavior. I'm not aware of this having caused any issue in
practice, but in this patch I have also changed it to use defined pointer
operations instead.

libcpp/ChangeLog:

        * line-map.cc (location_adhoc_data_update): Remove reliance on
        undefined behavior.
        (get_combined_adhoc_loc): Likewise.
        (rebuild_location_adhoc_htab): Fix issue where the htab was not
        properly updated.
---

Notes:
    Hello-
    
    While working on something unrelated in line-map.cc, I noticed that the
    function rebuild_location_adhoc_htab(), whose job is to reconstruct the
    adhoc data hash table after a line_maps instance is reconstructed from PCH,
    doesn't actually rebuild the hash table at all:
    
    void
    rebuild_location_adhoc_htab (line_maps *set)
    {
      unsigned i;
      set->location_adhoc_data_map.htab =
          htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, 
NULL);
      for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++)
        htab_find_slot (set->location_adhoc_data_map.htab,
                        set->location_adhoc_data_map.data + i, INSERT);
     ^^^^^^^^^^^^^^
    }
    
    In order to have the intended effect, it needs to set the return value of
    htab_find_slot to be set->location_adhoc_data_map.data + i, otherwise it
    doesn't effectively do anything except make the hash table think it has
    curr_loc elements set, when in fact it has 0. Subsequent calls to
    htab_traverse, for instance, will do nothing, and any lookups will also 
fail.
    
    I tried for some time to construct a test case that would demonstrate an
    observable consequence of this issue, but I don't think it's possible... The
    nontrivial uses of this hash map are in the middle end (e.g. to produce the
    trace of where a given expression was inlined from), and all this happens
    after PCH was read in, and doesn't require any state to be read from the
    PCH. It would become apparent in the future, however, if the ability to
    attach arbitrary data to an adhoc location were used in other ways, perhaps
    somewhere in libcpp; in that hypothetical case, the data would be lost when
    reading back in the PCH.
    
    There is another kinda related function, location_adhoc_data_update, which
    updates all the pointers in the hash map whenever they are invalidated. It
    seems to me, that this function invokes undefined behavior, since it adds an
    arbitrary offset to the pointers, which do not necessarily point into the
    same array after they were realloced. I don't think it's led to any problems
    in practice but in this patch I also changed that to use well-defined
    operations. Note sure how people may feel about that, since it does require
    on the surface 2x as many operations with this change, but I can't see how
    the current approach is guaranteed to be valid on all architectures?
    
    Bootstrap + regtest looks good on x86-64 Linux. Thanks a lot to whoever may
    have time to take a look at it.
    
    -Lewis

 libcpp/line-map.cc | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index 62077c3857c..391f1d4bbc1 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -85,27 +85,38 @@ location_adhoc_data_eq (const void *l1, const void *l2)
          && lb1->data == lb2->data);
 }
 
-/* Update the hashtable when location_adhoc_data is reallocated.  */
+/* Update the hashtable when location_adhoc_data_map::data is reallocated.
+   The param is an array of two pointers, the previous value of the data
+   pointer, and then the new value.  The pointers stored in the hash map
+   are then rebased to be relative to the new data pointer instead of the
+   old one.  */
 
 static int
-location_adhoc_data_update (void **slot, void *data)
+location_adhoc_data_update (void **slot_v, void *param_v)
 {
-  *((char **) slot)
-    = (char *) ((uintptr_t) *((char **) slot) + *((ptrdiff_t *) data));
+  const auto slot = reinterpret_cast<location_adhoc_data **> (slot_v);
+  const auto param = static_cast<location_adhoc_data **> (param_v);
+  *slot = (*slot - param[0]) + param[1];
   return 1;
 }
 
-/* Rebuild the hash table from the location adhoc data.  */
+/* The adhoc data hash table is not part of the GGC infrastructure, so it was
+   not initialized when SET was reconstructed from PCH; take care of that by
+   rebuilding it from scratch.  */
 
 void
 rebuild_location_adhoc_htab (line_maps *set)
 {
-  unsigned i;
   set->location_adhoc_data_map.htab =
       htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, 
NULL);
-  for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++)
-    htab_find_slot (set->location_adhoc_data_map.htab,
-                   set->location_adhoc_data_map.data + i, INSERT);
+  for (auto p = set->location_adhoc_data_map.data,
+           end = p + set->location_adhoc_data_map.curr_loc;
+      p != end; ++p)
+    {
+      const auto slot = reinterpret_cast<location_adhoc_data **>
+       (htab_find_slot (set->location_adhoc_data_map.htab, p, INSERT));
+      *slot = p;
+    }
 }
 
 /* Helper function for get_combined_adhoc_loc.
@@ -211,8 +222,7 @@ get_combined_adhoc_loc (line_maps *set,
       if (set->location_adhoc_data_map.curr_loc >=
          set->location_adhoc_data_map.allocated)
        {
-         char *orig_data = (char *) set->location_adhoc_data_map.data;
-         ptrdiff_t offset;
+         const auto orig_data = set->location_adhoc_data_map.data;
          /* Cast away extern "C" from the type of xrealloc.  */
          line_map_realloc reallocator = (set->reallocator
                                          ? set->reallocator
@@ -226,10 +236,13 @@ get_combined_adhoc_loc (line_maps *set,
              reallocator (set->location_adhoc_data_map.data,
                           set->location_adhoc_data_map.allocated
                           * sizeof (struct location_adhoc_data));
-         offset = (char *) (set->location_adhoc_data_map.data) - orig_data;
          if (set->location_adhoc_data_map.allocated > 128)
-           htab_traverse (set->location_adhoc_data_map.htab,
-                          location_adhoc_data_update, &offset);
+           {
+             location_adhoc_data *param[2]
+               = {orig_data, set->location_adhoc_data_map.data};
+             htab_traverse (set->location_adhoc_data_map.htab,
+                            location_adhoc_data_update, param);
+           }
        }
       *slot = set->location_adhoc_data_map.data
              + set->location_adhoc_data_map.curr_loc;

Reply via email to