On Tue, Feb 15, 2005 at 06:02:39PM +0000, Andrew Haley wrote:
> Richard Henderson writes:
>  > On Tue, Feb 15, 2005 at 05:27:15PM +0000, Andrew Haley wrote:
>  > > So, now for my question: why do we not call __register_frame_info() or
>  > > __register_frame_info_bases() ?
>  > 
>  > Because in the normal case for C/C++, folks don't use that many 
>  > exceptions.  So delaying doing anything until it's needed is a win.
>  > 
>  > Obviously the normal case is different for Java.
> 
> Yeah.  In the big server application I'm working on, almost 40% of
> total CPU time is spent inside one function,
> _Unwind_IteratePhdrCallback().  This is for a few reasons: firstly,
> Java uses stack traces for things other than throwing exceptions.
> Exceptions are relatively rare.
> 
> Also, in a big server application you have a lot of shared libraries:
> this one has 86.  So, for every single stack frame we're doing many
> excursions through _Unwind_IteratePhdrCallback().
>
>  > > We'd avoid a great many trips through
>  > > dl_iterate_phdr () and _Unwind_IteratePhdrCallback().
>  > 
>  > While I still like using dl_iterate_phdr instead of
>  > __register_frame_info_bases for totally aesthetic reasons, there
>  > have been changes made to the dl_iterate_phdr interface since the
>  > gcc support was written that would allow the dl_iterate_phdr
>  > results to be cached.
> 
> That would be nice.  Also, we could fairly easily build a tree of
> nodes, one for each loaded object, then we wouldn't be doing a linear
> search through them.  We could do that lazily, so it wouldn't kick in
> 'til needed.

Here is a rough patch showing what could be done.
The cache itself is not implemented; comments only indicate what would
be done at each point.
It depends on whether we want to use malloc to create the cache or not
(_Unwind_* is also used for backtrace() etc., and at that point
the program can be in a rather unstable state).
If, e.g., a fixed 8-entry cache mapping a PC range to
(load_base, p_eh_frame_hdr, p_dynamic) is enough, it can be
searched linearly.  If you need a dynamically created binary tree,
that's another alternative, but it will need malloc.  Or you can combine
a few static entries with a dynamically allocated rest of the tree.
You still need to call dl_iterate_phdr, so that ld.so grabs its
mutex and tells you whether there have been any dlopens or dlcloses since
the last time you called it, but you don't have to scan all the libraries
and can just search the one that is known to contain the PC.
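
A minimal sketch of the fixed-size, malloc-free variant, assuming 8
entries, round-robin replacement, and invalidation whenever the loader's
adds/subs counters change.  All names here (frame_hdr_cache,
frame_hdr_cache_sync, frame_hdr_cache_lookup, frame_hdr_cache_insert) are
hypothetical; this is not code from unwind-dw2-fde-glibc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size cache: 8 entries, no malloc.  */
#define FRAME_HDR_CACHE_SIZE 8

/* One cached object: the PC range it covers plus the values
   _Unwind_IteratePhdrCallback would otherwise recompute.  */
struct frame_hdr_cache_entry
{
  uintptr_t pc_low;             /* first address covered */
  uintptr_t pc_high;            /* one past the last covered address */
  uintptr_t load_base;
  const void *p_eh_frame_hdr;
  const void *p_dynamic;
  int valid;
};

static struct frame_hdr_cache_entry frame_hdr_cache[FRAME_HDR_CACHE_SIZE];
/* Counters seen last time; -1 forces invalidation on first use.  */
static unsigned long long cached_adds = -1ULL, cached_subs;
static unsigned int next_victim;

/* Invalidate every entry if any dlopen/dlclose happened since last call.  */
static void
frame_hdr_cache_sync (unsigned long long adds, unsigned long long subs)
{
  if (adds == cached_adds && subs == cached_subs)
    return;
  for (size_t i = 0; i < FRAME_HDR_CACHE_SIZE; i++)
    frame_hdr_cache[i].valid = 0;
  cached_adds = adds;
  cached_subs = subs;
  next_victim = 0;
}

/* Linear search over the fixed table; returns NULL on a miss.  */
static const struct frame_hdr_cache_entry *
frame_hdr_cache_lookup (uintptr_t pc)
{
  for (size_t i = 0; i < FRAME_HDR_CACHE_SIZE; i++)
    {
      const struct frame_hdr_cache_entry *e = &frame_hdr_cache[i];
      if (e->valid && pc >= e->pc_low && pc < e->pc_high)
        return e;
    }
  return NULL;
}

/* Store a result found by the full scan, evicting round-robin.  */
static void
frame_hdr_cache_insert (const struct frame_hdr_cache_entry *e)
{
  assert (e->pc_low < e->pc_high);
  frame_hdr_cache[next_victim] = *e;
  frame_hdr_cache[next_victim].valid = 1;
  next_victim = (next_victim + 1) % FRAME_HDR_CACHE_SIZE;
}
```

In the patch's setting these would presumably be called from the first
callback invocation (while ld.so holds its mutex): sync with the
counters, look up the PC, and on a hit take load_base, p_eh_frame_hdr
and p_dynamic from the entry; on a miss, scan as today and insert the
result.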

--- unwind-dw2-fde-glibc.c.jj   2004-10-28 15:25:20.000000000 +0200
+++ unwind-dw2-fde-glibc.c      2005-02-15 19:10:48.183511095 +0100
@@ -74,6 +74,7 @@ struct unw_eh_callback_data
   void *dbase;
   void *func;
   const fde *ret;
+  int check_cache;
 };
 
 struct unw_eh_frame_hdr
@@ -123,11 +124,15 @@ _Unwind_IteratePhdrCallback (struct dl_p
   const struct unw_eh_frame_hdr *hdr;
   _Unwind_Ptr eh_frame;
   struct object ob;
-
-  /* Make sure struct dl_phdr_info is at least as big as we need.  */
-  if (size < offsetof (struct dl_phdr_info, dlpi_phnum)
-            + sizeof (info->dlpi_phnum))
-    return -1;
+  struct ext_dl_phdr_info
+    {
+      ElfW(Addr) dlpi_addr;
+      const char *dlpi_name;
+      const ElfW(Phdr) *dlpi_phdr;
+      ElfW(Half) dlpi_phnum;
+      unsigned long long int dlpi_adds;
+      unsigned long long int dlpi_subs;
+    };
 
   match = 0;
   phdr = info->dlpi_phdr;
@@ -135,6 +140,31 @@ _Unwind_IteratePhdrCallback (struct dl_p
   p_eh_frame_hdr = NULL;
   p_dynamic = NULL;
 
+  if (data->check_cache && size >= sizeof (struct ext_dl_phdr_info))
+    {
+      static unsigned long long adds = -1ULL, subs;
+      struct ext_dl_phdr_info *einfo = (struct ext_dl_phdr_info *) info;
+      if (einfo->dlpi_adds == adds && einfo->dlpi_subs == subs)
+        {
+          /* Find data->pc in shared library cache.
+             Set load_base, p_eh_frame_hdr and p_dynamic
+             plus match from the cache and goto
+             "Read .eh_frame_hdr header." below.  */
+        }
+      else
+        {
+          adds = einfo->dlpi_adds;
+          subs = einfo->dlpi_subs;
+          /* Invalidate cache.  */
+        }
+      data->check_cache = 0;
+    }
+
+  /* Make sure struct dl_phdr_info is at least as big as we need.  */
+  if (size < offsetof (struct dl_phdr_info, dlpi_phnum)
+            + sizeof (info->dlpi_phnum))
+    return -1;
+
   /* See if PC falls into one of the loaded segments.  Find the eh_frame
      segment at the same time.  */
   for (n = info->dlpi_phnum; --n >= 0; phdr++)
@@ -289,6 +319,7 @@ _Unwind_Find_FDE (void *pc, struct dwarf
   data.dbase = NULL;
   data.func = NULL;
   data.ret = NULL;
+  data.check_cache = 1;
 
   if (dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data) < 0)
     return NULL;
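
The patch relies on the dlpi_adds and dlpi_subs counters that newer
glibc appends to struct dl_phdr_info (it defines its own
ext_dl_phdr_info, presumably so it builds against older headers).  A
minimal standalone sketch of reading them, assuming a <link.h> that
already declares those fields; struct load_counters, read_counters_cb
and get_load_counters are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Snapshot of the loader's dlopen/dlclose counters.  */
struct load_counters
{
  unsigned long long adds;
  unsigned long long subs;
  int have_counters;
};

/* dl_iterate_phdr callback.  The counters are global, so reading them
   from the first object is enough; returning nonzero stops the walk.  */
static int
read_counters_cb (struct dl_phdr_info *info, size_t size, void *data)
{
  struct load_counters *c = data;
  assert (c != NULL);
  /* Only trust the newer fields if the loader's struct includes them.  */
  if (size >= offsetof (struct dl_phdr_info, dlpi_subs)
              + sizeof (info->dlpi_subs))
    {
      c->adds = info->dlpi_adds;
      c->subs = info->dlpi_subs;
      c->have_counters = 1;
    }
  return 1;
}

/* Hypothetical helper: returns 1 and fills *c if the counters are
   available, 0 otherwise.  */
static int
get_load_counters (struct load_counters *c)
{
  c->adds = c->subs = 0;
  c->have_counters = 0;
  dl_iterate_phdr (read_counters_cb, c);
  return c->have_counters;
}
```

A caller compares two snapshots: if adds and subs are unchanged, no
object has been loaded or unloaded in between, so any cached per-object
data is still valid.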


        Jakub
