On 29/05/2013 9:41 AM, Ian Lance Taylor wrote:
On Tue, May 28, 2013 at 9:02 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:
Maybe I misunderstood... there's currently a (very small) cache
(unwind-dw2-fde-dip.c) that lives behind the loader mutex. It contains 8
entries and each entry holds the start and end addresses for one loaded
object, along with a pointer to the eh_frame_header. The cache resides in
static storage, and so accessing it is always safe.
I think what you're saying is that the p_eh_frame_hdr field could end up
with a dangling pointer due to a dlclose call?
Yes, that can happen.
If so, my argument is that, as long as the cache is up to date as of the
start of unwind, any attempt to access a dangling p_eh_frame_hdr means that
in-use code was dlclosed, in which case unwind is guaranteed to fail anyway.
The failure would just have different symptoms with such a cache in place.
Am I missing something?
I think you're right about that. But what happens if the entry is not
in the cache? Or, do you mean you want to look in the cache before
calling dl_iterate_phdr? That should be safe but of course you still
need a lock as multiple threads can be manipulating the cache at the
same time.
OK, here's a proof of concept I threw together as a preload library of
sorts. You can compile it as a .o and link into the app directly, or
compile it as .so and LD_PRELOAD it into an existing app. Linking
directly against the .so doesn't seem to work for some reason. However
the app incorporates it, though, I confirmed that it removes the
bottleneck.
The short version is that it overrides _Unwind_Find_FDE and allocates a
4kB global FDE table cache (~85 entries), which threads search
lock-free. Per-entry checksums ensure that threads don't try to use
inconsistent entries in the event they race with an updater. Calls to
dlclose blow away the cache. Whenever a thread misses in cache, it calls
dl_iterate_phdr to serve the miss and, if a suitable FDE table is found
also inserts a new cache entry for it. In the event that this also fails
(if the FDE table is not sorted, or if it's an obsolete object that uses
registered unwind info rather than eh_frame), it falls back to the
original implementation.
The .h just cobbles together various definitions from the gcc sources
(with citations) so the code can compile stand-alone; the .cpp started
from the original _Unwind_Find_FDE and tweaked it to add the caching.
I'm not sure the best way to incorporate something like this into gcc,
but it turned out surprisingly self-contained, which should at least
help uptake. Some of the uglier parts of the code (the dlsym hacks and
the extra mutex) would be unnecessary in a properly integrated solution.
FYI, for a bit I had a bug where readers grabbed the mutex as well, and
performance was almost as good as lock-free because the critical section
was so much shorter than before (FDE table search having been moved
outside). Still, I was only using a few threads in my tests, so the
lock-free method is probably better.
Thoughts?
Ryan
/* This file is where we get to mess with things
*/
#include "debug-find-fde.h"
#include <dlfcn.h>
#include <cstdio>
#include <vector>
#include <pthread.h>
#ifdef __cplusplus
extern "C" {
#endif
#if 0
}
#endif
struct fde_table {
signed initial_loc __attribute__ ((mode (SI)));
signed fde __attribute__ ((mode (SI)));
};
/* Conceptually, the cache contains a sorted concatenation of all FDE
entries we know about. However, the tables we're willing to
consider are already sorted in their respective eh_fram_hdr
sections in memory, so we instead record the location of each such
table.
The header cache is sorted, and readers use binary search over the
range 0..hdr_cache_sz to find a lower bound; in the absence of
concurrent updates (see below) the returned entry will contain PC,
if any entry does. From there, the reader performs a second binary
search, this time on the FDE table, and returns whatever it finds.
Each call to dlopen updates the cache on success, a fairly
non-invasive action. Calls to dlclose clear the cache before
unloading the library and rebuild it afterward, so unwinding
threads will have to fall back to the old way during the interim.
Cache updates involve moving elements around, which can confuse
readers if they have the bad luck to unwind during a cache reorg:
1. The binary search may end at the wrong lower bound if entries
shift around during the search. This will cause a miss where there
should have been a hit. There are already fallbacks in place to
cover cache misses that can arise for a number of reasons, so we
choose to live with an occasional extra miss.
2. Binary search may end at (what appears to be) the proper lower
bound, but the reader retrieves an inconsistent cache entry due to
a race with some writer. This could send the reader into la-la land
if undetected, so we protect the reader-accessed parts of each
cache entry with a checksum; if the reader's checksum does not
match, we must either retry or report a miss. As with any checksum,
there is a very small but non-zero probability that a collision
tricks a reader into using inconsistent data. However, given that
pc_low, hdr, and table members are all related strongly to each
other, we would expect them to trigger collisions less often than
random data, for the same reason that the Fletcher checksum we use
works well. Updaters reduce the risk even further by zeroing the
checksum before making updates to a cache entry; the zero checksum
cannot arise naturally, guaranteeing a checksum failure for any
reader that sees it (e.g. if the OS deschedules the writer before
it finishes updating the entry). The only possible false positive
due to concurrent updates would thus require the reader to arrive
before the writer, read the old checksum, and then to have the
writer change some (but not all) of the remaining values before the
reader accesses them, *and* to have a checksum collision arise (if
all values are changed, the cache entry is actually consistent and
a checksum collision is harmless). Note that the reader does not
need to use any memory fences for this to work; a pair of fences
would make it even safer---at some performance cost---by allowing
the reader to verify the checksum does not change while reading the
other values.
3. A dlclose call could deallocate an FDE table out from under a
reader. This is a user bug (unloading a library while threads are
still executing inside it) and a seg fault or other badness would
follow shortly even without the cache.
*/
struct hdr_cache_element
{
_Unwind_Ptr pc_low;
_Unwind_Ptr pc_high;
void* dbase;
_Unwind_Ptr data_base;
const struct fde_table *table;
unsigned fde_count __attribute__((mode(SI)));
unsigned checksum __attribute__((mode(SI)));
};
/* Tunable. 4kB yields 85 cache entries on 64-bit targets. */
#define HDR_CACHE_CAPACITY (4096/sizeof(hdr_cache_element))
static hdr_cache_element hdr_cache[HDR_CACHE_CAPACITY];
static size_t hdr_cache_sz = 0;
static pthread_mutex_t hdr_cache_mutex = PTHREAD_MUTEX_INITIALIZER;
/* A Fletcher-32 checksum, designed to process a short sequence of
_Unwind_Ptr values incrementally (up to 89 of them).
*/
struct fletcher32_state {
uint32_t sum1;
uint32_t sum2;
};
static void
fletcher32_init(struct fletcher32_state *s)
{
s->sum1 = s->sum2 = 0xffff;
}
static _Unwind_Ptr
fletcher32_update (struct fletcher32_state *s, _Unwind_Ptr val)
{
_Unwind_Ptr orig_val = val;
size_t i;
for (i=0; i < sizeof(_Unwind_Ptr)/sizeof(uint16_t); i++)
{
s->sum1 += val & 0xffff;
s->sum2 += s->sum1;
val >>= 16;
}
return orig_val;
}
static uint32_t
fletcher32_finish (struct fletcher32_state *s)
{
s->sum1 = (s->sum1 & 0xffff) + (s->sum1 >> 16);
s->sum2 = (s->sum2 & 0xffff) + (s->sum2 >> 16);
return (s->sum2 << 16) | s->sum1;
}
static uint32_t
fletcher32_checksum (const struct hdr_cache_element *elem)
{
fletcher32_state s;
fletcher32_init (&s);
fletcher32_update (&s, elem->pc_low);
fletcher32_update (&s, elem->pc_high);
fletcher32_update (&s, (_Unwind_Ptr) elem->dbase);
fletcher32_update (&s, elem->data_base);
fletcher32_update (&s, (_Unwind_Ptr) elem->table);
fletcher32_update (&s, elem->fde_count);
return fletcher32_finish(&s);
}
static void
copy_hdr_cache_element(volatile struct hdr_cache_element *dst,
const struct hdr_cache_element *src,
unsigned int checksum)
{
dst->checksum = 0;
dst->pc_low = src->pc_low;
dst->pc_high = src->pc_high;
dst->dbase = src->dbase;
dst->data_base = src->data_base;
dst->table = src->table;
dst->fde_count = src->fde_count;
if (!checksum)
checksum = fletcher32_checksum ((struct hdr_cache_element *)dst);
dst->checksum = checksum;
}
/* Search the given FDE table and populate fields of data if appropriate.
Return 0 if this is the wrong table (continue searching)
Return 1 if this is the correct table (no guarantee of an FDE)
*/
static int
search_fde_table(struct unw_eh_callback_data *data,
const struct hdr_cache_element *elem)
{
const struct fde_table *table = elem->table;
_Unwind_Ptr fde_count = elem->fde_count;
size_t lo, hi, mid;
_Unwind_Ptr data_base = elem->data_base;
fde *f;
unsigned int f_enc, f_enc_size;
_Unwind_Ptr range;
int in_bounds = 0;
_Unwind_Ptr pc = data->pc - data_base;
mid = fde_count - 1;
if (pc < table[0].initial_loc)
return 0;
if (pc < table[mid].initial_loc)
{
in_bounds = 1;
lo = 0;
hi = mid;
while (lo < hi)
{
mid = (lo + hi) / 2;
if (pc < table[mid].initial_loc)
hi = mid;
else if (pc >= table[mid + 1].initial_loc)
lo = mid + 1;
else
break;
}
assert (lo < hi);
}
f = (fde *) (table[mid].fde + data_base);
f_enc = get_fde_encoding (f);
f_enc_size = size_of_encoded_value (f_enc);
read_encoded_value_with_base (f_enc & 0x0f, 0,
&f->pc_begin[f_enc_size], &range);
if (pc < table[mid].initial_loc + range)
data->ret = f;
data->func = (void *) (table[mid].initial_loc + data_base);
data->dbase = elem->dbase;
return in_bounds;
}
/* Search the hdr_cache, knowing that we might race with an update at
any time. Return a copy of the best candidate, if any, after
verifying its integrity.
*/
static int
search_hdr_cache(hdr_cache_element *elem, _Unwind_Ptr pc)
{
size_t lo, hi, mid;
size_t cache_sz = hdr_cache_sz;
if (!cache_sz || pc < hdr_cache[0].pc_low)
return 0;
mid = cache_sz - 1;
if (pc < hdr_cache[mid].pc_low)
{
lo = 0;
hi = mid;
while (lo < hi)
{
mid = (lo + hi) / 2;
if (pc < hdr_cache[mid].pc_low)
hi = mid;
else if (pc >= hdr_cache[mid + 1].pc_low)
lo = mid + 1;
else
break;
}
assert (lo < hi);
}
/* We *think* we have the right element, but we need copy and verify
it because there's a chance we raced with an updater. */
copy_hdr_cache_element (elem, &hdr_cache[mid], hdr_cache[mid].checksum);
if (elem->checksum != fletcher32_checksum (elem))
return 0;
return 1;
}
static int
_Unwind_IteratePhdrCallback (struct dl_phdr_info *info, size_t size, void *ptr)
{
struct unw_eh_callback_data *data = (struct unw_eh_callback_data *) ptr;
const ElfW(Phdr) *phdr, *p_eh_frame_hdr, *p_dynamic;
long n, match;
#ifdef __FRV_FDPIC__
struct elf32_fdpic_loadaddr load_base;
#else
_Unwind_Ptr load_base;
#endif
const unsigned char *p;
const struct unw_eh_frame_hdr *hdr;
_Unwind_Ptr eh_frame;
_Unwind_Ptr pc_low = 0, pc_high = 0;
match = 0;
phdr = info->dlpi_phdr;
load_base = info->dlpi_addr;
p_eh_frame_hdr = NULL;
p_dynamic = NULL;
/* Make sure struct dl_phdr_info is at least as big as we need. */
if (size < offsetof (struct dl_phdr_info, dlpi_phnum)
+ sizeof (info->dlpi_phnum))
return -1;
/* See if PC falls into one of the loaded segments. Find the eh_frame
segment at the same time. */
for (n = info->dlpi_phnum; --n >= 0; phdr++)
{
if (phdr->p_type == PT_LOAD)
{
_Unwind_Ptr vaddr = (_Unwind_Ptr)
__RELOC_POINTER (phdr->p_vaddr, load_base);
if (data->pc >= vaddr && data->pc < vaddr + phdr->p_memsz)
{
match = 1;
pc_low = vaddr;
pc_high = vaddr + phdr->p_memsz;
}
}
else if (phdr->p_type == PT_GNU_EH_FRAME)
p_eh_frame_hdr = phdr;
#ifdef PT_SUNW_UNWIND
/* Sun ld emits PT_SUNW_UNWIND .eh_frame_hdr sections instead of
PT_SUNW_EH_FRAME/PT_GNU_EH_FRAME, so accept them as well. */
else if (phdr->p_type == PT_SUNW_UNWIND)
p_eh_frame_hdr = phdr;
#endif
else if (phdr->p_type == PT_DYNAMIC)
p_dynamic = phdr;
}
if (!match)
return 0;
if (!p_eh_frame_hdr)
return 0;
/* Read .eh_frame_hdr header. */
hdr = (const struct unw_eh_frame_hdr *)
__RELOC_POINTER (p_eh_frame_hdr->p_vaddr, load_base);
if (hdr->version != 1)
return 1;
#ifdef CRT_GET_RFIB_DATA
# ifdef __i386__
data->dbase = NULL;
if (p_dynamic)
{
/* For dynamically linked executables and shared libraries,
DT_PLTGOT is the gp value for that object. */
ElfW(Dyn) *dyn = (ElfW(Dyn) *)
__RELOC_POINTER (p_dynamic->p_vaddr, load_base);
for (; dyn->d_tag != DT_NULL ; dyn++)
if (dyn->d_tag == DT_PLTGOT)
{
data->dbase = (void *) dyn->d_un.d_ptr;
#if defined __linux__
/* On IA-32 Linux, _DYNAMIC is writable and GLIBC has
relocated it. */
#elif defined __sun__ && defined __svr4__
/* On Solaris 2/x86, we need to do this ourselves. */
data->dbase += load_base;
#endif
break;
}
}
# elif defined __FRV_FDPIC__ && defined __linux__
data->dbase = load_base.got_value;
# elif defined __x86_64__ && defined __sun__ && defined __svr4__
/* While CRT_GET_RFIB_DATA is also defined for 64-bit Solaris 10+/x86, it
doesn't apply since it uses DW_EH_PE_pcrel encoding. */
# else
# error What is DW_EH_PE_datarel base on this platform?
# endif
#endif
p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
base_from_cb_data (hdr->eh_frame_ptr_enc,
data),
(const unsigned char *) (hdr + 1),
&eh_frame);
/* We require here specific table encoding to speed things up.
Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start
as base, not the processor specific DW_EH_PE_datarel. */
if (hdr->fde_count_enc != DW_EH_PE_omit
&& hdr->table_enc == (DW_EH_PE_datarel | DW_EH_PE_sdata4))
{
_Unwind_Ptr fde_count;
p = read_encoded_value_with_base (hdr->fde_count_enc,
base_from_cb_data (hdr->fde_count_enc,
data),
p, &fde_count);
/* Shouldn't happen. */
if (fde_count == 0)
return 1;
if ((((_Unwind_Ptr) p) & 3) == 0)
{
struct hdr_cache_element elem;
const struct fde_table *table = (const struct fde_table *) p;
size_t lo, hi, mid;
_Unwind_Ptr data_base = (_Unwind_Ptr) hdr;
fde *f;
unsigned int f_enc, f_enc_size;
_Unwind_Ptr range;
mid = fde_count - 1;
pc_low = table[0].initial_loc + data_base;
if (data->pc < pc_low)
return 1;
f = (fde *) (table[mid].fde + data_base);
f_enc = get_fde_encoding (f);
f_enc_size = size_of_encoded_value (f_enc);
read_encoded_value_with_base (f_enc & 0x0f, 0,
&f->pc_begin[f_enc_size], &range);
pc_high = table[mid].initial_loc + data_base + range;
if (data->pc < pc_high)
{
hdr_cache_element elem;
elem.pc_low = pc_low;
elem.pc_high = pc_high;
elem.dbase = data->dbase;
elem.data_base = data_base;
elem.table = table;
elem.fde_count = fde_count;
if (hdr_cache_sz < HDR_CACHE_CAPACITY)
{
size_t i, j;
int found = 0;
for (i=0; i < hdr_cache_sz; i++)
{
if (hdr_cache[i].pc_low == pc_low)
{
found = 1;
break;
}
else if (hdr_cache[i].pc_low > pc_low)
break;
}
if (!found)
{
/* i currently points to the first entry that belongs
after pc_low; insert the new entry after shifting
over any successors to make room. */
for (j=hdr_cache_sz; j > i; j--)
copy_hdr_cache_element (&hdr_cache[j], &hdr_cache[j-1],
hdr_cache[j-1].checksum);
copy_hdr_cache_element (&hdr_cache[j], &elem, 0);
hdr_cache_sz++;
}
}
/* Do the actual search */
return search_fde_table(data, &elem);
}
}
}
return 1;
}
typedef int (dlclose_fn) (void *);
typedef const fde * (unwind_find_fde_fn) (void *, struct dwarf_eh_bases *);
static dlclose_fn *dlclose_orig;
static unwind_find_fde_fn *_Unwind_Find_FDE_orig;
int
dlclose (void *handle)
{
int rval;
pthread_mutex_lock (&hdr_cache_mutex);
hdr_cache_sz = 0;
rval = dlclose_orig (handle);
pthread_mutex_unlock (&hdr_cache_mutex);
return rval;
}
void __attribute__ ((constructor))
init ()
{
dlclose_orig = (dlclose_fn *) dlsym (RTLD_NEXT, "dlclose");
assert (dlclose_orig);
_Unwind_Find_FDE_orig = (unwind_find_fde_fn *) dlsym (RTLD_NEXT,
"_Unwind_Find_FDE");
assert (_Unwind_Find_FDE_orig);
}
const fde *
_Unwind_Find_FDE (void *pc_ptr, struct dwarf_eh_bases *bases)
{
struct unw_eh_callback_data data;
const fde *ret;
hdr_cache_element elem;
_Unwind_Ptr pc = (_Unwind_Ptr) pc_ptr;
int rval;
//fprintf(stderr, "-> _Unwind_Find_FDE intercepted\n");
/* First try the cache. */
if (search_hdr_cache (&elem, pc))
{
data.pc = pc;
data.tbase = NULL;
data.dbase = NULL;
data.func = NULL;
data.ret = NULL;
if (search_fde_table (&data, &elem))
{
if (data.ret)
{
//fprintf(stderr, " Cache hit!\n");
bases->tbase = data.tbase;
bases->dbase = data.dbase;
bases->func = data.func;
}
return data.ret;
}
}
/* Try to get the info from the loader (and update the cache while we're at
it) */
data.pc = pc;
data.tbase = NULL;
data.dbase = NULL;
data.func = NULL;
data.ret = NULL;
data.check_cache = -1;
pthread_mutex_lock (&hdr_cache_mutex);
rval = dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data);
pthread_mutex_unlock (&hdr_cache_mutex);
if (rval >= 0)
{
if (data.ret)
{
bases->tbase = data.tbase;
bases->dbase = data.dbase;
bases->func = data.func;
}
return data.ret;
}
/* Fall back to the original implementation */
return _Unwind_Find_FDE(&data, bases);
}
#if 0
{
#endif
#ifdef __cplusplus
}
#endif
#ifdef TEST_ME
#include "backtrace.h"
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <dirent.h>
#include <cerrno>
#include <cstring>
#include <cstdlib>
#include <syscall.h>
#include <unistd.h>
#include <dlfcn.h>
static struct backtrace_state *bts;
void bterr(void* data, char const *msg, int errnum) {
fprintf(stderr, "Error %d: %s\n", errnum, msg);
}
int btfull(void* data, uintptr_t pc, char const *file, int line, char const*
func) {
printf("%s:%d: 0x%lx %s\n", file, line, pc, func);
return 0;// not strcmp("main", func);
}
extern "C" void handler(int snum) {
fprintf(stderr, "I know, I'm a bad boy for calling this in a signal
handler\n");
backtrace_full(bts, 0, &btfull, &bterr, 0);
fprintf(stderr, "\n\n");
}
void *dlh;
extern "C" {
int foo(int a, int b) {
if (0 and dlclose(dlh)) {
fprintf(stderr, "Unable to dlclose library: %s\n", dlerror());
}
raise(SIGUSR1);
return a + b;
}
int bar(int a, int b) {
return a + foo(b, 1);
}
int baz(int a, int b) {
return a + bar(b, 2);
}
typedef int (function)(int,int,int(*)(int,int));
}
//extern "C" int blah(int a, int b, int (*fn)(int,int));
//extern "C" function blah;
#if 0
extern "C" int blah(int a, int b, int (*fn)(int,int)) {
printf("My blah: %p\n", fn);
return 0;
}
#endif
int main() {
bts = backtrace_create_state(0, 1, &bterr, 0);
backtrace_full(bts, 0, &btfull, &bterr, 0);
fprintf(stderr, "\n\n");
struct sigaction sa, old_sa;
sigset_t mask;
sigemptyset(&mask);
sa.sa_handler = &handler;
sa.sa_mask = mask;
sa.sa_flags = 0;
sigaction(SIGUSR1, &sa, &old_sa);
#if 1
dlh = dlopen("./libbt.so", RTLD_NOW);
if (not dlh) {
fprintf(stderr, "dlopen failed: %s\n", dlerror());
exit(1);
}
function *blah = (function*) dlsym(dlh, "blah");
if (not blah) {
fprintf(stderr, "can't find symbol 'blah': %s\n", dlerror());
exit(1);
}
#endif
printf("%d\n", blah(1, 2, &baz));
//return 0;
// see
http://stackoverflow.com/questions/6581297/broadcasting-a-signal-to-all-threads-in-linux
pid_t pid = getpid();
char const * dname = "/proc/self/task";
if (DIR *d = opendir(dname)) {
while (dirent *dir = readdir(d)) {
if (int tid = atoi(dir->d_name)) {
fprintf(stderr, "About to signal thread %s(%d)\n", dir->d_name,
tid);
syscall(SYS_tgkill, pid, tid, SIGUSR1);
}
}
closedir(d);
}
else {
fprintf(stderr, "Unable to open %s: %s\n", dname, strerror(errno));
}
sigaction(SIGUSR1, &old_sa, 0);
return 0;
}
#endif
/** This file holds the generici stuff we need to make unwinding work,
copied more or less verbatim from the various sources indicated.
*/
#include <unwind.h>
#include <elf.h>
#include <link.h>
#include <cstddef>
#include <cstdlib>
#include <cassert>
#include <cstring>
/* Taken from libgcc/unwind-pe.h
vvvvvvvvvv
*/
/* Pointer encodings, from dwarf2.h. */
#define DW_EH_PE_absptr 0x00
#define DW_EH_PE_omit 0xff
#define DW_EH_PE_uleb128 0x01
#define DW_EH_PE_udata2 0x02
#define DW_EH_PE_udata4 0x03
#define DW_EH_PE_udata8 0x04
#define DW_EH_PE_sleb128 0x09
#define DW_EH_PE_sdata2 0x0A
#define DW_EH_PE_sdata4 0x0B
#define DW_EH_PE_sdata8 0x0C
#define DW_EH_PE_signed 0x08
#define DW_EH_PE_pcrel 0x10
#define DW_EH_PE_textrel 0x20
#define DW_EH_PE_datarel 0x30
#define DW_EH_PE_funcrel 0x40
#define DW_EH_PE_aligned 0x50
#define DW_EH_PE_indirect 0x80
static unsigned int
size_of_encoded_value (unsigned char encoding)
{
if (encoding == DW_EH_PE_omit)
return 0;
switch (encoding & 0x07)
{
case DW_EH_PE_absptr:
return sizeof (void *);
case DW_EH_PE_udata2:
return 2;
case DW_EH_PE_udata4:
return 4;
case DW_EH_PE_udata8:
return 8;
}
abort ();
}
/* Read an unsigned leb128 value from P, store the value in VAL, return
P incremented past the value. We assume that a word is large enough to
hold any value so encoded; if it is smaller than a pointer on some target,
pointers should not be leb128 encoded on that target. */
static const unsigned char *
read_uleb128 (const unsigned char *p, _uleb128_t *val)
{
unsigned int shift = 0;
unsigned char byte;
_uleb128_t result;
result = 0;
do
{
byte = *p++;
result |= ((_uleb128_t)byte & 0x7f) << shift;
shift += 7;
}
while (byte & 0x80);
*val = result;
return p;
}
/* Similar, but read a signed leb128 value. */
static const unsigned char *
read_sleb128 (const unsigned char *p, _sleb128_t *val)
{
unsigned int shift = 0;
unsigned char byte;
_uleb128_t result;
result = 0;
do
{
byte = *p++;
result |= ((_uleb128_t)byte & 0x7f) << shift;
shift += 7;
}
while (byte & 0x80);
/* Sign-extend a negative value. */
if (shift < 8 * sizeof(result) && (byte & 0x40) != 0)
result |= -(((_uleb128_t)1L) << shift);
*val = (_sleb128_t) result;
return p;
}
/* Load an encoded value from memory at P. The value is returned in VAL;
The function returns P incremented past the value. BASE is as given
by base_of_encoded_value for this encoding in the appropriate context. */
static const unsigned char *
read_encoded_value_with_base (unsigned char encoding, _Unwind_Ptr base,
const unsigned char *p, _Unwind_Ptr *val)
{
union unaligned
{
void *ptr;
unsigned u2 __attribute__ ((mode (HI)));
unsigned u4 __attribute__ ((mode (SI)));
unsigned u8 __attribute__ ((mode (DI)));
signed s2 __attribute__ ((mode (HI)));
signed s4 __attribute__ ((mode (SI)));
signed s8 __attribute__ ((mode (DI)));
} __attribute__((__packed__));
const union unaligned *u = (const union unaligned *) p;
_Unwind_Internal_Ptr result;
if (encoding == DW_EH_PE_aligned)
{
_Unwind_Internal_Ptr a = (_Unwind_Internal_Ptr) p;
a = (a + sizeof (void *) - 1) & - sizeof(void *);
result = *(_Unwind_Internal_Ptr *) a;
p = (const unsigned char *) (_Unwind_Internal_Ptr) (a + sizeof (void *));
}
else
{
switch (encoding & 0x0f)
{
case DW_EH_PE_absptr:
result = (_Unwind_Internal_Ptr) u->ptr;
p += sizeof (void *);
break;
case DW_EH_PE_uleb128:
{
_uleb128_t tmp;
p = read_uleb128 (p, &tmp);
result = (_Unwind_Internal_Ptr) tmp;
}
break;
case DW_EH_PE_sleb128:
{
_sleb128_t tmp;
p = read_sleb128 (p, &tmp);
result = (_Unwind_Internal_Ptr) tmp;
}
break;
case DW_EH_PE_udata2:
result = u->u2;
p += 2;
break;
case DW_EH_PE_udata4:
result = u->u4;
p += 4;
break;
case DW_EH_PE_udata8:
result = u->u8;
p += 8;
break;
case DW_EH_PE_sdata2:
result = u->s2;
p += 2;
break;
case DW_EH_PE_sdata4:
result = u->s4;
p += 4;
break;
case DW_EH_PE_sdata8:
result = u->s8;
p += 8;
break;
default:
abort ();
}
if (result != 0)
{
result += ((encoding & 0x70) == DW_EH_PE_pcrel
? (_Unwind_Internal_Ptr) u : base);
if (encoding & DW_EH_PE_indirect)
result = *(_Unwind_Internal_Ptr *) result;
}
}
*val = result;
return p;
}
/* ^^^^^^^^^^
*/
/* Taken from libgcc/unwind-dw2-fde.h
vvvvvvvvvv
*/
struct dwarf_eh_bases
{
void *tbase;
void *dbase;
void *func;
};
typedef int sword __attribute__ ((mode (SI)));
typedef unsigned int uword __attribute__ ((mode (SI)));
typedef unsigned int uaddr __attribute__ ((mode (pointer)));
typedef int saddr __attribute__ ((mode (pointer)));
typedef unsigned char ubyte;
/* Terminology:
CIE - Common Information Element
FDE - Frame Descriptor Element
There is one per function, and it describes where the function code
is located, and what the register lifetimes and stack layout are
within the function.
The data structures are defined in the DWARF specification, although
not in a very readable way (see LITERATURE).
Every time an exception is thrown, the code needs to locate the FDE
for the current function, and starts to look for exception regions
from that FDE. This works in a two-level search:
a) in a linear search, find the shared image (i.e. DLL) containing
the PC
b) using the FDE table for that shared object, locate the FDE using
binary search (which requires the sorting). */
/* The first few fields of a CIE. The CIE_id field is 0 for a CIE,
to distinguish it from a valid FDE. FDEs are aligned to an addressing
unit boundary, but the fields within are unaligned. */
struct dwarf_cie
{
uword length;
sword CIE_id;
ubyte version;
unsigned char augmentation[];
} __attribute__ ((packed, aligned (__alignof__ (void *))));
/* The first few fields of an FDE. */
struct dwarf_fde
{
uword length;
sword CIE_delta;
unsigned char pc_begin[];
} __attribute__ ((packed, aligned (__alignof__ (void *))));
typedef struct dwarf_fde fde;
/* Locate the CIE for a given FDE. */
static inline const struct dwarf_cie *
get_cie (const struct dwarf_fde *f)
{
return (const struct dwarf_cie *) ((const char *)&f->CIE_delta -
f->CIE_delta);
}
static inline const fde *
next_fde (const fde *f)
{
return (const fde *) ((const char *) f + f->length + sizeof (f->length));
}
/* ^^^^^^^^^^
*/
/* Taken from libgcc/unwind-dw2-fde.c
vvvvvvv
*/
/* Return the FDE pointer encoding from the CIE. */
/* ??? This is a subset of extract_cie_info from unwind-dw2.c. */
static int
get_cie_encoding (const struct dwarf_cie *cie)
{
const unsigned char *aug, *p;
_Unwind_Ptr dummy;
_uleb128_t utmp;
_sleb128_t stmp;
aug = cie->augmentation;
p = aug + strlen ((const char *)aug) + 1; /* Skip the augmentation string. */
if (__builtin_expect (cie->version >= 4, 0))
{
if (p[0] != sizeof (void *) || p[1] != 0)
return DW_EH_PE_omit; /* We are not prepared to handle
unexpected
address sizes or segment selectors.
*/
p += 2; /* Skip address size and segment size.
*/
}
if (aug[0] != 'z')
return DW_EH_PE_absptr;
p = read_uleb128 (p, &utmp); /* Skip code alignment. */
p = read_sleb128 (p, &stmp); /* Skip data alignment. */
if (cie->version == 1) /* Skip return address column. */
p++;
else
p = read_uleb128 (p, &utmp);
aug++; /* Skip 'z' */
p = read_uleb128 (p, &utmp); /* Skip augmentation length. */
while (1)
{
/* This is what we're looking for. */
if (*aug == 'R')
return *p;
/* Personality encoding and pointer. */
else if (*aug == 'P')
{
/* ??? Avoid dereferencing indirect pointers, since we're
faking the base address. Gotta keep DW_EH_PE_aligned
intact, however. */
p = read_encoded_value_with_base (*p & 0x7F, 0, p + 1, &dummy);
}
/* LSDA encoding. */
else if (*aug == 'L')
p++;
/* Otherwise end of string, or unknown augmentation. */
else
return DW_EH_PE_absptr;
aug++;
}
}
static inline int
get_fde_encoding (const struct dwarf_fde *f)
{
return get_cie_encoding (get_cie (f));
}
/* ^^^^^^^^^^
*/
/* Taken from libgcc/unwind-dw2-fde-dip.c
vvvvvvv
*/
#ifndef __RELOC_POINTER
# define __RELOC_POINTER(ptr, base) ((ptr) + (base))
#endif
struct unw_eh_callback_data
{
_Unwind_Ptr pc;
void *tbase;
void *dbase;
void *func;
const fde *ret;
int check_cache;
};
struct unw_eh_frame_hdr
{
unsigned char version;
unsigned char eh_frame_ptr_enc;
unsigned char fde_count_enc;
unsigned char table_enc;
};
/* Like base_of_encoded_value, but take the base from a struct
unw_eh_callback_data instead of an _Unwind_Context. */
static _Unwind_Ptr
base_from_cb_data (unsigned char encoding, struct unw_eh_callback_data *data)
{
if (encoding == DW_EH_PE_omit)
return 0;
switch (encoding & 0x70)
{
case DW_EH_PE_absptr:
case DW_EH_PE_pcrel:
case DW_EH_PE_aligned:
return 0;
case DW_EH_PE_textrel:
return (_Unwind_Ptr) data->tbase;
case DW_EH_PE_datarel:
return (_Unwind_Ptr) data->dbase;
default:
abort();
}
}