On Wed, Jul 17, 2019 at 12:49 PM Mark H Weaver <m...@netris.org> wrote:

> Hi Linas,
>
> > Investigating the crash with good-old printf's in libguile/vm.c produces
> > a vast ocean of prints ... that should have not been printed, and/or
> > should have been actual errors, but somehow were not handled by
> > scm_error.
> > Using today's git pull of master, here's the diff containing a printf:
> >
> > --- a/libguile/vm.c
> > +++ b/libguile/vm.c
> > @@ -1514,12 +1514,23 @@ thread->guard); fflush(stdout); assert (0); }
> >
> >        proc = SCM_SMOB_DESCRIPTOR (proc).apply_trampoline;
> >        SCM_FRAME_LOCAL (vp->fp, 0) = proc;
> >        return SCM_PROGRAM_CODE (proc);
> >      }
> >
> > +printf("duuude wrong type to apply!\n"
> > +"proc=%lx\n"
> > +"ip=%p\n"
> > +"sp=%p\n"
> > +"fp=%p\n"
> > +"sp_min=%p\n"
> > +"stack_lim=%p\n",
> > +SCM_FRAME_SLOT(vp->fp, 0)->as_u64,
> > +vp->ip, vp->sp, vp->fp, vp->sp_min_since_gc, vp->stack_limit);
> > +fflush(stdout);
> > +
> >    vp->ip = SCM_FRAME_VIRTUAL_RETURN_ADDRESS (vp->fp);
> >
> >    scm_error (scm_arg_type_key, NULL, "Wrong type to apply: ~S",
> >               scm_list_1 (proc), scm_list_1 (proc));
> >  }
> >
> > As you can see, shortly after my printf, there should have been an
> > error report.
>
> Not necessarily.  Note that what 'scm_error' actually does is to raise
> an exception.  What happens next depends on what exception handlers are
> installed at the time of the error.
>

OK, but... when I look at what get_callee_vcode() actually does, it seems
to be earnestly trying to fish out the location of a callable function from
the frame pointer, and it does so three plausible ways. If those three don't
work out, then it sets the instruction pointer (to the garbage value),
followed by scm_error("Wrong type to apply"). This also looks like an
earnest, honest attempt to report a real error.  But let's double-check.
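
To make that description concrete, here is a toy model of the lookup as I
read it. Everything in it (the toy_* names, the struct layout, the NULL
return) is hypothetical and simplified; it is not the libguile code, and the
real function raises an error where this one returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not libguile's. */
enum toy_kind
{
  TOY_PROGRAM,            /* directly callable: carries its own code */
  TOY_APPLICABLE_STRUCT,  /* wraps another procedure */
  TOY_APPLICABLE_SMOB,    /* callable through a trampoline */
  TOY_OTHER               /* not callable at all */
};

struct toy_obj
{
  enum toy_kind kind;
  const int *code;            /* meaningful only for TOY_PROGRAM */
  struct toy_obj *procedure;  /* wrapped procedure or trampoline, else NULL */
};

/* Look at slot 0 of the frame and return the code to run.  Returns NULL
   in the case where the real function would report "Wrong type to apply". */
static const int *
toy_callee_code (struct toy_obj **slot0)
{
  struct toy_obj *proc;

  assert (slot0 != NULL);
  proc = *slot0;
  if (proc == NULL)
    return NULL;

  /* Way 1: the callee is already a program. */
  if (proc->kind == TOY_PROGRAM)
    return proc->code;

  /* Ways 2 and 3: an applicable struct or smob stands in for a program;
     replace slot 0 with that program and use its code. */
  if ((proc->kind == TOY_APPLICABLE_STRUCT
       || proc->kind == TOY_APPLICABLE_SMOB)
      && proc->procedure != NULL
      && proc->procedure->kind == TOY_PROGRAM)
    {
      proc = proc->procedure;
      *slot0 = proc;
      return proc->code;
    }

  /* Nothing callable was found. */
  return NULL;
}
```

In this model, a slot holding a TOY_OTHER object (or garbage) falls straight
through to the last return, which is the path my printf sits on.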

So who calls get_callee_vcode(), and why, and what did they expect to
happen? Well, that's in three places: one in scm_call_n, which is a
plausible place where one might expect the instruction pointer to be set to
a valid value. Then there are two places in vm-engine.c -- "tail-call" and
"call" -- both of which one might plausibly expect to have a valid
instruction pointer.  I can't imagine any valid scenario where anyone was
expecting get_callee_vcode() to actually fail in the normal course of
operations.

That is, I can't think of any valid reason why anyone would want to suppress
the scm_error().  And even if I could -- calling scm_error() hundreds of
times per second, as fast as possible, does not seem like efficient coding
for dealing with a call to an invalid address.
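
For what it's worth, here is a toy sketch of how both observations could be
true at once: the error is raised every time, yet nothing is ever reported,
because a handler swallows it and the same bad call is retried. This is only
an illustration using setjmp/longjmp; the toy_* names are made up, and I am
not claiming this is how Guile implements exceptions or that this is what is
happening in my crash.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy: a single handler slot stands in for whatever
   exception handlers are installed at the time of the error. */
static jmp_buf *toy_handler = NULL;
static int toy_raise_count = 0;

/* Stand-in for raising an error: control goes to the installed handler
   if there is one; only without a handler is anything printed. */
static void
toy_raise (const char *msg)
{
  toy_raise_count++;
  if (toy_handler != NULL)
    longjmp (*toy_handler, 1);
  fprintf (stderr, "uncaught error: %s\n", msg);
}

/* Stand-in for applying something that is not a procedure. */
static void
toy_bad_apply (void)
{
  toy_raise ("Wrong type to apply");
}

/* Install a handler that silently swallows the error and retries the
   same bad call, up to max_retries times.  Returns how many errors were
   swallowed; none of them is ever reported to the user. */
static int
toy_retry_loop (int max_retries)
{
  jmp_buf env;
  jmp_buf *saved = toy_handler;
  volatile int swallowed = 0;   /* volatile: modified between setjmp and longjmp */

  assert (max_retries >= 0);
  toy_handler = &env;
  if (setjmp (env) != 0)
    swallowed++;                /* we arrive here after each toy_raise */
  if (swallowed < max_retries)
    toy_bad_apply ();           /* raises again, jumping back to setjmp */
  toy_handler = saved;
  return swallowed;
}
```

Calling toy_retry_loop (3) raises three times and prints nothing. A printf
placed just before the raise would fire on every pass, which is at least
consistent with the spew I am seeing.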

Anyway, I'm trying to track down where the invalid value gets set. No luck
so far. There are 6 or 8 places in vm-engine.c where the frame pointer is
set to something that isn't a pointer (which seems like cheating to me:
passing non-pointer values in something called "pointer" is ... well, my
knee-jerk reaction is that it's not wise, but there may be a deeper reason.)


>
> > There is no error report... until 5-10 minutes later, when the error
> > report itself causes a crash.  Before then, I get an endless
> > high-speed spew of prints:
>
> It looks like another error is happening within the exception handler.
>

Well, yes, that also. But given that the instruction pointer contains
garbage, it's perhaps not entirely surprising... at best, the question is,
why didn't it fail sooner?

-- Linas

>
>        Mark
>
> PS: It would be good to pick either 'guile-devel' or 'guile-user' for
>     continuation of this thread.  I don't see a reason why it should be
>     sent to both lists.
>


-- 
cassette tapes - analog TV - film cameras - you
