On Friday, July 10, 2015 at 12:05:55 AM UTC-7, Matthew Flatt wrote:
> At Thu, 9 Jul 2015 10:08:53 -0700 (PDT), Scott Bell wrote:
> > On Thursday, July 9, 2015 at 12:27:26 AM UTC-7, Matthew Flatt wrote:
> > > At Wed, 8 Jul 2015 18:23:32 -0700 (PDT), Scott Bell wrote:
> > > > On Wednesday, July 8, 2015 at 4:05:20 PM UTC-7, Scott Bell wrote:
> > > > > On Wednesday, July 8, 2015 at 3:48:22 PM UTC-7, neil wrote:
> > > > > > > Does adding the executable pathname to the `gdb` command line
> > > > > > > (i.e., format `gdb EXECUTABLE-FILE CORE-FILE`) give you the
> > > > > > > symbols?
> > > > > 
> > > > > Ah, of course. Yes, now we're getting somewhere:
> > > > > 
> > > > > ...
> > > > >     #6  0x000000080153e004 in strlen () from /lib/libc.so.7
> > > > > >     #7  0x0000000800a72bf3 in scheme_make_byte_string_without_copying ()
> > > > > >        from /usr/local/lib/libracket3m-6.1.1.so
> > > > >     #8  0x0000000800ade3f2 in c_to_scheme ()
> > > > >        from /usr/local/lib/libracket3m-6.1.1.so
> > > > >     #9  0x0000000800adee02 in ffi_do_call ()
> > > > >        from /usr/local/lib/libracket3m-6.1.1.so
> > > > > ...
> > > > > 
> > > > > That certainly points me in the direction of a bad FFI call.
> > > > 
> > > > When I say bad FFI call, I'm leaving open the possibility that
> > > > the issue may be in the FFI machinery as well as in our source,
> > > > but I should point out that the minimal amount of FFI code 
> > > > that we have is years old and has been running without issue.
> > > > We first observed this crash only a few months ago. 
> > > 
> > > Do you know what foreign function is being called?
> > > 
> > > If so, does it return a GC-managed string pointer, or is it one that's
> > > outside the GC's management?
> > 
> > I don't know which foreign function is being called, but 
> > we don't have a lot of them, and they're all to either
> > system or third-party libraries. Based on the stack trace 
> > of the crash, I'm assuming the relevant calls are the 
> > ones that return a byte string, so here they are:
> > 
> >   (_fun _string _string -> _bytes)
> > 
> >     The return value here is 'char *', and may be NULL on
> >     failure. Racket seems to handle this correctly based
> >     on manual testing and returns #f in the failure case.
> > 
> >   (_fun [EVP_MD : _fpointer = (evp-sha1)]
> >         [key : _bytes]
> >         [key_len : _int = (bytes-length key)]
> >         [data : _bytes]
> >         [data_len : _int = (bytes-length data)]
> >         [md : (_bytes o 20)]
> >         [md_len : (_ptr o _uint)]
> >         -> _bytes
> >         -> md)
> > 
> >     This is the function signature for HMAC from libcrypto,
> >     and in fact this is in Racket at:
> >         web-server-lib/web-server/stuffers/hmac-sha1.rkt
> >     The return value should be the same pointer passed in
> >     as `md` or NULL on failure.
> > 
> > So it looks like one call has callee allocated memory, and
> > the other is allocated by Racket using the (_bytes o n)
> > custom function type. I'm not sure whether or not the 
> > latter is GC-managed.
> 
> I see that the byte string produced by `(_bytes o n)` is not really
> compatible with `_bytes` as a return type. The problem is that `(_bytes
> o 20)` allocates only 20 bytes, instead of 21 bytes with the last byte
> as a terminator. It's possible that the allocated bytes are not zeroed,
> and attempting to get the length of the byte string as a result runs
> off the edge of an allocated page to unallocated space.
> 
> Does using this definition of `_bytes*` (which is exported from
> `ffi/unsafe` as `_bytes`) change anything for your program?
> 
> (define-fun-syntax _bytes*
>   (syntax-id-rules (o)
>     [(_ o n) (type: _gcpointer
>               pre:  (let ([bstr (make-sized-byte-string (malloc (add1 n)) n)])
>                       (ptr-set! bstr _byte n 0)
>                       bstr)
>               ;; post is needed when this is used as a function output type
>               post: (x => (make-sized-byte-string x n)))]
>     [(_ . xs) (_bytes . xs)]
>     [_ _bytes]))

I've imported this change and will deploy shortly. This
seems like a probable fix. An alternative would be to 
construct the returned byte string using `md_len` and
avoid the call to strlen entirely. I'm not familiar enough
with the FFI to tell how straightforward that would be.
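
For reference, here is a minimal sketch of that alternative. It is
untested against our deployment and rests on a few assumptions:
`openssl/libcrypto` is used to locate the library, the names
`evp-sha1` and `hmac-sha1` are only illustrative bindings, and
`data_len` is declared as `_size` because current OpenSSL headers
declare it as `size_t`. The result type is `_pointer` instead of
`_bytes`, so the FFI only needs to check the return value for NULL
and never scans for a terminator; the digest is then copied out of
`md` using the length that HMAC reports in `md_len`.

```racket
#lang racket/base
(require ffi/unsafe
         openssl/libcrypto) ; assumption: provides `libcrypto`, or #f if not found

(unless libcrypto
  (error 'hmac-sha1 "libcrypto could not be loaded"))

;; EVP_sha1() returns the digest descriptor that HMAC expects.
(define evp-sha1
  (get-ffi-obj "EVP_sha1" libcrypto (_fun -> _fpointer)))

;; Same argument list as the signature quoted above, but the C result is
;; treated as a plain pointer, so no strlen-based conversion takes place.
(define hmac-sha1
  (get-ffi-obj "HMAC" libcrypto
    (_fun [EVP_MD : _fpointer = (evp-sha1)]
          [key : _bytes]
          [key_len : _int = (bytes-length key)]
          [data : _bytes]
          [data_len : _size = (bytes-length data)] ; assumption: size_t in OpenSSL
          [md : (_bytes o 20)]
          [md_len : (_ptr o _uint)]
          -> [result : _pointer]                   ; NULL is converted to #f
          ;; Copy exactly md_len bytes out of the buffer Racket allocated.
          -> (and result (subbytes md 0 md_len)))))
```

With this, `(hmac-sha1 key data)` should return a fresh byte string
of `md_len` bytes (20 for SHA-1), or #f if HMAC returns NULL.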

Thanks Matthew, Neil, and Juan for the help! 
