[ tl;dr See the newly added struct-guard/c ]

I agree this is a problem (and a pernicious one!).

Thinking about it from the contract system design point of view, the
way serializable struct work this looks a lot like the situation where
the module author has (separately) exported the constructor without a
contract. And that's clearly a use-case we need to support (assuming
we believe in the idea of boundaries and having more than one
contract, etc; things that are pretty fundamental to the current
design).

Thinking about it from the serialization point of view, there is no
place where the programmer can communicate with the serialization
library to say "this use of deserialize (inside the module) is
different from that one (outside the module) so please check contracts
here instead of there".

As far as I can tell, the only way to rationalize these is to have a
single form that puts the contracts on a boundary with the
serialization information at the same place, meaning that all uses of
deserialize (and all uses of the selectors, etc.) have the contract
checks on them. This is, I believe, the solution that Matthew
suggested (as linked from the message below). There would probably
need to be some help to make that smoother if we wanted to continue in
that specific direction, tho.

Instead (thanks to discussion with Matthew and Christos), I've added
struct-guard/c. It accepts contracts and returns a procedure suitable
for use as the #:guard argument to serializable-struct and struct (and
friends) that checks the contracts when the constructor is called.

Note that this approach does not (cannot) check contracts when a
struct is mutated. So be sure to use it only when you have an
immutable struct or a struct whose mutable fields can safely be
`any/c`.

Here's an example of use, based on Philip's below.

#lang racket/base

(module server racket/base
  (require racket/serialize
           racket/contract)
  (provide (struct-out boxed-set))
  (serializable-struct boxed-set (v)
                       #:transparent
                       #:guard (struct-guard/c integer?)))

(require racket/serialize
         racket/list
         (submod "." server))

(define (corrupt-serialized s)
  (list-set s 6 '(0 "not a set")))

(define (make-empty-boxed-set/bad)
  (deserialize
   (corrupt-serialized
    (serialize (boxed-set 1)))))

(boxed-set-v (make-empty-boxed-set/bad))


Robby


On Tue, Feb 5, 2019 at 10:35 PM Philip McGrath <phi...@philipmcgrath.com> wrote:
>
> I was (relatively) recently bitten by an issue with putting contracts on 
> serializable structs. What's worse, once I figured out what was going on, I 
> realized that I'd run into this very issue before and even discussed it on 
> this list. In that case, though, I'd encountered the problem with 
> higher-order contracts and other complications: I hadn't fully appreciated 
> that the same caveats apply to much simpler cases.
>
> I want to write up this problem for the list, both because I suspect others 
> are making the same oversight I was and because I think this problem deserves 
> a linguistic solution.
>
> In essence, the problem is that, if you are using `serializable-struct`, any 
> contracts you may put on the struct (usually through a `struct` clause in 
> `contract-out`) are not enforced when instances are deserialized with 
> `deserialize`. Here is an example program:
>
> #lang racket
>
> (module server racket
>   (require racket/serialize)
>   (provide (contract-out
>             [struct boxed-set ([v set?])]))
>   (serializable-struct boxed-set (v)
>     #:transparent))
>
> (module bad racket
>   (require racket/serialize
>            (submod ".." server))
>   (provide (contract-out
>             [make-empty-boxed-set/bad
>              (-> boxed-set?)]))
>   (define (corrupt-serialized s)
>     (list-set s 6 '(0 "not a set")))
>   (define (make-empty-boxed-set/bad)
>     (deserialize
>      (corrupt-serialized
>       (serialize (boxed-set (set)))))))
>
> (module main racket
>   (require (submod ".." server)
>            (submod ".." bad))
>   (boxed-set-v (make-empty-boxed-set/bad)))
>
> The `server` module tries to protect its `boxed-set` datatype with a contract 
> in just the way I was doing, which I expect many other Racketeers are doing 
> as well. If I'd written such a module with `struct` instead of 
> `serializable-struct`, I would know that no other module will be able to 
> construct a `boxed-set` instance with a bad value. If I know that my own 
> module doesn't create malformed instances, either (perhaps because it doesn't 
> construct any instances), I could correctly assume that every instance 
> satisfies its contract, giving all of the familiar benefits for all parties 
> when reasoning about their code.
>
> Using `serializable-struct` opens a loophole in this guarantee. The expansion 
> of `serializable-struct` uses the same lower-level mechanisms available for 
> implementing a custom serialized representation: it generates a 
> `prop:serializable` property and a `deserialize-info` value, which is 
> exported from the `deserialize-info` submodule as 
> `deserialize-info:boxed-set-v0`. The `deserialize-info` value encapsulates a 
> function to construct a `boxed-set` instance from the serialized 
> representation, which is implemented using the `boxed-set` constructor. The 
> key point is that the expansion of `serializable-struct` uses the version of 
> the `boxed-set` constructor from inside the `server` module, which is not 
> protected by a contract. Thus, `deserialize` effectively exposes the 
> contract-less constructor, albeit quite indirectly.
>
> The `bad` module demonstrates how malformed instances of `boxed-set` can be 
> constructed without triggering a contract violation. The `main` module then 
> attempts to use `boxed-set-v` on a malformed value it has gotten from `bad`, 
> triggering this exception, which accurately blames `server` for failing to 
> live up to its promised contract on `boxed-set-v`:
> boxed-set-v: broke its own contract
>   promised: set?
>   produced: "not a set"
>   in: the range of
>       (-> boxed-set? set?)
>   contract from:
>       (<...>/serial-contract.rkt server)
>   blaming: (<...>/serial-contract.rkt server)
>    (assuming the contract is correct)
>   at: <...>/serial-contract.rkt:7.23
>
> The especially pernicious part is that, if `main` hadn't used `boxed-set-v`, 
> the error might not have been detected until the value had flowed far away 
> into some unrelated code.
>
> Why is this significant in practice? Realistically, I don't expect 
> programmers would write functions like `corrupt-serialized` in an attempt to 
> deliberately exploit some library's invariant, and the chance of someone 
> doing so accidentally seems fairly remote. What worries me much more is that 
> I read serializable structs in over the network, stick them in databases, and 
> write them to disk. In the absence of a contract, any struct that has ever 
> been deserialized may be silently corrupt. Some bits could have been flipped 
> or some other process could have mangled a file on disk. An attacker could 
> have intentionally manipulated a value sent over the network as part of some 
> potential exploit. Perhaps most likely, the could have been a 
> backwards-incompatible change (perhaps an unintentional one) to the contract 
> on some field of the struct.
>
> A contract couldn't protect me from encountering a malformed serialized 
> representation, but it would let me detect the error as soon as I try to 
> deserialize the bad value, rather than delaying the error until I actually 
> attempt to access a bad field. As things stand, if I usually just pass the 
> value around and move it in and out of storage, I might be unknowingly 
> passing around bad data indefinitely.
>
> I'm particularly concerned that people may be vulnerable to these 
> possibilities without realizing it, as I have been. In every other way I can 
> think of, `serializable-struct` is a drop-in replacement for `struct`, and, 
> as I explained, a `server` module that used `struct` would have been sound—so 
> it's surprising that the version with `serializable-struct` isn't, especially 
> since the unprotected constructor is exposed through a side-channel that 
> isn't obvious in the source text of the program.
>
> I said at the outset that I think this problem deserves a linguistic 
> solution. Ultimately, I would like it to be as easy to implement a protected 
> serializable struct as it is to use `struct` and `contract-out`. All of the 
> workarounds that I know of seem to have significant limitations. One approach 
> would be to drop down to the level of `prop:serializable` and 
> `make-deserialize-info` and write manual checks, but that seems bad for all 
> of the reasons contracts are so much better than manual checks in general. 
> Robby had suggested expecting clients not to use `deserialize` directly, 
> which could help in a number of cases, but it would be a problem when working 
> with libraries like `#lang web-server` that expect to be given serializable 
> values and to serialize them for you. Matthew gave an example showing how to 
> put the `deserialize-info` in a separate module, so that it uses the 
> contracted version of the constructor: this could well be the basis for a 
> robust solution, but it would definitely need a friendlier API than 
> manipulating module-path indices. It might well help to add some hooks to 
> `racket/serialize`—maybe something like `chaperone-serializable-struct`?
>
> I don't have a very concrete vision of how to approach this, but I think it's 
> worth doing. In the mean time, I hope this is helpful to anyone else who may 
> think they've been enforcing stronger guarantees than they actually have.
>
> -Philip
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to