Making Wrong Code Type Wrong

Yuval Kogman Tue, 18 Oct 2005 11:39:38 -0700

JoelOnSoftware wrote an article I recently saw linked on perlmonks:

        http://www.joelonsoftware.com/articles/Wrong.html


The article discusses writing robust software, specifically by
dealing with data separation.

In my interpretation the article introduces a type system. This type
system helps write robust software, but has some limitations:

        * Type information is checked by the programmer
        * Full annotations must be supplied by the programmer
        * Lack of annotation is hard to detect

The system helps you separate data that has not been massaged for
a certain piece of code, from touching that code. The only way to
let that data reach the code is by using a filter that sanitizes it.

Joel uses 'Request("Foo")' to mean something akin to
$q->param("Foo") in CGI.pm land, and 'Write' to mean something like
'print' (assuming HTML output).

His example shows how cross site scripting can arise, and how to use
the type system to avoid this problem.

The type system is implemented using coding standards: you tag
variable names, much like a tagged union. In his example, the union
type describes data safety, and has two subtypes: safe and unsafe.

This relates very closely to tainting, but differs in one respect -
it's a static analysis. Tainting does the same thing with no user
annotation, at runtime, in very specific situations.

Perl 6 will need support for this kind of tainting, and I raised it
before, but now I would like to propose something else.

Let's look at Joel's code for a second:

        us = UsRequest("name")
        usName = us
        recordset("usName") = usName 
        sName = SFromUs(recordset("usName"))
        WriteS sName

At the top, the 'Us' in 'UsRequest' denotes that the request returns
an unsafe value, and the variable 'us' holds an unsafe value. Then
'usName' is assigned from it (in a far away piece of code, btw). The
programmer knows that 'usName' cannot be named 'sName' because it's
getting its value from a variable that is also tagged with 'us'.

Later, the value is stored in a DB. When extracted from the DB, we
know the value is unsafe, because it is tagged as such. Then SFromUs
is like a complex casting operator, that makes something unsafe into
something safe. The naming convention is supposed to help the
programmer *see* when things go wrong.
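To make the convention concrete, here is a rough Python transliteration of
Joel's snippet. It is only a sketch: us_request, s_from_us and write_s are
hypothetical stand-ins for UsRequest, SFromUs and WriteS, the dict-based form
and recordset are assumptions, and HTML encoding via html.escape is assumed
to be what "safe" means here.

```python
import html

def us_request(form, key):
    # Hypothetical stand-in for UsRequest: anything coming from the user
    # is unsafe, which the "us" prefix is meant to signal.
    return form[key]

def s_from_us(us):
    # The one sanctioned conversion from unsafe to safe: HTML-encode.
    return html.escape(us)

def write_s(out, s):
    # By convention accepts only "s" (safe) strings; nothing enforces it.
    out.append(s)

form = {"name": "<script>alert(1)</script>"}  # assumed request data
recordset = {}                                # assumed DB row
page = []                                     # assumed output buffer

us = us_request(form, "name")
us_name = us
recordset["us_name"] = us_name
s_name = s_from_us(recordset["us_name"])
write_s(page, s_name)
```

Nothing stops someone from writing write_s(page, us_name); the prefixes only
make that line look wrong to a human reader.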

In Perl 6 ideally this would look like this, IMHO:

        my $str = $q.param("name");
        ...
        my $name = $str;
        $storage.store("name", $name);
        ...
        my $name = $storage.get("name");
        print encode($name);

because type annotation sucks. Superficially, this code does not
have the property that both Joel and I want it to have - safety, but
I think this can be resolved.

Perl 6 has the notion of roles.

Let's say we were to decorate the param method of the http request
object, asking for a symbolic role to be attached to all the values
it returns.

What we want to get out of it is that in the scope of our code (the
lexical scope, the current class and its subclasses, the consumers
of this module, etc etc), any retrieval of a param will tag the data
as unsafe, without param even knowing about this.

Then the view is also tagged - no data may enter the Template
namespace with this tag. Or, to be even stricter: for the scope in
which we use Template, the only data we allow ourselves to put into
it is data that is explicitly tagged as safe.

The implementation of this system is trivial with Perl 6's tools:
roles and compile time type inference allow the user to make a
system that gives the exact same features as Joel's system does by
wrapping interfaces.
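As an illustration of the "wrapping interfaces" approach, here is a minimal
sketch in Python rather than Perl 6, with typing.NewType standing in for
roles and an external static checker (mypy, say) standing in for compile
time inference. The Request class and the unsafe_param and emit wrappers are
hypothetical names invented for the example.

```python
import html
from typing import NewType

# Two distinct tags over plain strings; at runtime both are just str.
Unsafe = NewType("Unsafe", str)
Safe = NewType("Safe", str)

class Request:
    # Hypothetical stand-in for the HTTP request object ($q in the post).
    def __init__(self, params: dict[str, str]) -> None:
        self._params = params

    def param(self, name: str) -> str:
        return self._params[name]

def unsafe_param(q: Request, name: str) -> Unsafe:
    # Wrapper around param: everything it returns is tagged Unsafe.
    return Unsafe(q.param(name))

def encode(value: Unsafe) -> Safe:
    # The only function typed Unsafe -> Safe.
    return Safe(html.escape(value))

def emit(out: list[str], value: Safe) -> None:
    # Wrapper around output: accepts only Safe values.
    out.append(value)

q = Request({"name": "<b>Yuval</b>"})
out: list[str] = []
emit(out, encode(unsafe_param(q, "name")))
# emit(out, unsafe_param(q, "name"))  # a static checker should reject this
```

The cost is visible right away: callers have to use the wrappers instead of
the original interface, which is exactly what decorating the existing
interface would avoid.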

However, what I'm more interested in is decorating existing
interfaces, in a limited scope.

The reason we want a limited scope is that it is not our concern
how other pieces of code use $q.param safely or unsafely, with our
definition of safety or with someone else's definition of it.
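A rough idea of what scope-limited decoration could look like, sketched in
Python. This is an approximation only: the scope here is dynamic (a with
block), not the lexical or per-module scope the proposal asks for, and
Request, Unsafe and tagging_params are hypothetical names.

```python
from contextlib import contextmanager

class Unsafe(str):
    """Marker subclass: a string tagged as unsafe."""

class Request:
    # Hypothetical stand-in for the request object; it knows nothing
    # about tagging.
    def __init__(self, params):
        self._params = params

    def param(self, name):
        return self._params[name]

@contextmanager
def tagging_params(q):
    # Shadow q.param on this one instance so that, inside the with block,
    # every value it returns is tagged Unsafe.
    original = q.param
    q.param = lambda name: Unsafe(original(name))
    try:
        yield q
    finally:
        # Drop the instance attribute; the class method is visible again.
        del q.param

q = Request({"name": "bob"})
with tagging_params(q):
    inside = q.param("name")   # tagged as Unsafe
outside = q.param("name")      # plain str, other code is unaffected
```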

What I'd like to be able to do is declare something that applies to
all code in my system (application, module, script, whatever) that
does this:

        my $str = $q.param("name");
        ...
        my $name = $str;
        $storage.store("name", $name);
        ...
        my $name = $storage.get("name");
        print encode($name);

and enables me to say that

        print $name;

is disallowed using the following rules:

        everything from $q.param is also of the type Unsafe

        everything going into $storage.store needs to get a callback
        triggered if it is unsafe (and more data about it will be stored
        in the DB).

        everything coming out of $storage.get must also trigger a
        callback, that will retag it as necessary.

        everything going into print must be of the type Safe

        the function encode has the type Unsafe -> Safe

Using these 5 rules I can then gain control over much larger bits of
code. The only questions left unanswered are how I say which code the
rules apply to, and what the syntax for these decorations is.
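The five rules can be approximated with runtime checks, which is closer to
tainting than to the static analysis proposed here, but it shows how the
rules fit together. This Python sketch assumes a dict-backed Storage class, a
checked_print function standing in for print, and html.escape as the
encoding; all of these names are hypothetical.

```python
import html

class Unsafe(str):
    """A string tagged as unsafe."""

class Safe(str):
    """A string tagged as safe."""

def param(form, name):
    # Rule 1: everything from param is also of the type Unsafe.
    return Unsafe(form[name])

class Storage:
    def __init__(self):
        self._rows = {}

    def store(self, key, value):
        # Rule 2: on the way in, record the tag next to the data.
        self._rows[key] = (str(value), type(value).__name__)

    def get(self, key):
        # Rule 3: on the way out, retag; anything not stored as Safe
        # is treated as Unsafe.
        text, tag = self._rows[key]
        return Safe(text) if tag == "Safe" else Unsafe(text)

def encode(value):
    # Rule 5: encode has the type Unsafe -> Safe.
    return Safe(html.escape(value))

def checked_print(out, value):
    # Rule 4: everything going into print must be of the type Safe.
    if not isinstance(value, Safe):
        raise TypeError("only Safe values may be printed")
    out.append(value)

form = {"name": "<i>x</i>"}
storage = Storage()
out = []

storage.store("name", param(form, "name"))
name = storage.get("name")
checked_print(out, encode(name))    # allowed

try:
    checked_print(out, name)        # the disallowed 'print $name'
    rejected = False
except TypeError:
    rejected = True
```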

This tagging gets very interesting with his examples later on.
Here's an excerpt from Joel's article:

        In Excel's source code you see a lot of rw and col and when you
        see those you know that they refer to rows and columns. Yep,
        they're both integers, but it never makes sense to assign
        between them.

There is a real benefit to be gained here, but the usability of e.g. int
formatting functions should not be hindered by overzealous typing.
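The rw/col case can be sketched in Python with typing.NewType, assuming an
external static checker such as mypy does the checking. Row, Col and
cell_name are hypothetical names; the point is that integer formatting and
arithmetic keep working because both tags remain subtypes of int.

```python
from typing import NewType

Row = NewType("Row", int)
Col = NewType("Col", int)

def cell_name(rw: Row, col: Col) -> str:
    # Plain int operations still apply to tagged values: zero-padded
    # formatting for the row, arithmetic for the column.
    return f"R{rw:03d}C{col + 1}"

rw = Row(7)
col = Col(2)
label = cell_name(rw, col)

# A static checker should reject both of these:
# cell_name(col, rw)   # arguments swapped
# rw = col             # assigning a Col to a Row variable
```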

-- 
 ()  Yuval Kogman <[EMAIL PROTECTED]> 0xEBD27418  perl hacker &
 /\  kung foo master: /me has realultimatepower.net: neeyah!!!!!!!!!!!!
