Hi!

The question of type checking/enforcing has bothered me for a while, and since this newsgroup has a wealth of competence subscribed to it, I figured this would be a great way of learning from the experts. I feel there's a tradeoff between clear, easily readable and extensible code on one side, and safe code providing early errors and useful tracebacks on the other. I want both! How do you guys do it? What's the pythonic way? Are there any docs that I should read? All pointers and opinions are appreciated!
I've also whipped up some examples in order to put the above questions 
in context and for your amusement. :-)
Briefly:

class MyClass(object):
    def __init__(self, int_member = 0):
        self.int_member = int_member
    def process_data(self, data):
        self.int_member += data

The attached files are elaborations on this theme, with increasing security and, alas, rigidity and bloat. Even though maximum_security_module.py probably will be the safest to use, the coding style will bloat the code something awful and will probably make maintenance harder (please prove me wrong!). Where should I draw the line?
These are the attached modules:

* nocheck_module.py:
  As the above example, but with docs. No type checking.

* property_module.py
  Type checking of data members using properties.

* methodcheck_module.py
  Type checking of args within methods.

* decorator_module.py
  Type checking of args using method decorators.

* maximum_security_module.py
  Decorator and property type checking.

Let's pretend I'm writing a script, I import one of the above modules and then execute the following code
...
my_object = MyClass(data1)
my_object.process_data(data2)

and then let's pretend dataX is of a bad type, say for example str.

nocheck_module.py
=================
Now, if data2 is bad, we get a suboptimal traceback (possibly to somewhere deep within the code, and probably with an unrelated error message). However, the first point of failure will in fact be included in the traceback, so this error should be possible to find with little effort. On the other hand, if data1 is bad, the exception will be raised somewhere past the point of first failure. The traceback will be completely off, and the error message will still be bad. Even worse: if both are bad, we won't even get an exception. We will trundle on with corrupted data and take no notice. Very clear code, though. Easily extensible.
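To make the three cases above concrete, here is a minimal sketch that uses the unchecked class from the brief example at the top. The variable names (good, late, silent) are only for illustration and are not part of the attached modules.

```python
class MyClass(object):
    def __init__(self, int_member=0):
        self.int_member = int_member

    def process_data(self, data):
        self.int_member += data


# Case 1: only data2 is bad. The failure surfaces at the "+=" line,
# so the traceback does include the point of first failure.
good = MyClass(1)
try:
    good.process_data("moo")
except TypeError as exc:
    print("data2 bad:", exc)

# Case 2: only data1 is bad. The constructor accepts it silently and
# the exception is only raised later, inside process_data().
late = MyClass("moo")
try:
    late.process_data(1)
except TypeError as exc:
    print("data1 bad:", exc)

# Case 3: both are bad. str += str is perfectly legal, so no exception
# is raised and the object carries on with corrupted data.
silent = MyClass("moo")
silent.process_data("cow")
print("both bad:", repr(silent.int_member))
```

In the third case int_member ends up as the string 'moocow', and nothing complains.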

property_module.py
==================
Here we catch that data1 failure. Tracebacks may still be long, with uninformative error messages, but they will not be as bad as in nocheck_module.py. Bloat. +7 or more lines of boilerplate code for each additional data member. Quite clear code. Readily extensible.
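One way the per-member property boilerplate might be cut down is a small reusable descriptor. This is only a sketch: TypedAttribute is a hypothetical helper that is not in any of the attached modules, and it stores the value in the instance dict under the same name as the attribute.

```python
class TypedAttribute(object):
    """Hypothetical reusable descriptor that type-checks on assignment."""

    def __init__(self, name, allowed_types):
        self.name = name
        self.allowed_types = allowed_types  # a type or a tuple of types

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name)

    def __set__(self, instance, value):
        if not isinstance(value, self.allowed_types):
            raise TypeError("%s must be %r, got %r"
                            % (self.name, self.allowed_types, type(value)))
        # A data descriptor takes precedence over the instance dict,
        # so storing under the same name does not bypass the check.
        instance.__dict__[self.name] = value


class MyClass(object):
    # One line per data member instead of +7 lines of property code.
    int_member = TypedAttribute("int_member", int)

    def __init__(self, int_member=0):
        self.int_member = int_member

    def process_data(self, data):
        self.int_member += data
```

With this, both MyClass('moo') and a.int_member = 'moo' should raise TypeError at the assignment, and allowing floats would mean changing int to (int, float) in one place per member.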

methodcheck_module.py
=====================
Good, concise tracebacks with exact error messages. Lots of bloat and obscured code. Misses errors where data members are changed directly. Very hard to read and extend.

decorator_module.py
===================
Good, concise tracebacks with good error messages. Some bloat. Misses errors where data members are changed directly. Clear, but somewhat hard to extend. Decorators for *all* methods?! This cannot be the purpose of python!?
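As for the worry that positional type definitions are hard to correlate with the args they refer to, a possible variation is to key the types by argument name. The sketch below assumes Python 3 (it relies on inspect.signature), and argtypes is a hypothetical name, not the method_argtypes decorator from the attached module.

```python
import functools
import inspect


def argtypes(**typedefs):
    """Hypothetical variant of method_argtypes, keyed by argument name."""
    def decorator(fcn):
        signature = inspect.signature(fcn)
        # Catch typos in the decorator line when the class is defined.
        unknown = set(typedefs) - set(signature.parameters)
        if unknown:
            raise TypeError("%s() has no argument(s): %s"
                            % (fcn.__name__, ", ".join(sorted(unknown))))

        @functools.wraps(fcn)  # keep the method's name and docstring
        def wrapper(*args, **kwargs):
            # Map positional and keyword args onto parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, allowed_types in typedefs.items():
                value = bound.arguments[name]
                if not isinstance(value, allowed_types):
                    raise TypeError(
                        "%s() argument %r must be %r, got %r"
                        % (fcn.__name__, name, allowed_types, type(value)))
            return fcn(*args, **kwargs)
        return wrapper
    return decorator


class MyClass(object):
    @argtypes(int_member=int)
    def __init__(self, int_member=0):
        self.int_member = int_member

    @argtypes(data=int)
    def process_data(self, data):
        self.int_member += data
```

This also checks keyword arguments such as MyClass(int_member='moo'). It still misses direct assignment to the data member, just like decorator_module.py.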

maximum_security_module.py
==========================
Good, concise tracebacks with good error messages. No errors missed (I think? :-)). Bloat. Lots of decorators and boilerplate property code all over the place (thankfully not within functional code, though). Is this how it's supposed to be done?

And if you've read all the way down here I thank you so very much for your patience and perseverance. Now I'd like to hear your thoughts on this! Where should the line be drawn? Should I just typecheck data from unreliable sources (users/other applications) and stick with the bare-bones strategy, or should I go all the way? Did I miss something obvious? Should I read some docs? (Which?) Are there performance issues to consider?
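For reference, this is roughly what I imagine the "only check unreliable sources" option would look like: the class stays unchecked, and untrusted input is converted once at the boundary. The names parse_int and run are hypothetical and only serve the sketch.

```python
class MyClass(object):
    """The unchecked class, as in nocheck_module.py."""
    def __init__(self, int_member=0):
        self.int_member = int_member

    def process_data(self, data):
        self.int_member += data


def parse_int(raw, what="value"):
    """Convert untrusted input to int, or fail early with a clear message."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        raise ValueError("%s must be an integer, got %r" % (what, raw))


def run(raw_start, raw_increment):
    """Boundary function: everything past this point is trusted."""
    my_object = MyClass(parse_int(raw_start, "start"))
    my_object.process_data(parse_int(raw_increment, "increment"))
    return my_object.int_member
```

Note that int() also accepts floats and truncates them, so this particular conversion is more forgiving than an isinstance() check.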
Thanks again for taking the time.

Cheers!
/Joel Hedlund
"""Example module with method argument type checking.

Pros:
Pinpointed tracebacks with very exact error messages.

Cons:
Lots of boilerplate typechecking code littered all over the place, 
obscuring functionality at the start of every function. 
Bloat will accumulate rapidly. +2 lines of boilerplate code per method and
argument.
If I at some point decide that floats are also ok, I'll need to crawl all 
over the code with a magnifying glass and a pair of tweezers. 
We don't catch errors of the type 
a = MyClass()
a.int_member = 'moo!'
a.process_data(1)

"""

class MyClass(object):
    """My example class."""
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.
        
        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.
             
        """
        # Boilerplate typechecking code.
        if not isinstance(int_member, int):
            raise TypeError("int_member must be int")
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    def process_data(self, data):
        """Do some data processing.
        
        IN:
        data: <int>
            New information that should be incorporated. Must be int.
        
        """
        # Boilerplate typechecking code.
        if not isinstance(data, int):
            raise TypeError("data must be int")
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)

#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)
"""Example module without type checking.

Pros:
Clean, easily readable and extensible code that gets down to business 
fast. If I at some point decide that floats are also ok, I only need to 
update the docs and all is well.
No bloat.

Cons:
Type restrictions are not enforced. This means that if type errors occur,
the exception may be raised far from the point of first failure, and 
possibly with long tracebacks and uninformative error messages.

"""

class MyClass(object):
    """My example class."""
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.
        
        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.
             
        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    def process_data(self, data):
        """Do some data processing.
        
        IN:
        data: <int>
            New information that should be incorporated. Must be int.
        
        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)

#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)
"""Example module using properties for data member type checking.

Pros:
Quite clean, readable and extensible code that gets down to business fast.
Data member type restrictions are enforced. If I at some point decide that
floats are also ok, I only need to update the docs and a few more lines.

Cons:
Method argument types are not enforced, which means that tracebacks may 
still be long, with uninformative error messages. Not as bad as in 
nocheck_module.py though. 
Bloat. +7 or more lines of boilerplate code for each added data member (can
this be done neater?). But at least the bloat is outside functional code.

"""

class MyClass(object):
    """My example class."""
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.
        
        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.
             
        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    def _get_int_member(self):
        return self.__int_member
    def _set_int_member(self, value):
        if not isinstance(value, int):
            raise TypeError("int_member must be type int")
        self.__int_member = value
    int_member = property(_get_int_member, _set_int_member)
    del _get_int_member, _set_int_member
    
    def process_data(self, data):
        """Do some data processing.
        
        IN:
        data: <int>
            New information that should be incorporated. Must be int.
        
        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)

#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)
"""Example module using decorators for method argument type checking.

Pros:
Clean, easily readable and extensible code that gets down to business 
fast. 
Pinpointed tracebacks with good error messages.
If I at some point decide that floats are also ok, I only need to 
update the docs and change the decorators to 
@method_argtypes((int, float)).

Cons:
With many args and allowed types, the type definitions on the decorator 
lines will be hard to correlate to the args that they refer to (probably 
not impossible to work around though...?). 
We still don't catch errors of the type 
a = MyClass()
a.int_member = 'moo!'
a.process_data(1)
A decorator for each method everywhere? That can't be the purpose of 
python!? There has to be a better way?!

"""

def method_argtypes(*typedefs):
    """Rudimentary typechecker decorator generator.
    
    If you're really interested in this stuff, go check out Michele 
    Simionato's decorator module instead. It rocks. Google is your friend.

    IN:
    *typedefs: <type> or <tuple <type>>
        The allowed types for each arg to the method, self excluded.
        Will be used with isinstance(), so valid typedefs include
        int or (int, float).
    
    """
    def argchecker(fcn):
        import inspect
        # getfullargspec replaces getargspec, which was removed in Python 3.11.
        names = inspect.getfullargspec(fcn)[0][1:]
        def check_args(*args):
            for arg, value, allowed_types in zip(names, args[1:], typedefs):
                if not isinstance(value, allowed_types):
                    one_of = ''
                    if hasattr(allowed_types, '__len__'):
                        one_of = "one of "
                    msg = ".%s() argument %r must be %s%s"
                    msg %= fcn.__name__, arg, one_of, allowed_types
                    raise TypeError(msg)
            return fcn(*args)
        return check_args
    return argchecker

class MyClass(object):
    """My example class."""
    @method_argtypes(int)
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.
        
        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.
             
        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    @method_argtypes(int)
    def process_data(self, data):
        """Do some data processing.
        
        IN:
        data: <int>
            New information that should be incorporated. Must be int.
        
        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)

#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)
"""Example module using both decorators and properties for type checking.

Pros:
Clean, easily readable and extensible code that gets down to business 
fast. 
Pinpointed tracebacks with good error messages.
Now we catch errors of the type 
a = MyClass()
a.int_member = 'moo!'
a.process_data(1)

Cons:
With many args and allowed types, the type definitions on the decorator 
lines will be hard to correlate to the args that they refer to (probably 
not impossible to work around though...?). 
A decorator for each method everywhere? That can't be the purpose of 
python!? There has to be a better way?!
Property bloat. +7 or more lines of boilerplate code for each added data 
member (can this be done neater?). 
If I at some point decide that floats are also ok, I only need to 
update the docs, decorators and properties... hmm...

"""

def method_argtypes(*typedefs):
    """Rudimentary typechecker decorator generator.
    
    If you're really interested in this stuff, go check out Michele 
    Simionato's decorator module instead. It rocks. Google is your friend.

    IN:
    *typedefs: <type> or <tuple <type>>
        The allowed types for each arg to the method, self excluded.
        Will be used with isinstance(), so valid typedefs include
        int or (int, float).
    
    """
    def argchecker(fcn):
        import inspect
        # getfullargspec replaces getargspec, which was removed in Python 3.11.
        names = inspect.getfullargspec(fcn)[0][1:]
        def check_args(*args):
            for arg, value, allowed_types in zip(names, args[1:], typedefs):
                if not isinstance(value, allowed_types):
                    one_of = ''
                    if hasattr(allowed_types, '__len__'):
                        one_of = "one of "
                    msg = ".%s() argument %r must be %s%s"
                    msg %= fcn.__name__, arg, one_of, allowed_types
                    raise TypeError(msg)
            return fcn(*args)
        return check_args
    return argchecker

class MyClass(object):
    """My example class."""
    @method_argtypes(int)
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.
        
        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.
             
        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    def _get_int_member(self):
        return self.__int_member
    def _set_int_member(self, value):
        if not isinstance(value, int):
            raise TypeError("int_member must be type int")
        self.__int_member = value
    int_member = property(_get_int_member, _set_int_member)
    del _get_int_member, _set_int_member

    @method_argtypes(int)
    def process_data(self, data):
        """Do some data processing.
        
        IN:
        data: <int>
            New information that should be incorporated. Must be int.
        
        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)

#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)