Hi!
The question of type checking/enforcing has bothered me for a while, and
since this newsgroup has a wealth of competence subscribed to it, I
figured this would be a great way of learning from the experts. I feel
there's a tradeoff between clear, easily readable and extensible code
on one side, and safe code providing early errors and useful tracebacks
on the other. I want both! How do you guys do it? What's the pythonic
way? Are there any docs that I should read? All pointers and opinions
are appreciated!
I've also whipped up some examples in order to put the above questions
in context and for your amusement. :-)
Briefly:

    class MyClass(object):
        def __init__(self, int_member = 0):
            self.int_member = int_member

        def process_data(self, data):
            self.int_member += data

The attached files are elaborations on this theme, with increasing
security and, alas, rigidity and bloat. Even though
maximum_security_module.py probably will be the safest to use, the
coding style will bloat the code something awful and will probably make
maintenance harder (please prove me wrong!). Where should I draw the line?
These are the attached modules:
* nocheck_module.py:
    As the above example, but with docs. No type checking.
* property_module.py:
    Type checking of data members using properties.
* methodcheck_module.py:
    Type checking of args within methods.
* decorator_module.py:
    Type checking of args using method decorators.
* maximum_security_module.py:
    Decorator and property type checking.
Let's pretend I'm writing a script that imports one of the above modules
and then executes the following code:

    ...
    my_object = MyClass(data1)
    my_object.process_data(data2)

And then let's pretend dataX is of a bad type, say for example str.
nocheck_module.py
=================
Now, if data2 is bad, we get a suboptimal traceback (possibly to
somewhere deep within the code, and probably with an unrelated error
message). However, the first point of failure will in fact be included
in the traceback, so this error should be possible to find with little
effort. On the other hand, if data1 is bad, the exception will be raised
somewhere past the point of first failure. The traceback will be
completely off, and the error message will still be bad. Even worse: if
both are bad, we won't even get an exception. We will trundle on with
corrupted data and take no notice. Very clear code, though. Easily
extensible.
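
The three failure modes described above can be reproduced with a minimal sketch. The class is the unchecked one from the start of this post; the variable names (`late_error`, `silent` and so on) are made up for illustration only.

```python
class MyClass(object):
    def __init__(self, int_member=0):
        self.int_member = int_member

    def process_data(self, data):
        self.int_member += data


# Only data2 is bad: int += str raises inside process_data(), so the
# first point of failure is at least part of the traceback.
obj = MyClass(1)
try:
    obj.process_data("2")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Only data1 is bad: the constructor accepts the str without complaint and
# the exception is only raised later, by an otherwise valid call.
late = MyClass("1")
try:
    late.process_data(2)
    late_error = None
except TypeError as exc:
    late_error = str(exc)

# Both are bad: str += str is plain concatenation, so nothing is raised
# and the object silently carries corrupted data.
silent = MyClass("1")
silent.process_data("2")
print(repr(silent.int_member))
```

The last case is the nasty one: the script keeps running with a str where an int was intended.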
property_module.py
==================
Here we catch that data1 failure. Tracebacks may still be long, with
uninformative error messages, but they will not be as bad as in
nocheck_module.py. Bloat: +7 or more lines of boilerplate code for each
additional data member. Quite clear code. Readily extensible.
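
One possible way to cut the per-member property boilerplate is a small reusable descriptor. This is only a sketch under assumptions: the name `Typed` is hypothetical (it is not part of the attached modules), and `__set_name__` needs Python 3.6 or later.

```python
class Typed(object):
    """Data descriptor that only accepts values of the given type(s).

    Hypothetical helper for illustration, not part of the attached modules.
    """

    def __init__(self, *allowed_types):
        self.allowed_types = allowed_types

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created (3.6+).
        self.name = name
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        if not isinstance(value, self.allowed_types):
            names = ", ".join(t.__name__ for t in self.allowed_types)
            raise TypeError("%s must be %s, got %r" % (self.name, names, value))
        setattr(instance, self.storage, value)


class MyClass(object):
    int_member = Typed(int)  # one line per checked data member

    def __init__(self, int_member=0):
        self.int_member = int_member

    def process_data(self, data):
        self.int_member += data
```

With this, allowing floats as well would be a one-word change per member, `Typed(int, float)`, and both `MyClass('moo')` and `a.int_member = 'moo'` raise TypeError at the point of assignment.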
methodcheck_module.py
=====================
Good, concise tracebacks with exact error messages. Lots of bloat and
obscured code. Misses errors where data members are changed directly.
Very hard to read and extend.
decorator_module.py
===================
Good, concise tracebacks with good error messages. Some bloat. Misses
errors where data members are changed directly. Clear, but somewhat hard
to extend. Decorators for *all* methods?! This cannot be the purpose of
python!?
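
A variation on the decorator idea that keeps each type next to the argument it belongs to is to read the types from function annotations. The sketch below is an assumption-laden illustration, not the attached decorator: `check_annotations` is a hypothetical name, it requires Python 3, and it assumes every annotation is a plain class or a tuple of classes. Unlike a `check_args(*args)` wrapper it also handles keyword arguments.

```python
import functools
import inspect


def check_annotations(fcn):
    """Check call arguments against the function's own annotations.

    Hypothetical helper; assumes annotations are classes or tuples of classes.
    """
    signature = inspect.signature(fcn)

    @functools.wraps(fcn)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue  # unannotated arguments (such as self) are skipped
            if not isinstance(value, expected):
                msg = "%s() argument %r must be %s, got %s"
                msg %= fcn.__name__, name, expected, type(value).__name__
                raise TypeError(msg)
        return fcn(*args, **kwargs)
    return wrapper


class MyClass(object):
    @check_annotations
    def __init__(self, int_member: int = 0):
        self.int_member = int_member

    @check_annotations
    def process_data(self, data: int):
        self.int_member += data
```

Default values that are not passed explicitly are not checked here, and direct assignment to `int_member` still goes unnoticed, just as with the decorator module.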
maximum_security_module.py
==========================
Good, concise tracebacks with good error messages. No errors missed (I
think? :-) . Bloat. Lots of decorators and boilerplate property code all
over the place (thankfully not within functional code, though). Is this
how it's supposed to be done?
And if you've read all the way down here I thank you so very much for
your patience and perseverance. Now I'd like to hear your thoughts on
this! Where should the line be drawn? Should I just typecheck data from
unreliable sources (users/other applications) and stick with the
barebone strategy, or should I go all the way? Did I miss something
obvious? Should I read some docs? (Which?) Are there performance issues
to consider?
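
For comparison, the "only check data from unreliable sources" option could look roughly like the sketch below: the class stays unchecked and the conversion happens once, where the data enters the program. The helper `parse_int` and the sample input in `raw_values` are hypothetical.

```python
def parse_int(text):
    """Convert untrusted input to int, failing early with a clear message."""
    try:
        return int(text)
    except (TypeError, ValueError):
        raise ValueError("expected an integer, got %r" % (text,))


class MyClass(object):
    def __init__(self, int_member=0):
        self.int_member = int_member

    def process_data(self, data):
        self.int_member += data


# Hypothetical raw input, e.g. from the command line or a file.
raw_values = ["9", "33"]

my_object = MyClass(parse_int(raw_values[0]))
my_object.process_data(parse_int(raw_values[1]))
print(my_object.int_member)
```

Bad input such as `parse_int("moo")` then fails at the boundary, before it ever reaches `MyClass`.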
Thanks again for taking the time.
Cheers!
/Joel Hedlund
"""Example module with method argument type checking.

Pros:
    Pinpointed tracebacks with very exact error messages.

Cons:
    Lots of boilerplate typechecking code littered all over the place,
    obscuring functionality at the start of every function.

    Bloat will accumulate rapidly. +2 lines of boilerplate code per method and
    argument.

    If I at some point decide that floats are also ok, I'll need to crawl all
    over the code with a magnifying glass and a pair of tweezers.

    We don't catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

"""

class MyClass(object):
    """My example class."""

    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.

        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.

        """
        # Boilerplate typechecking code.
        if not isinstance(int_member, int):
            raise TypeError("int_member must be int")
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    def process_data(self, data):
        """Do some data processing.

        IN:
        data: <int>
            New information that should be incorporated. Must be int.

        """
        # Boilerplate typechecking code.
        if not isinstance(data, int):
            raise TypeError("data must be int")
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)
#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)

"""Example module without type checking.

Pros:
    Clean, easily readable and extensible code that gets down to business
    fast. If I at some point decide that floats are also ok, I only need to
    update the docs and all is well.

    No bloat.

Cons:
    Type restrictions are not enforced. This means that if type errors occur,
    the exception may be raised far from the point of first failure, and
    possibly with long tracebacks and uninformative error messages.

"""

class MyClass(object):
    """My example class."""

    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.

        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.

        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    def process_data(self, data):
        """Do some data processing.

        IN:
        data: <int>
            New information that should be incorporated. Must be int.

        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)
#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)

"""Example module using properties for data member type checking.

Pros:
    Quite clean, readable and extensible code that gets down to business fast.

    Data member type restrictions are enforced. If I at some point decide that
    floats are also ok, I only need to update the docs and a few more lines.

Cons:
    Method argument types are not enforced, which means that tracebacks may
    still be long, with uninformative error messages. Not as bad as in
    nocheck_module.py though.

    Bloat. +7 or more lines of boilerplate code for each added data member (can
    this be done neater?). But at least the bloat is outside functional code.

"""

class MyClass(object):
    """My example class."""

    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.

        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.

        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    # Property boilerplate: every assignment to int_member is type checked.
    def _get_int_member(self):
        return self.__int_member
    def _set_int_member(self, value):
        if not isinstance(value, int):
            raise TypeError("int_member must be type int")
        self.__int_member = value
    int_member = property(_get_int_member, _set_int_member)
    del _get_int_member, _set_int_member

    def process_data(self, data):
        """Do some data processing.

        IN:
        data: <int>
            New information that should be incorporated. Must be int.

        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)
#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)

"""Example module using decorators for method argument type checking.

Pros:
    Clean, easily readable and extensible code that gets down to business
    fast.

    Pinpointed tracebacks with good error messages.

    If I at some point decide that floats are also ok, I only need to
    update the docs and change the decorators to
    @method_argtypes((int, float)).

Cons:
    With many args and allowed types, the type definitions on the decorator
    lines will be hard to correlate to the args that they refer to (probably
    not impossible to work around though...?).

    We still don't catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

    A decorator for each method everywhere? That can't be the purpose of
    python!? There has to be a better way?!

"""

def method_argtypes(*typedefs):
    """Rudimentary typechecker decorator generator.

    If you're really interested in this stuff, go check out Michele
    Simionato's decorator module instead. It rocks. Google is your friend.

    IN:
    *typedefs: <type> or <tuple <type>>
        The allowed types for each arg to the method, self excluded.
        Will be used with isinstance(), so valid typedefs include
        int or (int, float).

    """
    def argchecker(fcn):
        import inspect
        # inspect.getargspec() was removed in Python 3.11, so use
        # getfullargspec() where it exists and fall back otherwise.
        getargspec = getattr(inspect, 'getfullargspec', None) or inspect.getargspec
        # Argument names, self excluded.
        names = getargspec(fcn)[0][1:]
        def check_args(*args):
            for arg, value, allowed_types in zip(names, args[1:], typedefs):
                if not isinstance(value, allowed_types):
                    one_of = ''
                    if hasattr(allowed_types, '__len__'):
                        one_of = "one of "
                    msg = ".%s() argument %r must be %s%s"
                    msg %= fcn.__name__, arg, one_of, allowed_types
                    raise TypeError(msg)
            return fcn(*args)
        return check_args
    return argchecker

class MyClass(object):
    """My example class."""

    @method_argtypes(int)
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.

        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.

        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    @method_argtypes(int)
    def process_data(self, data):
        """Do some data processing.

        IN:
        data: <int>
            New information that should be incorporated. Must be int.

        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)
#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)

"""Example module using decorators and properties for type checking.

Pros:
    Clean, easily readable and extensible code that gets down to business
    fast.

    Pinpointed tracebacks with good error messages.

    Now we catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

Cons:
    With many args and allowed types, the type definitions on the decorator
    lines will be hard to correlate to the args that they refer to (probably
    not impossible to work around though...?).

    A decorator for each method everywhere? That can't be the purpose of
    python!? There has to be a better way?!

    Property bloat. +7 or more lines of boilerplate code for each added data
    member (can this be done neater?).

    If I at some point decide that floats are also ok, I only need to
    update the docs, decorators and properties... hmm...

"""

def method_argtypes(*typedefs):
    """Rudimentary typechecker decorator generator.

    If you're really interested in this stuff, go check out Michele
    Simionato's decorator module instead. It rocks. Google is your friend.

    IN:
    *typedefs: <type> or <tuple <type>>
        The allowed types for each arg to the method, self excluded.
        Will be used with isinstance(), so valid typedefs include
        int or (int, float).

    """
    def argchecker(fcn):
        import inspect
        # inspect.getargspec() was removed in Python 3.11, so use
        # getfullargspec() where it exists and fall back otherwise.
        getargspec = getattr(inspect, 'getfullargspec', None) or inspect.getargspec
        # Argument names, self excluded.
        names = getargspec(fcn)[0][1:]
        def check_args(*args):
            for arg, value, allowed_types in zip(names, args[1:], typedefs):
                if not isinstance(value, allowed_types):
                    one_of = ''
                    if hasattr(allowed_types, '__len__'):
                        one_of = "one of "
                    msg = ".%s() argument %r must be %s%s"
                    msg %= fcn.__name__, arg, one_of, allowed_types
                    raise TypeError(msg)
            return fcn(*args)
        return check_args
    return argchecker

class MyClass(object):
    """My example class."""

    @method_argtypes(int)
    def __init__(self, int_member = 0):
        """Instantiate a new MyClass object.

        IN:
        int_member = 0: <int>
            Set the value for the data member. Must be int.

        """
        # Initialization starts here. May for example contain assignment.
        self.int_member = int_member

    # Property boilerplate: every assignment to int_member is type checked.
    def _get_int_member(self):
        return self.__int_member
    def _set_int_member(self, value):
        if not isinstance(value, int):
            raise TypeError("int_member must be type int")
        self.__int_member = value
    int_member = property(_get_int_member, _set_int_member)
    del _get_int_member, _set_int_member

    @method_argtypes(int)
    def process_data(self, data):
        """Do some data processing.

        IN:
        data: <int>
            New information that should be incorporated. Must be int.

        """
        # Data processing starts here. May for example contain addition:
        self.int_member += data

# Test code. Uncomment to play. :-)
#a = MyClass('moo')
#a = MyClass(9)
#a.int_member = 'moo'
#a.process_data('moo')
#a.process_data(9)