Thanks for your answer. I still don't understand completely though. I
suppose it's me, but I've been trying to understand some of this for
quite some time and somehow I can't seem to wrap my head around it.
Steven D'Aprano wrote:
On Sat, 29 Nov 2008 11:36:56 +0100, Roel Schroeven wrote:
The first thing to remember is that it is impractical for unit tests to
be exhaustive. Consider the following trivial function:
def add(a, b):  # a and b ints only
    return a+b+1
Clearly you're not expected to test *every imaginable* path through this
function (ignoring unit tests for error handling and bad input):
assert add(0, 0) == 1
assert add(1, 0) == 2
assert add(2, 0) == 3
assert add(3, 0) == 4
...
assert add(99736263, 8264891001) == 8364627265
...
OK
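To make the point about non-exhaustive testing concrete, here is a minimal sketch of what a small, representative set of cases for add() might look like. The choice of inputs (zero, small values, negatives, one large pair) and the names CASES and check_add are only illustrative assumptions, not something from the original discussion.

```python
def add(a, b):  # a and b ints only, as in the example above
    return a + b + 1

# Hypothetical selection of representative inputs: zero, small values,
# negatives, and one large pair. Nothing here is exhaustive.
CASES = [
    ((0, 0), 1),
    ((1, 0), 2),
    ((0, 1), 2),
    ((-1, 0), 0),
    ((-5, -7), -11),
    ((99736263, 8264891001), 8364627265),
]

def check_add():
    # Run every representative case and report how many were checked.
    for (a, b), expected in CASES:
        assert add(a, b) == expected, (a, b)
    return len(CASES)

check_add()
```

Six cases obviously prove nothing about the remaining integers; the idea is only that each one stands in for a class of inputs that are expected to behave alike.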
> ...
I arbitrarily choose path A alone, confident that paths B C and D are
correct, but of course I could make other choices. There's no need to
test paths B C and D *within spam's unit tests*, because they are already
tested elsewhere.
Except that I'm always told that the goal of unit tests, at least
partly, is to protect us against mistakes when we make changes to the
tested functions. They should tell me whether I can still trust spam()
after refactoring it. Doesn't that mean that the unit test should see
spam() as a black box, providing a certain (but probably not 100%)
guarantee that the unit test is still a good test even if I change the
implementation of spam()?
And I don't understand how that works in test-driven development; I
can't possibly adapt the tests to the code paths in my code, because the
code doesn't exist yet when I write the test.
> To test them again within spam doesn't gain me anything.
I would think it gains you the freedom of changing spam's implementation
while still being able to rely on the unit tests. Or maybe I'm thinking
too far?
The success of this tactic assumes that you can identify code paths and
make them independent. If they are dependent, then you can't be sure that
path E G after A is the same as E G after D.
Real world example: compare driving your car from home to the mall to the
park with driving from work to the mall to the park. The journey
from the mall to the park is the same, no matter how you got to the mall.
If you can drive from home to the mall and then to the park, and you can
drive from work to the mall, then you can be sure that you can drive from
work to the mall to the park even though you've never done it before.
But if you can't be sure the paths are independent, then you can't make
that simplifying assumption, and you do have to test more paths in more
places.
OK, but that only works if I know the code paths, meaning I've already
written the code. Wasn't the whole point of TDD that you write the tests
before the code?
A related matter (at least in my mind) is this: after I've written
test_spam() but before spam() is correctly working, I find out that I
need to write spam_ham() and spam_eggs(), so I need test_spam_ham() and
test_spam_eggs(). That means that I can never have a green light while
coding test_spam_ham() and test_spam_eggs(), since test_spam() will
fail. That feels wrong.
I would say that means you're letting your tests get too far ahead of
your code. In theory, you should never have more than one failing test at
a time: the last test you just wrote. If you have to refactor code so
much that a bunch of tests start failing, then you need to take those
tests out, and re-introduce them one at a time.
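One possible reading of "take those tests out" in terms of the standard unittest module is to mark the not-yet-satisfiable test as skipped while the helper is being written, and remove the marker afterwards. The sketch below assumes that reading; the body of spam_ham() is a made-up placeholder, and run_suite() is a hypothetical helper that only exists to show the resulting state.

```python
import unittest

def spam_ham(x):
    # Hypothetical helper under development; the real behaviour is unknown.
    return x + 1

class TestSpam(unittest.TestCase):
    # Parked for now: re-enable by deleting the decorator once the
    # helpers that spam() depends on exist.
    @unittest.skip("parked until spam_ham() and spam_eggs() exist")
    def test_spam(self):
        self.fail("spam() is not written yet")

class TestSpamHam(unittest.TestCase):
    def test_spam_ham(self):
        self.assertEqual(spam_ham(1), 2)

def run_suite():
    # Collect both test cases and run them into a plain TestResult.
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestSpam))
    suite.addTests(loader.loadTestsFromTestCase(TestSpamHam))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this arrangement the run reports one skipped test and no failures, so the light stays green while test_spam_ham() is the only test actually in play. The skipped test remains visible in the report as a reminder that it still has to come back.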
I still fail to see how that works. I know I must be wrong since so many
people successfully apply TDD, but I don't see what I'm missing.
Let's take a more-or-less realistic example: I want/need a function to
calculate the least common multiple of two numbers. First I write some
tests:
assert(lcm(1, 1) == 1)
assert(lcm(2, 5) == 10)
assert(lcm(2, 4) == 4)
Then I start to write the lcm() function. I do some research and I find
out that I can calculate the lcm from the gcd, so I write:
def lcm(a, b):
    return a // gcd(a, b) * b
But gcd() doesn't exist yet, so I write some tests for gcd(a, b) and
start writing the gcd function. But all the time while writing that, the
lcm tests will fail.
I don't see how I can avoid that, unless I create gcd() before I create
lcm(), but that only works if I know that I'm going to need it. In a
simple case like this I could know, but in many cases I don't know it
beforehand.
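For reference, this is roughly where the example might end up once gcd() exists. It is only a sketch: it assumes positive integers, uses Euclid's algorithm for gcd() (my choice, nothing in the thread prescribes it), and adds a few gcd checks that were not in the list above.

```python
def gcd(a, b):
    # Euclid's algorithm, assuming a and b are positive integers.
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    # Divide first so the intermediate value stays small;
    # // keeps the result an integer.
    return a // gcd(a, b) * b

# Assumed gcd tests, written while the lcm tests are still waiting.
assert gcd(1, 1) == 1
assert gcd(2, 5) == 1
assert gcd(12, 18) == 6

# The lcm tests from above, which can only pass once gcd() works.
assert lcm(1, 1) == 1
assert lcm(2, 5) == 10
assert lcm(2, 4) == 4
```

The order of the two assert groups mirrors the order in which they can turn green. In Python 3.5 and later, math.gcd is available in the standard library, which would remove the need to write gcd() at all in this particular case.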
--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
-- Isaac Asimov
Roel Schroeven