First of all, while I use TextTest, I'm fortunate to be surrounded by TextTest experts such as Geoff and Johan here at Carmen, so I'm not a TextTest expert by any measure. I probably use it in a non-optimal way. For really good answers, I suggest using the mailing list at sourceforge: http://lists.sourceforge.net/lists/listinfo/texttest-users
For those who don't know, TextTest is a testing framework based on executing a program under test and storing all output from such runs (stdout, stderr, various files written, etc.). On each test run, the output is compared to output which is known to be correct. When there are differences, you can inspect them with a normal diff tool such as tkdiff. In other words, there are no test scripts that compare actual and expected values, but there might be scripts for setting up, extracting results and cleaning up.

Grig Gheorghiu wrote:
> I've been writing TextTest tests lately for an application that will be
> presented at a PyCon tutorial on "Agile development and testing". I
> have to say that if your application does a lot of logging, then the
> TextTest tests become very fragile in the presence of changes. So I had
> to come up with this process in order for the tests to be of any use at
> all:

The TextTest approach is different from typical unittest work, and it takes some time to get used to. Having worked with it for almost a year, I'm fairly happy with it, and I feel that it would mean much more work to achieve the same kind of confidence in the software we write if we used e.g. the unittest module. YMMV.

> 1) add new code in, with no logging calls
> 2) run texttest and see if anything got broken
> 3) if nothing got broken, add logging calls for new code and
> re-generate texttest golden images

I suppose its usefulness depends partly on what your application looks like. We've traditionally built applications that produce a lot of text files which have to look a certain way. There are several things to think about though:

We use logging a lot, and we have a clear strategy for which log levels to use in different cases. While debugging we can set a high log level, but during ordinary tests we use a fairly low log level. Which log messages get emitted is controlled by an environment variable here, but you could also filter out e.g. all INFO and DEBUG messages from your test comparisons (there's a small sketch of the environment variable idea further down). We're not really interested in exactly what the program does when we test it; we want to make sure that it produces the right results. As you suggested, we'll get lots of false negatives if we log too many details during test runs.

We filter out stuff like timestamps, machine names etc. There are also features that can be used to verify that texts occur even if they might appear in different orders etc. (I haven't meddled with that myself).

We don't just compare program progress logs, but make sure that the programs produce appropriate results files. I personally don't work with any GUI applications, being confined to the database dungeons here at Carmen, but I guess that if you did, you'd need to instrument your GUI so that it can dump the contents of e.g. list views to files, if you want to verify the display contents.

Different output files are assigned different severities. If there are no changes, you get a green/green result in the test list. If a high severity file (say a file showing produced results) differs, you get a red/red result. If a low severity file (e.g. a program progress log or stderr) changes, you get a green/red result.

You can obviously use TextTest for regression testing legacy software, where you can't influence the amount of logs spewed out, but in those cases it's probable that the appearance of the logs will be stable over time. Otherwise you have to fiddle with filters etc.
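To make the environment variable bit concrete, a minimal sketch with Python's standard logging module could look like this (the variable name TEST_LOG_LEVEL and the logger name are made up for the example, not what we actually use):

    import logging
    import os

    # Default to WARNING during test runs so that chatty INFO/DEBUG
    # messages never end up in the files that TextTest compares.
    level_name = os.environ.get('TEST_LOG_LEVEL', 'WARNING')
    logging.basicConfig(
        level=getattr(logging, level_name.upper(), logging.WARNING),
        format='%(levelname)s %(name)s: %(message)s')  # note: no timestamps

    log = logging.getLogger('myapp')
    log.debug('chatty detail, only emitted when TEST_LOG_LEVEL=DEBUG')
    log.warning('important message, always part of the compared output')

You run with TEST_LOG_LEVEL=DEBUG while hunting a problem, and leave it unset for the runs whose output you save as the correct result.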
In a case like you describe above, I could imagine the following situation. You're changing some code, and that change will add some new feature and, as a side effect, change the process (but not the results) for some existing test. You prepare a new test, but want to make sure that existing tests don't break.

For your new test, you have no previous results, so all is red, and you have to carefully inspect the files that describe the generated results. If you're a strict test-first guy, you've written a results file already and with some luck it's green, or just has some cosmetic differences that you can accept. You look at the other produced files as well and see that there are no unexpected error messages etc., but you might accept the program progress log without caring too much about every line.

For your existing tests, you might have some green/red cases, where low severity files changed. If there are red/red test cases among the existing tests, you are in trouble: your new code has messed up the results of other runs. Well, this might be expected. It's all text files, and if it's hundreds of tests, you can write a little script to verify that the only thing that changed in these 213 result files was that a date is formatted differently or whatever. In a more normal case, a handful of tests have a few more lines (maybe repeated a number of times) in some progress logs, and it's quick to verify in the GUI that this is the only change. As you suggested, you might also raise the log level for this particular log message for the moment, rerun the tests, see that the green/red turned to green/green, revert the log levels, rerun again, and save all the new results as correct.

> I've been doing 3) pretty much for a while and I find myself
> regenerating the golden images over and over again. So I figured that I
> won't go very far with this tool without the discipline of going
> through 1) and 2) first.

If you want this, it might be convenient to write something like

    LOGLEVEL_NEW = 1
    ...
    log(LOGLEVEL_NEW, 'a new log message')
    ...
    log(LOGLEVEL_NEW, 'another new log message')
    ...

and then remove the first assignment and replace all LOGLEVEL_NEW with DEBUG or whatever after the first green test.

It's also worth thinking about how much logging you want during tests. Do you want to see implementation changes that don't cause changed results? You are verifying the achieved results properly in some kind of snapshot or output files, right?

> From what I see though, there's no way I can replace my unit tests with
> TextTest. It's just too coarse-grained to catch subtle errors. I'm
> curious to know how exactly you use it at Carmen and how you can get
> rid of your unit tests by using it.

I work in the database group at Carmen R&D, and a big part of our job is to provide APIs for the applications to use when working with databases. Thus, we typically test our APIs rather than full programs. This means that we need to write test programs, typically in Python, that use the API and present appropriate results. The way we work in those cases isn't very different from typical unit testing. We have to write test scripts (since the *real* program doesn't exist) and we have to make the test scripts produce a predictable and reproducible output. This isn't different from what you do in unittest. If you can use .assertEqual() etc., you can also get output to a file which is reproducible.
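To illustrate that last point, such a test script can be as small as this sketch (lookup_crew is a made-up stand-in for whatever API call you want to exercise; in practice you'd import the real thing):

    # Stand-in for the real API under test.
    def lookup_crew(base):
        data = {'GOT': ['Berg', 'Adams'], 'ARN': ['Carlsson']}
        return data.get(base, [])

    # Print the results in a fixed, sorted order so the output is
    # reproducible; TextTest captures stdout and diffs it against the
    # version saved as correct.
    for base in sorted(['ARN', 'GOT', 'UME']):
        print('%s: %s' % (base, ', '.join(sorted(lookup_crew(base)))))

Instead of a bare assertion failure you get a diff of these lines, with the surrounding context for free.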
For us, who mainly work with content in databases, the typical thing to verify is that one or several tables hold a particular content after a particular set of operations. The easiest way to verify that is to dump the results of one or several DB queries to text files, and to compare the files (there's a small sketch of such a dump at the end of this mail). Typically the files are identical, and if there is a difference, we can see it directly with tkdiff. This is far superior to just writing typical unittest modules, where we'd perform a certain query and get a message that this particular query gave the wrong result, but without a chance to see any context. Tkdiff on dumps of small tables is very informative.

It's also easier when the expected result changes. I can usually see fairly easily in tkdiff that the new result differs from the old just in the way I expected, and save the new result as the correct result to compare with. One has to be very careful here of course. If you're in a hurry and don't inspect everything properly, you could save an incorrect result as correct, and think that your flawed program works as intended. It might be less likely that you would write an incorrect assertion in unittest and make the program produce this incorrect result.

This is a fundamental difference between unittest and the way *I* use TextTest. With unittest, you state your expected results before the test, and mechanically verify that they are correct. With TextTest, you *could* do the same, and change the text file with expected results to what you now think it should be. I'm usually too lazy for that though. I just change my code, rerun the test, expect red, look at the difference in tkdiff, determine whether this is what it should look like, and save the new results as correct if I determine that it is. So far, this hasn't led to any problems for me, and it's far less work than manually writing all the assertions.

TextTest (particularly in the mature setup here) also provides support for distributing tests among hosts, on different operating systems, with different database systems etc., automatic test execution as part of nightly builds, and so on.

TextTest originated among our optimization experts. As far as I know, they use it more as it is intended than we do, i.e. to test complete programs, rather than small scripts written just to check a feature in an API. Honestly, I don't know a lot of details there though. They seem to practice XP fairly strictly, but instead of writing unit tests, they test the entire programs in TextTest, and they obviously manage to produce world class software in an efficient way. As I said, I'm not the expert...
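PS. To give a concrete picture of what such a query dump can look like, here is a small sketch using sqlite3 from the standard library (we don't use sqlite here, and the table and query are invented; the point is only the deterministic, diffable output):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('create table pairing (crew text, flight text, day text)')
    conn.executemany('insert into pairing values (?, ?, ?)',
                     [('Berg', 'SK1417', 'MON'),
                      ('Adams', 'SK0403', 'TUE')])

    # Dump with an explicit ORDER BY and a fixed format, so the text is
    # identical from run to run; tkdiff then shows any change with full
    # row context instead of a bare "wrong result" message.
    rows = conn.execute(
        'select crew, flight, day from pairing order by crew, flight')
    for row in rows:
        print('%-10s %-8s %-3s' % row)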