On Sun, 16 Oct 2005 20:28:55 +0200, Christian Stapfer wrote:

> Experiments (not just in computer science) are quite frequently
> botched. How do you discover botched experiments?
Normally by comparing them to the results of other experiments, and being
unable to reconcile the results. You may have heard the phrase "the
experiment was/was not replicable".

How do you discover whether your theory is correct? By comparing it to
itself, or by comparing it to experiment?

Of course you need some theoretical background in order to design your
experiment in the first place, otherwise you have no idea what you are even
looking for. And it is easy to mis-design an experiment, as your student
did, so that it is actually measuring X when you think it is measuring Y.
If you are about to argue against a naive view of the scientific method
where the scientist generates data in a mental vacuum, don't bother, I
understand that.

But fundamentally, calculated Big O values of algorithms on their own are
of virtually zero use in choosing between actual functions. If I tell you
that algorithm X is O(f(N)), what have I told you about it?

Have I told you whether it is unacceptably slow for some particular data
size? No. Have I told you how much work it will take to implement it? No.
Have I told you how fast my implementation is? No. Have I told you that it
is faster than some other algorithm Y? No. The only thing I have told you
is that, in some rough and ready fashion, if I increase the size of my data
N, the amount of work done will increase very approximately like f(N).

This disconnect between what Big O *actually* means and how you are
recommending we use it makes a REAL PRACTICAL DIFFERENCE, and not in a good
way.

Here is a real problem I had to solve some time ago. Without using regular
expressions, I needed to find the position of a target string in a larger
string, but there were multiple targets and I needed the first one that
occurred. I ended up using something like this (untested, and from memory):

def findmany(s, *targets):
    # Track the (position, target) of the earliest match seen so far;
    # len(s)+1 is a sentinel meaning "nothing found yet".
    minoffset = (len(s)+1, "")
    for t in targets:
        p = s.find(t)
        if p != -1 and p < minoffset[0]:
            minoffset = (p, t)
    return minoffset[0]

You would look at that, realise that s.find() is O(N) or even O(N**2) -- I
forget which -- and dismiss this as Shlemiel the Painter's algorithm, which
is O(N**2) or worse:

http://www.joelonsoftware.com/articles/fog0000000319.html

And you would be right in your theory -- but wrong in your conclusion.
Because then you would do what I did, which is spend hours or days writing
a "more efficient" O(N) searching function in pure Python, which ended up
being 100 times slower searching for a SINGLE target string than my
Shlemiel algorithm was searching for twenty targets (a sketch of that kind
of comparison appears below). The speed benefit I got from pushing the
character matching from Python to C was so immense that I couldn't beat it
no matter how "efficient" the algorithm was.

If I had bothered to actually profile my original code, I would have
discovered that for any data I cared about, it worked not just acceptably
fast, but blindingly fast. Why would I care that it would scale badly if I
had a terabyte of data to search for a million targets? I was searching
text strings of less than a megabyte, for five or six targets.

The Big O analysis I did was completely, 100% correct, and completely, 100%
useless. Not just useless in that it didn't help me, but it actually
hindered me, leading me to waste a day's work needlessly looking for a
"better algorithm".

-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list
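
[The "more efficient" pure-Python function from the post is not preserved,
so the scanner below (findmany_purepython, an invented name) is only a
minimal sketch of that kind of comparison, assuming a character-by-character
scan in the interpreter. Timing it against the Shlemiel-style findmany with
timeit illustrates why the C-backed str.find tends to win on data of this
size.]

import timeit

def findmany(s, *targets):
    # Shlemiel-style search from the post: one C-implemented str.find
    # call per target, keeping the earliest hit.
    minoffset = (len(s) + 1, "")
    for t in targets:
        p = s.find(t)
        if p != -1 and p < minoffset[0]:
            minoffset = (p, t)
    return minoffset[0]

def findmany_purepython(s, *targets):
    # Hypothetical "smarter" single pass in pure Python: at each index,
    # test whether any target starts there.  Only one scan of s, but
    # every position is visited in interpreted code.
    for i in range(len(s)):
        for t in targets:
            if s.startswith(t, i):
                return i
    return len(s) + 1

if __name__ == "__main__":
    # Made-up test data: the only target actually present ("needle")
    # sits near the end, so both functions scan most of the text.
    text = ("spam and eggs " * 20000) + "needle" + (" more spam" * 100)
    targets = ("needle", "grail", "xyzzy", "parrot", "knights")

    for func in (findmany, findmany_purepython):
        elapsed = timeit.timeit(lambda: func(text, *targets), number=5)
        print("%-22s %.3f seconds for 5 runs" % (func.__name__, elapsed))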