On Tue, 13 Oct 2009 12:59:47 -0700, Paul Rubin wrote:

> sturlamolden <sturlamol...@yahoo.no> writes:
>> > The obvious way to compute a running median involves a tree
>> > structure so you can quickly insert and delete elements, and find
>> > the median. That would be asymptotically O(n log n) but messy to
>> > implement.
>>
>> QuickSelect will find the median in O(log n) time.
>
> That makes no sense, you can't even examine the input in O(log n)
> time.
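(As an aside: QuickSelect cannot be O(log n) for the reason Paul gives — it must examine every element — but it does find the median in expected O(n) time. A minimal sketch using only the standard library; the function name is mine:)

```python
import random

def quickselect(data, k):
    """Return the k-th smallest element (0-based) of data.

    Expected O(n): each recursive call discards, on average, a
    constant fraction of the elements, so the total work is roughly
    n + n/2 + n/4 + ... = O(n) -- linear, not logarithmic.
    """
    data = list(data)
    pivot = random.choice(data)
    lo = [x for x in data if x < pivot]
    hi = [x for x in data if x > pivot]
    num_equal = len(data) - len(lo) - len(hi)
    if k < len(lo):
        return quickselect(lo, k)
    elif k < len(lo) + num_equal:
        return pivot
    else:
        return quickselect(hi, k - len(lo) - num_equal)

# Median of an odd-length list is the element at index len//2:
print(quickselect([7, 1, 5, 3, 9], 2))  # -> 5
```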
If that were a relevant argument, no algorithms would ever be described
as running in O(log n). Obviously, to answer queries in O(log n) you
must have already built the tree. If you include the time required to
build the tree, then O(n) is the lower bound (you have to see each
element to put it in the tree), but that applies to any data structure.
The point is, once you have that tree, you can perform repeated
calculations in O(log n) time each (or so it has been claimed).

For example, if you want the m running medians of m data points, you
might do the following:

find the median of element 1
find the median of elements 1 and 2
find the median of elements 1 through 3
find the median of elements 1 through 4
...
find the median of elements 1 through m

Each median calculation may take O(log n), where n varies from 1 to m,
giving a total of:

log 1 + log 2 + log 3 + ... + log m = log(m!) => O(log m!)

steps, and since log(m!) is O(m log m), that is much better than:

1 + 2 + 3 + ... + m = 1/2*m*(m+1) => O(m**2)

> Anyway, the problem isn't to find the median of n elements. It's to
> find n different medians of n different sets. You might be able to
> get close to linear time though, depending on how the data acts.

But since the data sets aren't independent, a tree structure seems like
the way to go to me. Inserting and deleting elements should be fast,
and assuming the claim of calculating the median in O(log n) is
correct, that's quite fast too.

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
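(For the growing-prefix case sketched above, you don't even need deletions: the classic two-heap trick gives each new median in O(log n). A rough illustration using the stdlib heapq module; the function name is mine, and a real running median over a sliding window would still need deletions, i.e. the tree:)

```python
import heapq

def running_medians(data):
    """Yield the median of data[:k] for k = 1, 2, ..., len(data).

    lo is a max-heap (stored negated) holding the smaller half of the
    elements seen so far; hi is a min-heap holding the larger half.
    Each new element costs O(log n), so m medians cost
    log 1 + log 2 + ... + log m = O(log m!) = O(m log m) in total.
    """
    lo, hi = [], []  # max-heap (negated values), min-heap
    for x in data:
        heapq.heappush(lo, -x)
        # Move the largest of the small half across, keeping
        # every element of lo <= every element of hi.
        heapq.heappush(hi, -heapq.heappop(lo))
        # Rebalance so lo is the same size as hi, or one larger.
        if len(hi) > len(lo):
            heapq.heappush(lo, -heapq.heappop(hi))
        if len(lo) == len(hi):
            yield (-lo[0] + hi[0]) / 2.0
        else:
            yield -lo[0]

print(list(running_medians([5, 2, 8, 1])))  # -> [5, 3.5, 5, 3.5]
```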