Janto Dreijer wrote:

> On Oct 13, 6:12 pm, Peter Otten <__pete...@web.de> wrote:
>> Janto Dreijer wrote:
>> > I'm looking for code that will calculate the running median of a
>> > sequence, efficiently. (I'm trying to subtract the running median from
>> > a signal to correct for gradual drift).
>> >
>> > My naive attempt (taking the median of a sliding window) is
>> > unfortunately too slow as my sliding windows are quite large (~1k) and
>> > so are my sequences (~50k). On my PC it takes about 18 seconds per
>> > sequence. 17 of those seconds is spent in sorting the sliding windows.
>> >
>> > I've googled around and it looks like there are some recent journal
>> > articles on it, but no code. Any suggestions?
>>
>> If you aren't using numpy, try that. Showing your code might also be a
>> good idea...
>>
>> Peter
>
> I placed the test code and its output here:
> http://bitbucket.org/janto/snippets/src/tip/running_median.py
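For reference, the naive approach described above (re-sorting each sliding window) can be sketched as follows. This is a hedged sketch, not the code from the bitbucket link; `window_size` is passed explicitly here rather than read from a global as in the thread's snippets, and the warm-up windows grow from size 1.

```python
def running_median_naive(seq, window_size):
    """Median of each sliding window, sorting every window from scratch.

    This is the O(n * k log k) baseline: roughly the ~18 s per sequence
    reported above for k ~ 1000, n ~ 50000. For even-sized windows the
    upper median is taken.
    """
    seq = list(seq)
    result = []
    for i in range(len(seq)):
        # The window grows until it reaches window_size, then slides.
        window = sorted(seq[max(0, i - window_size + 1):i + 1])
        result.append(window[len(window) // 2])
    return result
```

Almost all of the time goes into the `sorted()` call, which is why the thread focuses on keeping the window sorted incrementally instead.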
That gives me something to tinker with ;)

> I also have a version that uses numpy. On random data it seems to be
> about twice as fast as the pure python one. It spends half the time
> sorting and the other half converting the windows from python lists to
> numpy arrays.
> If the data is already sorted, the version using python's builtin sort
> outperforms the numpy convert-and-sort by about 5 times. Strangely
> satisfying :)

I was thinking of using as many of numpy's bulk operations as possible:

    from numpy import array, median

    def running_median_numpy(seq):
        # window_size is a module-level global in the test script
        data = array(seq, dtype=float)
        result = []
        # warm up: medians of the growing initial windows
        for i in xrange(1, window_size):
            window = data[:i]
            result.append(median(window))
        # then medians of the full-size sliding windows
        for i in xrange(len(data) - window_size + 1):
            window = data[i:i + window_size]
            result.append(median(window))
        return result

But it didn't help as much as I had hoped. The fastest I came up with
tries hard to keep the data sorted:

    from bisect import bisect_left, insort
    from collections import deque
    from itertools import islice

    def running_median_insort(seq):
        # window_size is a module-level global in the test script
        seq = iter(seq)
        d = deque()
        s = []
        result = []
        for item in islice(seq, window_size):
            d.append(item)
            insort(s, item)
            result.append(s[len(d) // 2])
        m = window_size // 2
        for item in seq:
            old = d.popleft()
            d.append(item)
            del s[bisect_left(s, old)]
            insort(s, item)
            result.append(s[m])
        return result

Some numbers:

    10.197 seconds for running_median_scipy_medfilt
    25.043 seconds for running_median_python
    13.040 seconds for running_median_python_msort
    14.280 seconds for running_median_python_scipy_median
     4.024 seconds for running_median_numpy
     0.221 seconds for running_median_insort

What would be an acceptable performance, by the way?

Peter

PS: code not tested for correctness

--
http://mail.python.org/mailman/listinfo/python-list
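Since the code above is explicitly untested, a quick sanity check of the insort approach against a brute-force baseline is easy to add. This is a hedged sketch: `window_size` is made a parameter rather than a global, and `running_median_bruteforce` is a hypothetical reference function, not one from the thread's benchmark script.

```python
from bisect import bisect_left, insort
from collections import deque
from itertools import islice
import random

def running_median_insort(seq, window_size):
    """Running median that keeps the current window sorted in s."""
    seq = iter(seq)
    d = deque()   # the window, in arrival order
    s = []        # the same items, kept sorted
    result = []
    for item in islice(seq, window_size):
        d.append(item)
        insort(s, item)
        result.append(s[len(d) // 2])
    m = window_size // 2
    for item in seq:
        old = d.popleft()
        d.append(item)
        del s[bisect_left(s, old)]  # remove the item leaving the window
        insort(s, item)             # insert the new one, keeping s sorted
        result.append(s[m])
    return result

def running_median_bruteforce(seq, window_size):
    """Reference implementation: sort every window from scratch."""
    seq = list(seq)
    return [sorted(seq[max(0, i - window_size + 1):i + 1])[min(i + 1, window_size) // 2]
            for i in range(len(seq))]

random.seed(0)
data = [random.random() for _ in range(1000)]
assert running_median_insort(data, 101) == running_median_bruteforce(data, 101)
```

Note that both of these take the upper of the two middle elements for even-sized windows, whereas `numpy.median` averages them, so the numpy variant's output differs slightly at even sizes.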