Janto Dreijer wrote:

> On Oct 13, 6:12 pm, Peter Otten <__pete...@web.de> wrote:
>> Janto Dreijer wrote:
>> > I'm looking for code that will calculate the running median of a
>> > sequence, efficiently. (I'm trying to subtract the running median from
>> > a signal to correct for gradual drift).
>>
>> > My naive attempt (taking the median of a sliding window) is
>> > unfortunately too slow as my sliding windows are quite large (~1k) and
>> > so are my sequences (~50k). On my PC it takes about 18 seconds per
>> > sequence. 17 of those seconds are spent sorting the sliding windows.
>>
>> > I've googled around and it looks like there are some recent journal
>> > articles on it, but no code. Any suggestions?
>>
>> If you aren't using numpy, try that. Showing your code might also be a
>> good idea...
>>
>> Peter
> 
> I placed the test code and its output here:
> http://bitbucket.org/janto/snippets/src/tip/running_median.py

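(For readers who can't reach the link: the naive approach under discussion presumably looks something like this -- a sketch, not Janto's actual code:)

```python
def running_median_naive(seq, window_size):
    # Sort every window from scratch: O(n * k log k) for n items and
    # window size k -- slow for k ~ 1000 and n ~ 50000.
    result = []
    for i in range(len(seq)):
        window = seq[max(0, i - window_size + 1):i + 1]
        # upper median for even-sized windows
        result.append(sorted(window)[len(window) // 2])
    return result
```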
That gives me something to tinker with ;)
 
> I also have a version that uses numpy. On random data it seems to be
> about twice as fast as the pure python one. It spends half the time
> sorting and the other half converting the windows from python lists to
> numpy arrays.
> If the data is already sorted, the version using python's builtin sort
> outperforms the numpy convert-and-sort by about 5 times. Strangely
> satisfying :)

I was thinking of using as many of numpy's bulk operations as possible:

from numpy import array, median

def running_median_numpy(seq):
        data = array(seq, dtype=float)
        result = []
        # warm-up: partial windows while the buffer fills
        for i in xrange(1, window_size):
                window = data[:i]
                result.append(median(window))
        # full windows
        for i in xrange(len(data) - window_size + 1):
                window = data[i:i + window_size]
                result.append(median(window))
        return result

But it didn't help as much as I had hoped.
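A fully vectorised variant might do better (an untested sketch; it needs numpy 1.20+ for sliding_window_view, skips the warm-up windows at the start, and np.median averages the two middle values for even window sizes, unlike the upper-median convention above):

```python
import numpy as np

def running_median_numpy_vec(seq, window_size):
    data = np.asarray(seq, dtype=float)
    # 2-D view of shape (len(data) - window_size + 1, window_size):
    # every full window as a row, without copying the data.
    windows = np.lib.stride_tricks.sliding_window_view(data, window_size)
    # one vectorised median call over all windows at once
    return np.median(windows, axis=1)
```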
The fastest I came up with tries hard to keep the data sorted:

from bisect import bisect_left, insort
from collections import deque
from itertools import islice

def running_median_insort(seq):
        seq = iter(seq)
        d = deque()  # current window, in arrival order
        s = []       # same items, kept sorted
        result = []
        for item in islice(seq, window_size):
                d.append(item)
                insort(s, item)
                result.append(s[len(d) // 2])
        m = window_size // 2
        for item in seq:
                old = d.popleft()
                d.append(item)
                del s[bisect_left(s, old)]
                insort(s, item)
                result.append(s[m])
        return result
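A quick sanity check against a naive implementation (the function repeated standalone so it runs on its own; Python 3's range instead of xrange, and window_size as a module-level constant, as in the post):

```python
from bisect import bisect_left, insort
from collections import deque
from itertools import islice
import random

window_size = 7

def running_median_insort(seq):
    seq = iter(seq)
    d = deque()  # current window, in arrival order
    s = []       # same items, kept sorted
    result = []
    for item in islice(seq, window_size):
        d.append(item)
        insort(s, item)
        result.append(s[len(d) // 2])
    m = window_size // 2
    for item in seq:
        old = d.popleft()
        d.append(item)
        del s[bisect_left(s, old)]
        insort(s, item)
        result.append(s[m])
    return result

def running_median_naive(seq):
    # Reference: sort each window from scratch, upper median.
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - window_size + 1):i + 1]
        out.append(sorted(window)[len(window) // 2])
    return out

data = [random.randint(0, 100) for _ in range(1000)]
assert running_median_insort(data) == running_median_naive(data)
```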

Some numbers:

10.197 seconds for running_median_scipy_medfilt
25.043 seconds for running_median_python
13.040 seconds for running_median_python_msort
14.280 seconds for running_median_python_scipy_median
4.024 seconds for running_median_numpy
0.221 seconds for running_median_insort

What would be an acceptable performance, by the way?

Peter

PS: code not tested for correctness


-- 
http://mail.python.org/mailman/listinfo/python-list
