Ian Kelly wrote:

> On Tue, Sep 23, 2014 at 11:01 PM, Miki Tebeka <miki.teb...@gmail.com>
> wrote:
>> On Tuesday, September 23, 2014 7:33:06 PM UTC+3, Rob Gaddi wrote:
>>
>>> While you're at it, think
>>> long and hard about that definition of fuzziness.  If you can make it
>>> closer to the concept of histogram "bins" you'll get much better
>>> performance.
>> The problem for me here is that I can't determine the number of bins in
>> advance. I'd like to get frequencies. I guess every "new" (don't have any
>> previous equal item) can be a bin.
>
> Then your result depends on the order of your input, which is usually
> not a good thing.
>
> Why would you need to determine the *number* of bins in advance? You
> just need to determine where they start and stop. If for example your
> epsilon is 0.5, you could determine the bins to be at [-0.5, 0.5);
> [0.5, 1.5); [1.5, 2.5); ad infinitum. Then for each actual value you
> encounter, you could calculate the appropriate bin, creating it first
> if it doesn't already exist.
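For concreteness, the scheme quoted above might be sketched roughly as follows. The names bin_index and fuzzy_frequencies are made up for illustration, and the bin width of 2*epsilon is my reading of the example intervals rather than something stated outright in the thread:

```python
import math
from collections import Counter


def bin_index(value, epsilon=0.5):
    """Return the integer index of the half-open bin containing value.

    Assumption: bins have width 2*epsilon and are centred on multiples
    of that width, so with epsilon=0.5 bin 0 is [-0.5, 0.5), bin 1 is
    [0.5, 1.5), bin 2 is [1.5, 2.5), and so on.
    """
    width = 2 * epsilon
    return math.floor((value + epsilon) / width)


def fuzzy_frequencies(values, epsilon=0.5):
    """Count how many values fall into each bin.

    A bin comes into existence the first time a value lands in it,
    so the number of bins never has to be known in advance.
    """
    counts = Counter()
    for value in values:
        counts[bin_index(value, epsilon)] += 1
    return counts


if __name__ == "__main__":
    print(fuzzy_frequencies([0.1, -0.2, 0.9, 1.2, 2.6]))
```

Because each value's bin is computed from the value alone, the counts come out the same whatever order the input arrives in.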

That has the unfortunate implication that:

0.500000001 and 1.499999999 (delta = 0.999999998)

are considered equal, but:

1.500000001 and 1.499999999 (delta = 0.000000002)

are considered unequal.


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list