On Tue, Apr 7, 2015 at 9:40 AM,  <jonas.thornv...@gmail.com> wrote:
> Well of course you use the same principles as a binary search, setting min
> and max and closing in on the digit. In this case the searched number
> satisfies base^exp < number < base^(exp+1).
>
> But since the search is within large bases, up to the 32-bit space, base
> 4294967295 is the biggest allowed. I need to find the nearest lower exp in
> the base for each (let's call them pseudo digits). But as you see, it will
> take time to add them up. So better to do a binary search, you know, min-max
> halving (iteration). You can do the same with a random oracle picking min-max
> within the range, and if the number of tries is in general over 22, I think a
> random oracle does it better than a binary search.

Okay, so you're talking about doing a binary search but picking the
partition point randomly instead of evenly?
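
(Incidentally, if the goal is just to find the largest exp with
base**exp <= n, a plain binary search on the exponent does that
directly. Here's a sketch, guessing at the exact requirement from
your description:)

```python
def exp_floor(n, base):
    """Largest exp such that base**exp <= n.

    A sketch based on my reading of the quoted description;
    assumes n >= 1 and base >= 2.
    """
    lo, hi = 0, 1
    # Grow hi until base**hi exceeds n, establishing base**lo <= n < base**hi.
    while base ** hi <= n:
        hi *= 2
    # Binary search on the exponent, maintaining that invariant.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if base ** mid <= n:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, exp_floor(8, 2) is 3 and exp_floor(9, 3) is 2.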

> It was a long time since I did this, but I do know there is a threshold
> where searching min-max with the oracle will be faster than the binary
> search.

I don't know where you got this idea from, but it's fiction. If you
split the space in half, then you are always reducing the amount of
work remaining by a factor of 2, and the total work is O(log n). If
you do an uneven split, then while you will sometimes make a good
partition and reduce the amount of work by more than a factor of two,
more often the target will be in the larger subspace and you will have
reduced the amount of work by less than a factor of two. In the worst
case, you remove only one element from the search space at a time and
end up with a linear search.
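
The gap has a well-known size, incidentally: uniform random splits give
an expected depth that grows like 2*ln(n), the same analysis as the
average depth of a randomly built binary search tree, versus log2(n)
for even splits, so the random version is asymptotically about
2*ln(2) ~ 1.386 times worse. A quick comparison of the two growth
rates (these track the measured averages further down reasonably well):

```python
import math

# log2(n): even-split partitions; 2*ln(n): approx. random-split partitions.
for n in (10**5, 10**6, 10**7):
    print(n, round(math.log2(n), 2), round(2 * math.log(n), 2))
```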

As a demonstration of this, here's a function which does a binary
search for a number in a range and returns the number of partitions it
had to perform to find the number:

>>> def binsearch(n, min, max):
...   if min == max: return 0
...   med = (max + min) // 2
...   if n <= med:
...     return 1 + binsearch(n, min, med)
...   else:
...     return 1 + binsearch(n, med+1, max)
...

And here's the average number of partitions needed to find a number in
various ranges:

>>> n = 100000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
16.68928
>>> n = 1000000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
19.951424
>>> n = 10000000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
23.3222784

Now let's try a variant function that splits the range at one third
instead of evenly:

>>> def trisearch(n, min, max):
...     if min == max: return 0
...     tern = (min + min + max) // 3
...     if n <= tern:
...         return 1 + trisearch(n, min, tern)
...     else:
...         return 1 + trisearch(n, tern+1, max)
...
>>> n = 100000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
17.88387
>>> n = 1000000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
21.502276
>>> n = 10000000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
25.1157592

The ternary version invariably takes more partitions on average to
find the goal number. Now let's try a third variant that partitions
the range randomly:

>>> from random import randrange
>>> def randsearch(n, min, max):
...     if min == max: return 0
...     part = randrange(min, max)
...     if n <= part:
...         return 1 + randsearch(n, min, part)
...     else:
...         return 1 + randsearch(n, part+1, max)
...
>>> n = 100000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
22.17863
>>> n = 1000000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
26.78371
>>> n = 10000000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
31.3916557

As you can see, this one does a lot worse on average, and at n=1e6
it's already performing worse than either of the previous versions did
at n=1e7.  I'm not even going to try this for n=1e8 because it would
take too long to run.
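
Incidentally, you don't have to simulate at all: the average for the
random-split version can be computed exactly from a recurrence (my
derivation from the randsearch code above, so cross-check it against
the simulation). A range of size m has m-1 equally likely split
points, and a split with left part of size k catches the target with
probability k/m:

```python
def expected_partitions(n):
    """Exact expected partition count for a uniform random target in a
    range of size n, under uniform random splits (O(n**2) recurrence)."""
    E = [0.0, 0.0]  # E[1] = 0: a single-element range needs no partition
    for m in range(2, n + 1):
        # Average over the m-1 split points; left part of size k is
        # searched with probability k/m, the right part with (m-k)/m.
        s = sum((k / m) * E[k] + ((m - k) / m) * E[m - k]
                for k in range(1, m))
        E.append(1.0 + s / (m - 1))
    return E[n]
```

For small sizes this can be checked by hand: expected_partitions(3) is
exactly 5/3. For larger n the value appears to sit between log2(n) and
2*ln(n), consistent with the simulated averages above.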
-- 
https://mail.python.org/mailman/listinfo/python-list
