Thanks for the feedback and apologies for the late response.

Since my posting and your responses, I began implementing the binary search
into array.c. After a couple days of coding and research, I managed to
produce this code.

    HashTable *arr_hash;
    arr_hash = Z_ARRVAL_P(array);
    int low=0;
    int high=zend_hash_num_elements(arr_hash);
    int mid;
    zval res;
    while(low<=high){
            mid=(low+high)/2;
            ZVAL_DEREF(entry);
            entry = zend_hash_index_find(arr_hash, mid);
            compare_function(&res,entry,value);
            //php_printf("%ld\n", Z_LVAL(res));
         if(Z_LVAL(res) > 0){
            high=mid-1;
         }else if(Z_LVAL(res) < 0){
            low=mid+1;
         }else{
            RETURN_TRUE;
         }
    }
    RETURN_FALSE;


After benchmarking this code, I managed to conclude that this binary search
algorithm preforms vastly quicker then the php_search_array function. Based
on these results I believe that this algorithm would be a good optimization
or feature to add to PHP. Based on the feedback I have received from you
guys, I am holding off the sort flag in the binary search function
parameters until I can determine how much performance will be compromised
from the sorting.

At the moment I am trying to figure out if my function would be considered
a minor or major chance since if it is a major/large change, I will require
a RFC in order to submit a pull request.

If anybody has any additional feedback or recommendations, feel free to
reply to this post.

On 30 March 2015 at 02:12, Nikita Popov <nikita....@gmail.com> wrote:

> On Mon, Mar 30, 2015 at 8:24 AM, Danylo Medinsky <medinsk...@gmail.com>
> wrote:
>
>> Recently, as part of school course on optimization, I have identified a
>> potential optimization or feature for PHP. After looking through the
>> array.c file(located in etc/standard in the PHP source code), I noticed
>> that both the in_array and array_search are using the linear search C
>> function php_search_array. The potential issue I see is that even though
>> the linear search algorithm does not drastically effect performance when
>> searching a relatively small array, it may slow performance down when
>> iterating through a very large array(couple of million indexes).
>>
>> To back this fact up I preformed some benchmarks using a PHP script that
>> measures the time it takes to preform a linear search and binary search.
>> The benchmark was in favor of the binary search as it found its target
>> many
>> times quicker compared to the linear search.
>>
>> After researching more about different PHP search functions, I was not
>> able
>> to find one that uses a binary search algorithm. Based on these factors I
>> believe that a binary search implementation into PHP will prove useful in
>> certain situations and would like to implement that feature.
>>
>> At the moment I am debating between adding a separate binary search based
>> function into array.c or modifying the current array_search_array function
>> to utilize a binary search algorithm. Since the requirement for a binary
>> search is that the array being searched is sorted, I thought is would be a
>> good idea to have a Boolean parameter in the binary search function to
>> specify if the input array should be sorted or not. something among the
>> lines of "function binary_search(string target, array arrayToSearch, bool
>> sort)". Doing this, the function can preform a sort if required to, thus
>> not needing to sort a array that is already sorted.
>>
>> Before I can begin the implementation and create a pull request, I would
>> like to know the PHP's community feedback on my potential contribution and
>> if this is a wise feature to change or implement.
>>
>
>
> If you need to find an element in an array you should make the lookup key
> based. I.e. you save elements as $array[$value] = true and then check
> isset($array[$value]). This operation is O(1), as opposed to O(log n) for
> binary search and O(n) for linear search.
>
> Using binary search in PHP only makes sense if the elements in the array
> cannot be used as keys - where the only reasonable use case I see are
> objects, which at least currently do not support comparison operator
> overloading, so a binary search implementation would have to accept a
> comparator function.
>
> Also the concept of adding a $sort parameter to a binsearch function
> doesn't make much sense to me. Sorting the array is more expensive than
> doing a linear search.
>
> Nikita
>
>

Reply via email to