Thanks for the feedback and apologies for the late response. Since my posting and your responses, I began implementing the binary search into array.c. After a couple days of coding and research, I managed to produce this code.
HashTable *arr_hash; arr_hash = Z_ARRVAL_P(array); int low=0; int high=zend_hash_num_elements(arr_hash); int mid; zval res; while(low<=high){ mid=(low+high)/2; ZVAL_DEREF(entry); entry = zend_hash_index_find(arr_hash, mid); compare_function(&res,entry,value); //php_printf("%ld\n", Z_LVAL(res)); if(Z_LVAL(res) > 0){ high=mid-1; }else if(Z_LVAL(res) < 0){ low=mid+1; }else{ RETURN_TRUE; } } RETURN_FALSE; After benchmarking this code, I managed to conclude that this binary search algorithm preforms vastly quicker then the php_search_array function. Based on these results I believe that this algorithm would be a good optimization or feature to add to PHP. Based on the feedback I have received from you guys, I am holding off the sort flag in the binary search function parameters until I can determine how much performance will be compromised from the sorting. At the moment I am trying to figure out if my function would be considered a minor or major chance since if it is a major/large change, I will require a RFC in order to submit a pull request. If anybody has any additional feedback or recommendations, feel free to reply to this post. On 30 March 2015 at 02:12, Nikita Popov <nikita....@gmail.com> wrote: > On Mon, Mar 30, 2015 at 8:24 AM, Danylo Medinsky <medinsk...@gmail.com> > wrote: > >> Recently, as part of school course on optimization, I have identified a >> potential optimization or feature for PHP. After looking through the >> array.c file(located in etc/standard in the PHP source code), I noticed >> that both the in_array and array_search are using the linear search C >> function php_search_array. The potential issue I see is that even though >> the linear search algorithm does not drastically effect performance when >> searching a relatively small array, it may slow performance down when >> iterating through a very large array(couple of million indexes). >> >> To back this fact up I preformed some benchmarks using a PHP script that >> measures the time it takes to preform a linear search and binary search. >> The benchmark was in favor of the binary search as it found its target >> many >> times quicker compared to the linear search. >> >> After researching more about different PHP search functions, I was not >> able >> to find one that uses a binary search algorithm. Based on these factors I >> believe that a binary search implementation into PHP will prove useful in >> certain situations and would like to implement that feature. >> >> At the moment I am debating between adding a separate binary search based >> function into array.c or modifying the current array_search_array function >> to utilize a binary search algorithm. Since the requirement for a binary >> search is that the array being searched is sorted, I thought is would be a >> good idea to have a Boolean parameter in the binary search function to >> specify if the input array should be sorted or not. something among the >> lines of "function binary_search(string target, array arrayToSearch, bool >> sort)". Doing this, the function can preform a sort if required to, thus >> not needing to sort a array that is already sorted. >> >> Before I can begin the implementation and create a pull request, I would >> like to know the PHP's community feedback on my potential contribution and >> if this is a wise feature to change or implement. >> > > > If you need to find an element in an array you should make the lookup key > based. I.e. you save elements as $array[$value] = true and then check > isset($array[$value]). This operation is O(1), as opposed to O(log n) for > binary search and O(n) for linear search. > > Using binary search in PHP only makes sense if the elements in the array > cannot be used as keys - where the only reasonable use case I see are > objects, which at least currently do not support comparison operator > overloading, so a binary search implementation would have to accept a > comparator function. > > Also the concept of adding a $sort parameter to a binsearch function > doesn't make much sense to me. Sorting the array is more expensive than > doing a linear search. > > Nikita > >