On Sat, Nov 29, 2003 at 09:32:19AM -0800, Galen wrote:
: 
: I'm working on some database search ranking code. It currently 
: represents 95-98% of the time spent when doing fuzzy searches. I have 
: tried my best to optimize my code - algorithmic shortcuts, eliminating 
: session variables, unsetting irrelevant results, etc and benchmarking 
: to find the best techniques. That's given me over a 10x improvement. 
: Unfortunately, because of the number of results it must process (up to 
: 20,000), it is still somewhat slow. I think it could use some code 
: structure/formatting tweaks to eke out that last bit of performance, but 
: I don't know much about optimizing PHP code in that way. Does anybody 
: have suggestions?

You should profile your code a bit more and see how much time is being
spent in your foreach loops and your usort().  You could always rewrite
your foreach loops into for loops and manually iterate through the array
itself instead of a copy.  And you use strtolower() and trim() on every
pass through the data.  That's a lot of work that can be pre-massaged in
the database so that you don't need to do it within your loops (granted,
this doubles your needed storage space in the hopes of speeding your
fuzzy searches).
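
As a rough sketch of the two ideas above (not a drop-in replacement for
your code): the search terms are lowercased, trimmed and run through
metaphone() once, before the result loop, and the results are walked by
index so each row is written back in place.  The sample data and field
names are made up, the rows are assumed to arrive already normalized
from the database, and the score is only a placeholder for your real
formula.

```php
<?php
// Hypothetical sample data; "name" and "city" are illustrative field names.
$search_statements = ["name" => "  Smith ", "city" => ""];
// Assumption: rows were already lowercased and trimmed in the database.
$search_results = [
    ["name" => "smith", "city" => "portland"],
    ["name" => "smyth", "city" => "salem"],
];

// Normalize each search term once, outside the per-row loop.
$terms = [];
foreach ($search_statements as $field => $term) {
    $term = strtolower(trim($term));
    if ($term !== "") {
        $terms[$field] = ["text" => $term, "metaphone" => metaphone($term)];
    }
}

// Walk the results by index and write straight into the original array.
$keys = array_keys($search_results);
for ($i = 0, $n = count($keys); $i < $n; $i++) {
    $k = $keys[$i];
    $score = 0;
    foreach ($terms as $field => $term) {
        // Placeholder score: negative edit distance (not the real formula).
        $score -= levenshtein($term["text"], $search_results[$k][$field]);
    }
    $search_results[$k]["relevancy"] = $score;
}
```

Wrapping each loop in a pair of microtime() calls before and after a
change like this should show whether it actually helps for your data.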

Also, I noticed that your usort() was doing a normal numeric sort.  That
being the case, why not switch the line:

        usort($search_results, "cmp");

and use:

        sort($search_results, SORT_NUMERIC);
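
One caveat: sort() with SORT_NUMERIC compares the array elements
themselves, so it fits best when the array holds plain numbers.  If each
result is itself an array with a "relevancy" key, as in the quoted code,
a possible alternative is to pull the scores into a flat array and let
array_multisort() reorder the rows to match.  This is a minimal sketch
with made-up rows (the "id" field is hypothetical), sorted highest score
first to mirror what cmp() does.

```php
<?php
// Hypothetical rows; only the "relevancy" key comes from the quoted code.
$search_results = [
    ["id" => 1, "relevancy" => -2.5],
    ["id" => 2, "relevancy" => 4.0],
    ["id" => 3, "relevancy" => 0.5],
];

// Pull the scores into a flat numeric array, one entry per row.
$scores = [];
foreach ($search_results as $row) {
    $scores[] = $row["relevancy"];
}

// Sort the scores in descending order and reorder the rows to match.
array_multisort($scores, SORT_DESC, SORT_NUMERIC, $search_results);

// Collect the ids in their new order to check the result.
$ids = [];
foreach ($search_results as $row) {
    $ids[] = $row["id"];
}
```

Whether this beats usort() with a callback is something to benchmark on
your own result sets rather than assume.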

: Here's my code:
: 
:       if ($search_results[0]["relevancy"] == "")
:       {       
:               function cmp($a, $b)
:               {
:                       if($a["relevancy"]  < $b["relevancy"])
:                       {
:                               return 1;
:                       }
:                       elseif($a["relevancy"] > $b["relevancy"])
:                       {
:                               return -1;
:                       }
:                       else
:                       {
:                               return 0;
:                       }
:               }
:               
:               $search_statements = $_SESSION["search"]["statements"];
:               
:               foreach($search_results as $key1 => $value1)
:               {
:                       $num_fields_matched = 0;
:                       $result_score = 0;
:                       $metaphone_ratio = 0;
:                       foreach($search_statements as $key => $value)
:                       {
:                               if ($value != "" AND $value1[$key] != $value)
:                               {
:                                       $value = strtolower(trim($value));
:                                       $value1[$key] = 
:                                       strtolower(trim(($value1[$key])));
:                                       $num_fields_matched++;
:                                       $value_metaphone = 
:                                       metaphone($value1[$key]);
:                                       $search_metaphone = 
:                                       metaphone($value);
:                                       $search_position = 
:                                       strpos($value1[$key], $value);
:                                       $string_count = 
:                                       substr_count($value1[$key], $value);
:                                       $levenshtein = levenshtein($value, 
:                                       $value1[$key], "0.5", 1, 1);
:                                       
:                                       if ($search_metaphone == 
:                                       $value_metaphone AND 
:                                       $value_metaphone != "")
:                                       {
:                                               $metaphone_ratio = 1;
:                                       }
:                                       elseif ($search_metaphone != 0)
:                                       {
:                                               $metaphone_ratio = 0.6 * (1 
:                                               / 
:                                               levenshtein($search_metaphone, 
:                                               $value_metaphone));
:                                       }
:                                       
:                                       $result_score = $result_score + 
:                                       ($levenshtein + (8 * $search_position)) - (2 * 
:                                       ($string_count - 1)) - (1.1 
:                                       * $metaphone_ratio * $levenshtein);
:                               }
:                               elseif ($value1[$key] == $value)
:                               {
:                                       $result_score = $result_score - 5;
:                               }
:                       }
:                       if ($num_fields_matched == 0)
:                       {
:                               $num_fields_matched = 1;
:                       }
:                       $search_results[$key1]["relevancy"] = ($result_score 
:                       * -1) / $num_fields_matched;
:                       
:                       if ($fuzzy_search == true AND 
:                       $search_results[$key1]["relevancy"] < -5)
:                       {
:                               unset($search_results[$key1]);
:                       }
:               }
:               
:               usort($search_results, "cmp");
:       }

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
