Re: Searching hash if a given value exists

Rob Dixon Fri, 19 Jan 2007 19:14:57 -0800

Michael Alipio wrote:
> Hi,
>
>> ----- Original Message ---- From: Igor Sutton <[EMAIL PROTECTED]> To:
>> Mumia W. <[EMAIL PROTECTED]> Cc: Beginners List
>> <beginners@perl.org> Sent: Friday, January 19, 2007 4:42:09 PM Subject: Re:
>> Searching hash if a given value exists
>
>> [...]
>
>> New benchmarks about the subject:
>
>> foreach_hash_keys:  4 wallclock secs ( 4.40 usr +  0.00 sys =  4.40 CPU) @
>> 227272.73/s (n=1000000) foreach_hash_values:  4 wallclock secs ( 3.46 usr +
>> 0.01 sys =  3.47 CPU) @ 288184.44/s (n=1000000) reverse_hash:  6 wallclock
>> secs ( 6.85 usr +  0.01 sys =  6.86 CPU) @ 145772.59/s (n=1000000)
> Rate  reverse_hash foreach_hash_keys foreach_hash_values
>>      reverse_hash        145773/s            --              -36%
>> -49% foreach_hash_keys   227273/s           56%                --
>> -21% foreach_hash_values 288184/s           98%               27%
>> --
>
>
> Cool! so the winner is foreach_hash_values??


No. I thought this was going to mislead somebody.

foreach (values %hash) {
  return 1 if $_ eq 'house';
}

is quicker than

foreach (keys %hash) {
  return 1 if $hash{$_} eq 'house';
}

as you would expect, I hope. But using a reverse hash is only slower if you
rebuild it in its entirety every time you need to do a lookup. If you use it
properly it is many many times faster than the very blunt linear search tool.
Look at these benchmarks:

use strict;
use warnings;

use Benchmark qw/cmpthese/;

my %hash = ( dog => 'house', pig => 'barn', bird => 'cage' );
my %rhash = reverse %hash;

cmpthese(5_000_000,
  {
   'reverse hash' => sub {
      return exists $rhash{house};
    },

   'foreach hash keys' => sub {
      foreach (keys %hash) {
        return 1 if $hash{$_} eq 'house';
      }
      return '';
    },

   'foreach hash values' => sub {
      foreach (values %hash) {
        return 1 if $_ eq 'house';
      }
      return '';
    },
  }
);

**OUTPUT**

                         Rate foreach hash keys foreach hash values reverse hash
foreach hash keys    503931/s                --                -51%         -86%
foreach hash values 1022495/s              103%                  --         -71%
reverse hash        3516174/s              598%                244%           --

So looping over the values is about twice as fast as looping over the keys and
extracting the corresponding values, but accessing a prebuilt reversed hash is
about seven times faster. And this is for an unrepresentatively tiny set of
data; if there was some real content in the sample the reverse hash would be
streets ahead.

In summary, my rule is to start by writing a program so that it does things in
the most obvious way so as to minimise the risk of errors and bugs. Once it is
working, if it also runs quickly enough then the job is done; great. If not,
then we must start to optimise the nice clean code and make it dirtier but
faster. I think it is clear which of the above is the most transparent solution
to checking whether a hash contains a particular value, and in practise it is
actually going to be the fastest, so lucky us.

> Never knew I could do a foreach (values %hash), I only use "foreach keys" for
> a very long time now. I tried searching the documentation for "foreach" but no
> avail.

Take a look at

perldoc -f values

and, while you're at it,

perldoc -f each

HTH,

Rob


Oh, and a PS. If you don't need the full reversal of a hash, but just a
derivative hash containing all the first hash's values as keys, then the hash
slice I put in my original solution

my %rhash;
@rhash{values %hash} = ();

is actually faster than

my %rhash = reverse %hash;

as it doesn't have to duplicate all the original hash keys as well: they are
irrelevant.

R


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/

Re: Searching hash if a given value exists

Reply via email to