Re: Using filepath method to identify an .html page

Dave Angel Tue, 22 Jan 2013 11:04:11 -0800

On 01/22/2013 01:26 PM, Ferrous Cranus wrote:

<snip>


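sub hashit {
    my $url = shift;
    my @ltrs = split(//, $url);    # one element per character
    my $hash = 0;

    # Sum the character codes, keeping only the last four decimal digits.
    foreach my $ltr (@ltrs) {
        $hash = ($hash + ord($ltr)) % 10000;
    }
    printf "%s: %0.4d\n", $url, $hash;
}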


which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547
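
For anyone following along in Python, here is a rough port of the Perl sub above. It is only a sketch: the function name is carried over from the Perl, and the two extra paths (/ab.html and /ba.html) are made up for illustration. They show that a plain sum of character codes ignores character order, so any two paths that are rearrangements of each other get the same number.

```python
def hashit(url: str) -> int:
    """Sum the character codes of url, keeping the result below 10000."""
    total = 0
    for ch in url:
        total = (total + ord(ch)) % 10000
    return total


if __name__ == "__main__":
    # The first two paths come from the run shown above; the last two
    # are hypothetical and differ only in the order of two letters.
    for path in ("/index.html", "/about/time.html", "/ab.html", "/ba.html"):
        print(f"{path}: {hashit(path):04d}")
```

The first two lines of output match the Perl run (1066 and 1547), and the last two paths print the same value as each other.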

If you use that algorithm to get a 4-digit number, it'll look good for the first few files. But if you try 100 files, you've got almost a 40% chance of a collision, and if you try 10001, you've got a 100% chance.
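
Those figures can be checked with the usual birthday-problem calculation. The sketch below assumes every one of the 10000 possible values is equally likely, which is generous to a simple additive hash, so real results would probably be worse. The function name and the sample file counts are just for illustration.

```python
def collision_probability(n_files: int, n_slots: int = 10_000) -> float:
    """Chance that at least two of n_files share a value, assuming
    each file gets one of n_slots values uniformly at random."""
    if n_files > n_slots:
        return 1.0  # pigeonhole: more files than distinct values
    p_all_unique = 1.0
    for i in range(n_files):
        # The (i+1)-th file must avoid the i values already taken.
        p_all_unique *= (n_slots - i) / n_slots
    return 1.0 - p_all_unique


if __name__ == "__main__":
    for n in (10, 100, 1000, 10001):
        print(n, round(collision_probability(n), 4))
```

With 100 files the result comes out a little over 0.39, and anything above 10000 files gives exactly 1.0.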



So is it really okay to reuse the same integer for different files?

I tried to help you when you were using the md5 algorithm. By using enough digits/characters, you can make the likelihood of a collision quite small. But 4 digits? Don't be ridiculous.
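
To put rough numbers on "enough digits", here is a minimal sketch that assumes the identifier is simply the first few hex characters of the md5 digest of the path. That truncation scheme, the function names, and the default of 12 characters are assumptions for illustration, not something settled in this thread. The probability uses the standard birthday approximation 1 - exp(-n(n-1)/(2d)).

```python
import hashlib
import math


def short_id(path: str, digits: int = 12) -> str:
    """First `digits` hex characters of the md5 digest of path."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()[:digits]


def approx_collision_probability(n_files: int, digits: int) -> float:
    """Birthday approximation for n_files ids of `digits` hex characters,
    treating the digest as uniformly distributed."""
    slots = 16 ** digits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-n_files * (n_files - 1) / (2 * slots))


if __name__ == "__main__":
    print(short_id("/index.html"))
    for digits in (4, 8, 12):
        print(digits, approx_collision_probability(100, digits))
```

Even 4 hex characters (65536 values) leave roughly a 7% chance of a collision among 100 files, while 12 hex characters push it far below one in a billion.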



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list
