OK, I've partially scratched my own itch by adapting the Hash() function
from files_hashes.c, along with two associated C #defines from
cf3.defs.h, into a small Perl script. Given a list of hostnames on
standard input, the script below prints a line of the form "hashvalue =
hostname1, hostname2, ..." for each hash value that at least one
hostname maps to. The following five lines of x and y values, meaning
"x hash table entries contain y hostnames", are what I get from our
list of hostnames:
1430 1
292 2
49 3
6 4
1 5
I'll play with the script more to glean more information, including
actual splaytimes for the hash values.
Justin
--- cut here ---
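#!/usr/bin/perl
use strict;
use warnings;

# Constants copied from cf3.defs.h
our $CF_MACROALPHABET = 61;
our $CF_HASHTABLESIZE = 4969;

# Maps each hash value to the list of hostnames that hash to it
our %hashtable = ();

# Perl port of Hash() from files_hashes.c; also records the hostname
# under its hash value in %hashtable
sub hash_hostname {
    my ($hostname) = @_;
    my $slot = 0;
    for my $ord (map { ord } split //, $hostname) {
        $slot = ($slot * $CF_MACROALPHABET + $ord) % $CF_HASHTABLESIZE;
    }
    $hashtable{$slot} = [] unless $hashtable{$slot};
    push @{$hashtable{$slot}}, $hostname;
    return $slot;
}

# Read one hostname per line from standard input (or from files named
# on the command line)
while (<>) {
    chomp;
    hash_hostname($_);
}

# Print "hashvalue = hostname1, hostname2, ..." in numeric order
$" = ', ';
print "$_ = @{$hashtable{$_}}\n" foreach sort { $a <=> $b } keys %hashtable;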
--- cut here ---
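
The "x hash table entries contain y hostnames" summary above is not
produced by the script itself. A minimal sketch of one way to compute it
follows. The sub names hash_slot and bucket_histogram and the sample
hostnames are made up for illustration; only the two constants and the
hash arithmetic come from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same constants as in the script above (from cf3.defs.h).
my $CF_MACROALPHABET = 61;
my $CF_HASHTABLESIZE = 4969;

# Hash one hostname to a slot, using the same arithmetic as the
# script above. (Hypothetical helper name.)
sub hash_slot {
    my ($hostname) = @_;
    my $slot = 0;
    for my $char (split //, $hostname) {
        $slot = ($slot * $CF_MACROALPHABET + ord($char)) % $CF_HASHTABLESIZE;
    }
    return $slot;
}

# Given a list of hostnames, return a hash mapping
# "hostnames per slot" => "number of slots holding that many hostnames".
sub bucket_histogram {
    my @hostnames = @_;
    my %per_slot;
    $per_slot{ hash_slot($_) }++ for @hostnames;
    my %histogram;
    $histogram{$_}++ for values %per_slot;
    return %histogram;
}

# Made-up hostnames, for illustration only.
my @sample = qw(web01 web02 db01 db02 mail01);
my %histogram = bucket_histogram(@sample);

# Print "x y" lines: x slots contain y hostnames.
print "$histogram{$_} $_\n" for sort { $a <=> $b } keys %histogram;
```

To run it against a real host list, replace @sample with hostnames read
from standard input, as the script above does.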
-----Original Message-----
From: Justin Lloyd
Sent: Thursday, February 25, 2010 9:51 AM
To: Help-cfengine
Subject: Splaytime and hostname hashing
The documentation states that splaytime specifies an integer number of
minutes by which cf-execd's execution of cf-agent should be delayed. It
states the delay is between 0 and splaytime minutes, which, to nitpick,
should really read "0 to (splaytime * 60) seconds", since otherwise it
could be misinterpreted as meaning that the delay is always a whole
number of minutes.
Further, it mentions, basically in passing, that the specific delay for
a given host is based on the hash of that hostname (the algorithm for
which is in files_hashes.c, FWIW). Unless there's a way to do this that
I'm not seeing, I would be interested in being able to access and report
on hosts' hash values to see how well distributed they are. In other
words, if I have 1000 systems, I'd like to know the distribution of
their splaytimes to know how many systems will access policy servers,
etc. during an execution. This would be useful in tuning both splaytime
and the schedule variable in the executor control body, especially as an
environment grows.
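
A report along these lines might be sketched as follows. This is only a
rough illustration under two assumptions that should be checked against
the cf-execd source: that the bare hostname is the string being hashed,
and that the delay is the hash value scaled linearly into the splaytime
window, i.e. splaytime * 60 * hash / 4969 seconds. The sub names, the
splaytime of 5 and the sample hostnames are hypothetical; the constants
61 and 4969 are the #defines from cf3.defs.h.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $CF_MACROALPHABET = 61;      # from cf3.defs.h
my $CF_HASHTABLESIZE = 4969;    # from cf3.defs.h

# Port of the hostname hash in files_hashes.c. (Hypothetical helper name.)
sub hash_slot {
    my ($hostname) = @_;
    my $slot = 0;
    for my $char (split //, $hostname) {
        $slot = ($slot * $CF_MACROALPHABET + ord($char)) % $CF_HASHTABLESIZE;
    }
    return $slot;
}

# ASSUMPTION: the delay grows linearly with the hash value, from 0 up to
# (but not including) splaytime * 60 seconds.
sub splay_delay_seconds {
    my ($hostname, $splaytime_minutes) = @_;
    return int($splaytime_minutes * 60 * hash_slot($hostname) / $CF_HASHTABLESIZE);
}

# Count how many hosts would start within each whole minute of the window.
sub hosts_per_minute {
    my ($splaytime_minutes, @hostnames) = @_;
    my %count;
    for my $host (@hostnames) {
        my $minute = int(splay_delay_seconds($host, $splaytime_minutes) / 60);
        $count{$minute}++;
    }
    return %count;
}

my $splaytime = 5;                                 # hypothetical setting
my @sample = qw(web01 web02 db01 db02 mail01);     # made-up hostnames

my %count = hosts_per_minute($splaytime, @sample);
printf "minute %d: %d host(s)\n", $_, $count{$_}
    for sort { $a <=> $b } keys %count;
```

Bucketing by minute is an arbitrary choice; a finer or coarser bucket may
be more useful depending on how much load a policy server can absorb at
once.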
Has anyone done anything along these lines with existing information, or
is this something I should put in a ticket for?
Thanks,
Justin
--
Justin C. Lloyd
Unix Infrastructure Engineer
DigitalGlobe, An Imaging and Information Company