On 11/5/19 1:51 PM, Dominik Csapak wrote:
> add a helper to cache the ceph performance data inside pmxcfs
> with broadcast_node_kv, and also a helper to read it out
>
> merge the data from all nodes that sent performance data
>
> the '$perf_cache' variable actually serves two purposes,
> the writer (will be pvestatd) uses it to broadcast only its values,
> and the reader (pvedaemon) uses it to cache all nodes' data
>
> Signed-off-by: Dominik Csapak <d.csa...@proxmox.com>
> ---
> merging the data on read seems like a good idea, since we have the data
> and it makes little sense to throw any away, but i noticed some
> weird glitches when the pvestatd update calls are near each other, since
> the timestamps are then e.g. ..01, ..02, ..03, ..11, ..12, etc.
>
> this looks slightly weird on the extjs charts since you have some
> clustered data points at some timestamps but not at others...
>
> we could of course always only use the data from the current node,
> this way we would have a more consistent interval
>
>  PVE/Ceph/Services.pm | 54 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
>
> diff --git a/PVE/Ceph/Services.pm b/PVE/Ceph/Services.pm
> index 45eb6c3f..70cb9b9a 100644
> --- a/PVE/Ceph/Services.pm
> +++ b/PVE/Ceph/Services.pm
> @@ -360,4 +360,58 @@ sub destroy_mgr {
>      return undef;
>  }
>
> +my $perf_cache = [];

rather pass that to the method as a parameter, so that the caller has it
in scope, not the module..
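
i.e., something like this (untested sketch, the caller-side naming is
just for illustration):

# in Services.pm: only work on what the caller hands in
sub cache_perf_data {
    my ($perf_cache, $keep_seconds) = @_;

    $keep_seconds //= 5*60;

    # ... query rados, push the new entry, prune old ones ...

    return $perf_cache;
}

# the caller (pvestatd) then keeps the cache in its own scope:
my $ceph_perf_cache = [];
$ceph_perf_cache = PVE::Ceph::Services::cache_perf_data($ceph_perf_cache, 5*60);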

> +
> +sub cache_perf_data {
> +    my ($keep_seconds) = @_;
> +
> +    $keep_seconds //= 5*60; # 5 minutes
> +
> +    my $rados = PVE::RADOS->new();

wouldn't it make sense to cache this? Else we fork on each call of
this[0], which doesn't seem too performant, fork isn't particularly
cheap..

[0]: https://git.proxmox.com/?p=librados2-perl.git;a=blob;f=PVE/RADOS.pm;h=11af8a69ee1f3564538c856464908810cfa24eec;hb=HEAD#l124

> +    my $status = $rados->mon_command({ prefix => "status" });

quite a bit of information, but the mgr(s) cache it anyway, and there
seems to be no easy way to get just the PGDigest::print_summary info
over librados2 directly, at least I found none, albeit the API docs
seem to be pretty lacking.. So for now I'd say this is OK

> +
> +    my $pgmap = $status->{pgmap};
> +
> +    my $time = time();

it's a bit of a shame that the iostats in pgmap have no timestamp.. :/

> +    my $entry = {
> +        time => $time,
> +        ops_r => $pgmap->{read_op_per_sec},
> +        ops_w => $pgmap->{write_op_per_sec},
> +        bytes_r => $pgmap->{read_bytes_sec},
> +        bytes_w => $pgmap->{write_bytes_sec},

these can all be undefined if there is no activity on the ceph cluster

> +    };
> +
> +    push @$perf_cache, $entry;
> +
> +    # remove entries older than $keep_seconds
> +    $perf_cache = [grep { $time - $_->{time} < $keep_seconds } @$perf_cache];

why not just keep N entries, and remove all those which are outside of
that length restriction, something like (untested):

my $max_cache_depth = 512;

my $cache_depth = scalar @$cache;
if ($cache_depth > $max_cache_depth) {
    splice(@$cache, 0, $cache_depth - $max_cache_depth);
}

probably faster, and it limits this to what we actually want: a maximal
entry count. As each entry has the time saved anyway, a client can still
cut off after a certain time, if desired.

> +
> +    my $data = encode_json($perf_cache);
> +    PVE::Cluster::broadcast_node_kv("ceph-perf", $data);

not sure if we want this, I mean those are pve ceph-server stats and so
they're the same on the whole cluster.. So we're uselessly broadcasting
this (nodecount - 1) times..

A lockless design to have only one sender per cluster would be to just
let the node with the highest node-id from a quorate partition broadcast
this in a cluster wide status hashtable? If we can broadcast we're
quorate, and thus we can trust the membership info, and even for the
short race possibility where a node with higher priority becomes quorate,
we or it will effectively just overwrite other, also valid, values in
this hash.
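
e.g., roughly (untested, the helper name is made up, and assuming the
per-node 'id'/'online' info from PVE::Cluster::get_members() can be used
for this):

use PVE::Cluster;
use PVE::INotify;

# true if we are the online node with the highest node-id, i.e., the one
# which should broadcast the (cluster-wide identical) ceph stats
sub is_stats_broadcaster {
    my $members = PVE::Cluster::get_members();
    my $local = $members->{ PVE::INotify::nodename() };

    return 0 if !$local || !$local->{online};

    for my $info (values %$members) {
        next if !$info->{online};
        return 0 if $info->{id} > $local->{id};
    }

    return 1;
}

# in cache_perf_data(), only that node then sends the data out, ideally
# into a cluster-wide status hash instead of the per-node kv:
PVE::Cluster::broadcast_node_kv("ceph-perf", $data) if is_stats_broadcaster();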

> +}
> +
> +sub get_cached_perf_data {
> +
> +    # only get new data if the already cached one is older than 10 seconds
> +    if (scalar(@$perf_cache) > 0 && (time() - $perf_cache->[-1]->{time}) < 10) {
> +        return $perf_cache;
> +    }
> +
> +    my $raw = PVE::Cluster::get_node_kv("ceph-perf");
> +
> +    my $res = [];
> +    my $times = {};
> +
> +    for my $host (keys %$raw) {

why multi-host? Those stats are the same ceph-cluster-wide, AFAICT,
distributed through the MgrStatMonitor PAXOS child class.

E.g., I never saw different values when I executed the following command
cluster-wide at the same time:

perl -we 'use PVE::RADOS; PVE::RPCEnvironment->setup_default_cli_env(); my $rados = PVE::RADOS->new(); my $stat = $rados->mon_command({ prefix => "status" })->{pgmap}; print "w/r: $stat->{write_bytes_sec}/$stat->{read_bytes_sec}\n";'

> +        my $tmp = eval { decode_json($raw->{$host}) };
> +        for my $entry (@$tmp) {
> +            my $etime = $entry->{time};
> +            push @$res, $entry if !$times->{$etime}++;
> +        }
> +    }
> +
> +    $perf_cache = [sort { $a->{time} <=> $b->{time} } @$res];
> +    return $perf_cache;
> +}
> +
>  1;

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel