On 09/21/2016 10:51 AM, Alexandre DERUMIER wrote:
Note that I have around 1000 VMs, so I don't know the impact on the number of
messages/s.
A simple tcpdump gives me an average of:
udp/5404: 500 packets/s
udp/5405: 1300 packets/s
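For scale, the two rates above can be turned into a rough upper bound on the corosync bandwidth. This is only a back-of-the-envelope sketch: it assumes every packet is as large as the MTU (1500 bytes here), which real totem packets usually are not, and `estimate_bits_per_sec` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: upper bound on bandwidth in bit/s.
# Arguments: packets/s on udp/5404, packets/s on udp/5405,
#            assumed (worst-case) packet size in bytes.
estimate_bits_per_sec() {
    echo $(( ($1 + $2) * $3 * 8 ))
}

# Rates from the tcpdump above, with the current MTU of 1500 as packet size.
estimate_bits_per_sec 500 1300 1500   # prints 21600000 (about 21.6 Mbit/s)
```

Even under that pessimistic assumption the traffic is small compared to a 2x10 Gb link, so raw bandwidth is unlikely to be the limit here; the packet rate and latency matter more.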
----- Original message -----
From: "Alexandre Derumier" <aderum...@odiso.com>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 21 September 2016 09:57:42
Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes),
get rid of corosync ?
@Alexandre, you say that with 16 nodes the cluster is pretty much at its
maximum. Can I get some more info from you, as I currently do not have the
hardware to test this? :)
Do you use IGMP snooping/queriers?
On which network does corosync communicate, an independent one? And how fast
is it?
Redundant rings also?
I have a full 2x10 Gb network through LACP (no redundant ring).
Dedicated VLAN for the nodes, but sharing the same physical links (which are
far from saturated).
Cluster nodes are 2x 10-core 3.1 GHz Xeons, with SSDs for local storage.
Currently MTU 1500, but I'm planning to increase it to 9000, as it seems that
this allows more messages.
I'm using IGMP snooping/queriers (multicast is stable).
OK, multicast traffic may still be hindered when on the same network with
heavy users (e.g. VM storage), even if the network itself is not saturated.
A second totem ring through the redundant ring protocol (rrp) in passive
mode could boost the performance as it almost doubles the speed of the totem
protocol, plus it adds redundancy for quorum.
And I'm seeing a lot of retransmits from time to time (around 5-10 s of
retransmits), once or twice an hour :/
Hmm sounds a bit weird. Seemingly random?
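One way to check whether the retransmits are really random is to count the corosync "Retransmit List" log lines per hour and look for a pattern. A minimal sketch, assuming the traditional syslog timestamp format ("Sep 21 09:10:01 ...") and that corosync logs to /var/log/syslog (the path may differ on your setup); `retransmits_per_hour` is a hypothetical helper name.

```shell
# Hypothetical helper: read syslog lines on stdin and print the number of
# corosync "Retransmit List" messages per hour, as "Mon DD HH:00 count".
retransmits_per_hour() {
    awk '/Retransmit List/ { n[$1 " " $2 " " substr($3, 1, 2) ":00"]++ }
         END { for (h in n) print h, n[h] }' | sort
}

# Usage (log path is an assumption):
#   retransmits_per_hour < /var/log/syslog
```

If the counts cluster around the same hours every day, something scheduled is the more likely cause than the cluster size itself.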
So I'm really scared to increase the cluster size.
Note that I have around 1000 VMs, so I don't know the impact on the number of
messages/s.
Question: do you think streaming all VM statistics could impact the number of
messages/s?
Do you use something which could trigger frequent writes/modifications on
/etc/pve?
Just running VMs normally does not modify anything. There are mostly just
reads, which should not cause any problems as they don't go over the wire and
are also fast because the DB is in RAM; only modifications have to be sent
to the other nodes.
You could check whether
# inotifywait -e attrib,modify,create,delete,move -r -m /etc/pve/
generates a lot of output. Note that this only shows how the local node
modifies the pmxcfs, not all modifications cluster-wide.
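To put a number on "a lot of output", the events could be counted per type over a fixed interval. A minimal sketch, assuming inotifywait's default output format (watched directory, event names, file name, separated by spaces) and the coreutils `timeout` command; `summarize_events` is a hypothetical helper name.

```shell
# Hypothetical helper: read inotifywait output on stdin and print the number
# of events per event type, most frequent first.
summarize_events() {
    awk '{ count[$2]++ } END { for (e in count) print count[e], e }' | sort -rn
}

# Usage: watch /etc/pve for 60 seconds, then print the summary.
#   timeout 60 inotifywait -e attrib,modify,create,delete,move -r -m /etc/pve/ \
#       | summarize_events
```

Dividing the counts by the interval length gives events per second for this node; the same would have to be run on each node to get the full picture.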
FYI, the HA manager uses it frequently, but in a 5-second cycle, so that is
not really heavy usage.
Can you also send me the output from
# corosync-cmapctl
This is quite a lot of data and contains IP addresses, so you may want to
send it to me directly.
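If the output should go to the list rather than be sent privately, the IPv4 addresses could be masked first. A rough sketch, assuming a sed that supports -E (GNU and BSD sed do); `mask_ipv4` is a hypothetical helper, it does not handle IPv6, and it may also mask other dotted four-number strings.

```shell
# Hypothetical helper: replace anything that looks like an IPv4 address
# on stdin with x.x.x.x.
mask_ipv4() {
    sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/x.x.x.x/g'
}

# Usage:
#   corosync-cmapctl | mask_ipv4
```

Masked addresses hide addressing mistakes, so the unmasked output sent directly remains the more useful option for debugging.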
----- Original message -----
From: "Thomas Lamprecht" <t.lampre...@proxmox.com>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 21 September 2016 09:40:01
Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes),
get rid of corosync ?
On 09/21/2016 08:50 AM, Alexandre DERUMIER wrote:
Forgot to mention that consul supports multiple clusters and/or multi
center clusters out of the box.
Yes, I read the docs yesterday. It seems very interesting.
Most of the work would probably be replacing pmxcfs with the consul KV store.
I have seen some consul FUSE filesystem implementations,
but they don't have all pmxcfs features (symlinks, for example).
Zookeeper seems to be lower level.
Reading the sheepdog plugins (1500 LOC):
https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/zookeeper.c
vs
https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/corosync.c
Discussing and evaluating options is good, but instantly throwing everything
away and switching to another (not necessarily better) cluster stack is
maybe a bit of an overreaction. :) I also think that our current cluster
stack, with corosync + pve-cluster (pmxcfs), is quite stable, and a lot of
things depend on it.
Also, corosync is very well tested software and works really well, at least
with small to mid-size clusters (< 60 nodes, which I find quite an
achievement for a cluster!). You also have to consider that quite some
overhead, and thus the node limit, may come from the database used by
pmxcfs: each transaction needs to be synced to disk to make everything
reliable, and while this is quite optimized, it makes things slower
(placing the DB on really fast storage could help here).
I, personally, would prefer to keep corosync and introduce a protocol which
allows connecting multiple clusters (easier said than done, but still less
change and work than adapting to another cluster stack, which is most surely
not better, or has other drawbacks).
Taking a look at the corosync satellite approach also sounds interesting.
Connecting multiple clusters is a different approach than a small cluster
with a lot of satellite nodes per cluster node. I prefer the former, as
it's more decentralized and seems to fit better into our current design. :)
Note that for scaling, zookeeper, consul, ... have some kind of master nodes
for the quorum, plus client nodes (same as corosync satellite).
I don't think it's technically possible to scale a full mesh of master nodes
to a large number of nodes.
No, with a full mesh you won't really overcome the limits and problems
corosync has here; corosync already makes quite good use of what is possible
with multicast.
@Alexandre, you say that with 16 nodes the cluster is pretty much at its
maximum. Can I get some more info from you, as I currently do not have the
hardware to test this? :)
Do you use IGMP snooping/queriers?
On which network does corosync communicate, an independent one? And how fast
is it?
Redundant rings also?
----- Original message -----
From: "datanom.net" <m...@datanom.net>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 21 September 2016 07:49:06
Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes),
get rid of corosync ?
On Wed, 21 Sep 2016 01:45:18 +0200
Michael Rasmussen <m...@datanom.net> wrote:
https://github.com/hashicorp/consul
Forgot to mention that consul supports multiple clusters and/or multi
center clusters out of the box.
_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel