On 20.10.2021 at 12:24, Maxim Solodovnik wrote:

    Multicast is required for Hazelcast to be able to auto-discover
    nodes

    You can choose another type of network configuration.
    More details in the Hazelcast docs:
    https://docs.hazelcast.com/imdg/4.2/configuration/configuring-declaratively

    Found something that sounds like it could be the right way in the
    Hazelcast docs:
    https://docs.hazelcast.org/docs/4.0/manual/html-single/index.html#discovering-members-by-tcp
    ...

    Is that the proper way to circumvent the multicast requirement? Or
    does this not work for OM?


You can choose whatever method you think fits your needs best :)

So multicast is not really required when OM is running on the host and
not inside Docker, and I can switch to pure TCP/IP in the Hazelcast
config to reduce network traffic?

The documentation uses multicast because:
- having too many options would complicate everything
- this is the way I have tested
- this is the way that allows adding/removing nodes dynamically (i.e.
Docker containers created on demand with dynamic IPs)
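
(For reference, a TCP/IP join section in hazelcast.xml, along the lines
of the linked Hazelcast docs, might look roughly like this; just a
sketch, the member addresses and port are placeholders for the two
nodes:

  <network>
    <join>
      <!-- disable multicast, list the cluster members explicitly -->
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member-list>
          <member>10.0.0.1:5701</member>
          <member>10.0.0.2:5701</member>
        </member-list>
      </tcp-ip>
    </join>
  </network>

With a static list like this, nodes with new IPs cannot join without
touching the config, which is the dynamic-membership trade-off
mentioned above.)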



    This "Fail to create Kurento client, will re-try in 3000 ms"
    Error states there is something wrong with connection to KMS
    Is it running? Is the URL correct?
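
(Regarding "Is the URL correct?": the KMS endpoint OM connects to is
set in openmeetings.properties; a minimal sketch, assuming the property
name kurento.ws.url as I know it from openmeetings.properties and a KMS
listening on its default WebSocket port on the same host:

  kurento.ws.url=ws://127.0.0.1:8888/kurento

If KMS runs in a container or is bound to docker0/docker_gwbridge, this
URL has to point to whatever address that KMS is actually reachable at
from the OM process.)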

    I tested KMS in different ways: standalone Docker on each node
    (including the latest Kurento Media Server image), connected via
    the Docker host interface docker0, the Docker bridge interface
    docker_gwbridge or the localhost interface lo; Docker Swarm in
    replicated mode with a KMS instance running on each node (plus a
    spare one to fire up if one KMS stops for any unwanted reason),
    running via the localhost interface lo or via the Docker bridge
    interface docker_gwbridge; and finally Docker global mode connected
    via the localhost interface lo or the Docker bridge interface
    docker_gwbridge. Same results: Node1 works as expected for rooms
    located on Node1 (users are redirected there no matter where their
    login took place); on Node2 the users have no ability to see and
    hear the other participants in rooms located on Node2, independent
    of where the login took place, as the users are also redirected
    there.


It seems I wasn't clear enough (sorry, I'm not a native English speaker)

Me neither, native German speaker :-) so don't worry, as long as we
figure out what the other side intends to say ;-)

I'll try to be more clear: by default KMSes are not clustered, i.e.
every KMS works on its own.
So out-of-the-box OM is connected to one KMS.
Then, when the first user enters the room, the internal load balancer
checks which OM node + KMS server is less loaded
and creates the multi-media room at, let's say, node1.
When another user tries to enter the same room at node2, he/she will be
redirected to node1,
so the same KMS instance will be utilized.

So I have now switched back to standalone Docker per node, with one KMS
instance per Docker engine on each node...

Sebastian was able to create a commercial (not open-source) cluster of
KMS instances,
so you can have more users in the same room (on one OM node).

    "every user can get his own video+audio stream but doesn't get
    the streams of the other participants" - this might be because the
    users are "in fact" in different rooms.
    OM's out-of-the-box configuration provides "room based" clustering.
    This basically means:
    - there is 1 DB

    Given, as the Galera cluster replicates the same DB to each node
    and instantly syncs the DBs whenever a transaction takes place,
    each node is logically equipped with the same database via the
    localhost interface, kept in sync in the background via the tunnel.


According to the documentation (and my memory might be incorrect),
OM has some sort of automatic remote commit provider (which is
described here: https://openmeetings.apache.org/Clustering.html#database1).
So I guess there might be separate databases and commits should be
synced at the JPA level.
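
For reference, OpenJPA's remote commit provider is configured in the
persistence descriptor so that a commit on one node invalidates the
caches of the others; a minimal sketch of what such a setting typically
looks like (node addresses are placeholders, and the linked Clustering
page has the authoritative snippet for OM's persistence.xml):

  <!-- persistence.xml on each node: list all cluster nodes -->
  <property name="openjpa.RemoteCommitProvider"
            value="tcp(Addresses=10.0.0.1;10.0.0.2)"/>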

Did I understand it right now that both OM instances should NOT use the
SAME DB, but each OM instance should use its own database, as
Hazelcast/OM syncs their content (rooms, users/groups etc.), instead of
sharing a one-for-all database between the nodes with individual DB
users and passwords per node? And then token number 26 in the
configuration (application.base.url) would point to the individual
node's URL again instead of the URL of only one of the nodes? That
might explain some of the problems :-) and lift the wooden board a bit
away from my eyes :-D

    - there are multiple OM nodes (each has its own KMS)

    Each OM node has its own KMS, either standalone in Docker or via
    the swarm as stated above... maybe standalone (and thus explicitly
    pinning to the KMS on that node) will work better due to less
    internal rerouting through the swarm's load-balancing overlay
    network?

    - users will be redirected to the node where the room is currently
    hosted

    Redirection obviously works, as the URL shows that the users are on
    the node where the room was created when they enter the room
    starting from the other node.


Since you are redirected,
and you have an (empty) video pod,
the issue might be caused by the absence of a TURN server.
Might that be the case?

Coturn as TURN server is not absent, but I have now tried to work out
settings for listening-ip, relay-ip (formerly set to nothing, as the
TURN server config told me coturn would listen on all addresses and
figure out by itself which relay IP to use) and allowed-peer-ip, as
well as alternate-server IPs... and do I need a TURN user and an
individual secret/key/password when it is initially configured with
static-auth-secret (currently set to the same value by both OM
instances and TURN servers)? It seems my TURN servers have some
struggle which might be related to identification/authentication...
realm is set, server names also, but no users so far...
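
For what it's worth, the network-related part of a per-node
turnserver.conf might look roughly like this; a sketch only, all IPs
and the realm are placeholders, and external-ip is only relevant if the
node sits behind NAT:

  listening-port=3478
  listening-ip=<public IP of this node>
  relay-ip=<public IP of this node>
  # external-ip=<public IP>   # only when the node is behind NAT
  realm=<your realm>

(the authentication side is discussed further below)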

Log file snippet from syslog regarding the TURN server:

Oct 20 18:32:05 cznode1 turnserver: 6389: session 001000000000000890:
usage: realm=<$RELAM_NAME_PLACEHOLDER$>, username=<>, rp=10, rb=224,
sp=10, sb=960
Oct 20 18:32:05 cznode1 turnserver: 6389: closing session
0x7f6b0c03d9e0, client socket 0x7f6b0c07ebb0 (socket session=0x7f6b0c03d9e0)
Oct 20 18:32:05 cznode1 turnserver: 6389: session 001000000000000890:
closed (2nd stage), user <> realm <$RELAM_NAME_PLACEHOLDER$> origin <>,
local $IP_PLACEHOLDER_NODE1$:3478, remote
$IP_PLACEHOLDER_CLIENT1$:50956, reason: allocation watchdog determined
stale session state

It looks to me like the session is not working as it should due to
missing authentication parts, i.e. user+key or user+password, or at
least a user? A setting in openmeetings.properties (kurento.turn.user)
and in turnserver.conf? But where do I place the password, or do I go
without a password and keep the static secret, or do I just replace the
static secret in openmeetings.properties with a password? Inside the
TURN server: keep use-auth-secret and just put user=$USERNAME and leave
out key or password? Or place a key or password and put it additionally
into openmeetings.properties with a colon?
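
If I understand the static-auth-secret mechanism correctly, no fixed
TURN user should be needed at all: OM derives time-limited credentials
from the shared secret and hands them to the clients. A sketch of how
the two sides would then be paired; kurento.turn.url and
kurento.turn.secret are the property names as far as I know them (only
kurento.turn.user was mentioned above, so please double-check them in
your openmeetings.properties), and the host/secret values are
placeholders that must match on both sides:

  # turnserver.conf
  use-auth-secret
  static-auth-secret=<shared secret>

  # openmeetings.properties (per OM node)
  kurento.turn.url=<coturn host reachable by the browser clients>:3478
  kurento.turn.user=
  kurento.turn.secret=<same shared secret as static-auth-secret>

As far as I can tell, a user=/password entry in turnserver.conf would
only be needed with the classic long-term credential mechanism
(lt-cred-mech) instead of use-auth-secret.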

(cut off a bunch of quoted text (TOFU) from previous messages to reduce
the message size)



Best regards,
Maxim


Kind regards and a lot of thanks for the help,

 Tom
