Hi All,
I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (the public
key) from master3 into /root/.ssh/authorized_keys on drtier1data, and I can now
ssh from master3 to drtier1data using the georep key
(/var/lib/glusterd/geo-replication/secret.pem).
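For reference, this is roughly what I ran (a sketch; the exact commands may
have differed slightly):

[root@master3 ~]# scp /var/lib/glusterd/geo-replication/secret.pem.pub root@drtier1data:/tmp/secret.pem.pub
[root@drtier1data ~]# cat /tmp/secret.pem.pub >> /root/.ssh/authorized_keys
[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data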
However, I am still getting the same error, and geo-replication keeps going
Faulty:
[2024-01-28 13:46:38.897683] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449598}]
[2024-01-28 13:46:38.922491] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:46:38.923127] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:46:38.923313] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449598},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:39.973584] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:46:40.98970] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:46:40.757691] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:40.766860] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:46:50.793311] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:46:50.793469] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:46:50.874474] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:46:52.659114] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7844}]
[2024-01-28 13:46:52.659461] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:46:53.698769] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0392}]
[2024-01-28 13:46:53.698984] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-28 13:46:55.831999] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:46:55.832354] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449615}]
[2024-01-28 13:46:55.854684] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:46:55.855251] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:46:55.855419] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449615},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:56.905496] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:46:57.38262] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:46:57.704128] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:57.706743] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:07.741438] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:07.741582] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:07.821284] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:47:09.573661] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7521}]
[2024-01-28 13:47:09.573955] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:47:10.612173] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0381}]
[2024-01-28 13:47:10.612359] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-28 13:47:12.751856] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:12.752237] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449632}]
[2024-01-28 13:47:12.759138] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:47:12.759690] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:47:12.759868] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449632},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:13.810321] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:47:13.924068] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:47:14.617663] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:14.620035] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:24.646013] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:24.646157] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:24.725510] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:47:26.491939] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7662}]
[2024-01-28 13:47:26.492235] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:47:27.530852] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0385}]
[2024-01-28 13:47:27.531036] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-28 13:47:29.670099] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:29.670640] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449649}]
[2024-01-28 13:47:29.696144] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:47:29.696709] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:47:29.696899] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449649},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:30.751127] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:47:30.885824] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:47:31.535252] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:31.538450] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:41.564276] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:41.564426] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:41.645110] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:47:43.435830] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7904}]
[2024-01-28 13:47:43.436285] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:47:44.475671] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0393}]
[2024-01-28 13:47:44.475865] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-28 13:47:46.630478] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:47:46.630924] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449666}]
[2024-01-28 13:47:46.655069] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:47:46.655752] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:47:46.655926] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449666},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:47:47.706875] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:47:47.834996] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:47:48.480822] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:47:48.491306] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:47:58.518263] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:47:58.518412] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:47:58.601096] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:48:00.355000] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7537}]
[2024-01-28 13:48:00.355345] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:48:01.395025] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0396}]
[2024-01-28 13:48:01.395212] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-28 13:48:03.541059] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:03.541481] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449683}]
[2024-01-28 13:48:03.567552] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:48:03.568172] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:48:03.568376] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449683},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:04.621488] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:48:04.742268] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:48:04.919335] I [master(worker
/opt/tier1data2019/brick):2013:syncjob] Syncer: Sync Time Taken [{job=3},
{num_files=10}, {return_code=3}, {duration=0.0180}]
[2024-01-28 13:48:04.919919] E [syncdutils(worker
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs
--existing --xattrs --acls --ignore-missing-args . -e ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-zo_ev6yu/75785990b3233f5dbbab9f43cc3ed895.sock
drtier1data:/proc/799165/cwd}, {error=3}]
[2024-01-28 13:48:05.399226] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:05.403931] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:15.430175] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:15.430308] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:15.510770] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-28 13:48:18.279007] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700},
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
Thanks,
Anant
________________________________
From: Anant Saraswat <[email protected]>
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov <[email protected]>; [email protected]
<[email protected]>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
Hi Strahil Nikolov,
I have checked the ssh connection from all the master servers: I can ssh to
drtier1data from master1 and master2 (the old master servers), but I am unable
to ssh to drtier1data from master3 (the new node).
[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in
<module>
main()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
if args.subcmd in ("worker"):
TypeError: 'in <string>' requires string as left operand, not NoneType
Connection to drtier1data closed.
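As far as I understand, the ssh transport itself is fine here and the traceback
comes from gsyncd on the slave: the georep key's authorized_keys entry normally
carries a forced command= that runs gsyncd, and gsyncd exits with this
TypeError when invoked without a subcommand. Assuming the usual layout, the
entry on the slave can be checked with:

[root@drtier1data ~]# grep gsyncd /root/.ssh/authorized_keys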
But I am able to ssh to drtier1data from master3 without the georep key:
[root@master3 ~]# ssh root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#
Also, today I restarted Gluster on master1, as geo-replication keeps trying to
become active from master1, and I sometimes get the following error in
gsyncd.log:
[2024-01-28 01:27:24.722663] E [syncdutils(worker
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs
--existing --xattrs --acls --ignore-missing-args . -e ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock
drtier1data:/proc/553418/cwd}, {error=3}]
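For what it's worth, rsync exit code 3 means "errors selecting input/output
files, dirs" per the rsync man page. A manual transfer over the same key might
help rule out basic transport problems; a sketch, using a hypothetical test
file:

[root@master1 ~]# touch /tmp/georep-test
[root@master1 ~]# rsync -av -e 'ssh -i /var/lib/glusterd/geo-replication/secret.pem' /tmp/georep-test drtier1data:/tmp/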
Many thanks,
Anant
________________________________
From: Strahil Nikolov <[email protected]>
Sent: 27 January 2024 5:25 AM
To: [email protected] <[email protected]>; Anant Saraswat
<[email protected]>
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
Don't forget to test with the georep key. I think it was
/var/lib/glusterd/geo-replication/secret.pem.
Best Regards,
Strahil Nikolov
On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov
<[email protected]> wrote:
Hi Anant,
I would first start by checking whether you can ssh from all masters to the
slave node. If you haven't set up a dedicated user for the session, then
gluster is using root.
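A quick non-interactive check from each master might look like this (a sketch;
BatchMode makes ssh fail instead of prompting for a password; adjust the
hostname as needed):

ssh -o BatchMode=yes root@drtier1data uptime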
Best Regards,
Strahil Nikolov
On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat
<[email protected]> wrote:
Hi All,
I have run the following commands on master3, and that has added master3 to
geo-replication.
gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start
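As far as I understand these commands, gsec_create gathers the geo-replication
ssh keys from all master nodes, and "create push-pem force" pushes them to the
slave and recreates the session metadata; the stop/start then restarts the
session. The session state can be checked with:

# gluster volume geo-replication tier1data drtier1data::drtier1data status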
Now I am able to start geo-replication, but I am getting the same error:
[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection
between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between
master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume
locally...
[2024-01-24 19:51:26.986974] I [resource(worker
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume
[{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker
/opt/tier1data2019/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time
[{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl
Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889},
{entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time
[{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker
/opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount process
exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change [{status=Faulty}]
Any idea why it's stuck in this loop?
Thanks,
Anant
________________________________
From: Gluster-users <[email protected]> on behalf of Anant
Saraswat <[email protected]>
Sent: 22 January 2024 9:00 PM
To: [email protected] <[email protected]>
Subject: [Gluster-users] Geo-replication status is getting Faulty after few
seconds
Hi There,
We have a Gluster setup with three master nodes in replicated mode and one
slave node with geo-replication.
# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick
master1 |
master2 | ------------------------------geo-replication-----------------------------> drtier1data
master3 |
We added the master3 node a few months back; the initial setup consisted of two
master nodes (master1 and master2) and one geo-replicated slave (drtier1data).
Geo-replication was functioning well with the two original masters, where
master1 was Active and master2 was Passive. Today, however, it suddenly stopped
and became stuck in a loop of Initializing... / Active / Faulty on master1,
while master2 remained Passive.
Upon checking the gsyncd.log on the master1 node, we observed the following
error (please refer to the attached logs for more details):
E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] <top>:
Gluster Mount process exited [{error=ENOTCONN}]
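As far as I can tell, ENOTCONN here means the worker's auxiliary gluster mount
went away. Assuming the default log locations, the matching mount log may show
why, e.g. something like (the exact file name under that directory may differ):

# less /var/log/glusterfs/geo-replication/tier1data_drtier1data_drtier1data/mnt-opt-tier1data2019-brick.log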
# gluster volume geo-replication tier1data status

MASTER NODE    MASTER VOL    MASTER BRICK                SLAVE USER    SLAVE                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------
master1        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data    N/A           Faulty     N/A             N/A
master2        tier1data     /opt/tier1data2019/brick    root          ssh://drtier1data::drtier1data    N/A           Passive    N/A             N/A
Suspecting an issue on drtier1data (the slave), I restarted Gluster on the
slave node and also rebooted the drtier1data server, without any luck.
After that, I ran the following command on master1 to get the geo-replication
log file, and got this error:

# gluster volume geo-replication tier1data drtier1data::drtier1data config log-file
Staging failed on master3. Error: Geo-replication session between tier1data and
drtier1data::drtier1data does not exist.
geo-replication command failed
Master3 was the new node added a few months back, but geo-replication kept
working until today, and we never added this node to the geo-replication
session.
After that, I force-stopped geo-replication, thinking that restarting it might
fix the issue. However, geo-replication now fails to start and gives the same
error:
# gluster volume geo-replication tier1data drtier1data::drtier1data start force
Staging failed on master3. Error: Geo-replication session between tier1data and
drtier1data::drtier1data does not exist.
geo-replication command failed
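The staging error makes me suspect that master3 is missing the session files
the other masters have under /var/lib/glusterd/geo-replication/. Assuming the
usual layout, the directory to compare across all three masters would be
something like:

# ls /var/lib/glusterd/geo-replication/tier1data_drtier1data_drtier1data/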
Can anyone please suggest what I should do next to resolve this issue? As there
is 5 TB of data in this volume, I don't want to resync everything to
drtier1data; I want to resume the sync from where it last stopped.
Thanks in advance for any guidance/help.
Kind regards,
Anant
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users