Hey Rob,
We're seeing the same issue on our third volume. Have a look at the logs from
just now (below).
Question: you removed the htime files and the old changelogs. Did you just rm
the files, or is there anything else to pay attention to before removing the
changelog files and the htime file?
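For comparison, this is the rough sequence I would have expected (volume and
slave names below are placeholders, and the brick path is just ours; please
correct me if a step is missing or in the wrong order):

```shell
# Placeholders: MASTERVOL and SLAVEHOST::SLAVEVOL are examples, not our real names.
# 1. Stop the geo-replication session so nothing is reading the changelogs.
gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL stop

# 2. Disable the changelog translator before touching its files.
gluster volume set MASTERVOL changelog.changelog off

# 3. On every brick, remove the htime file and the old changelogs
#    (paths below are our brick; yours will differ).
rm -f /gluster/vg00/dispersed_fuse1024/brick/.glusterfs/changelogs/htime/HTIME.*
rm -f /gluster/vg00/dispersed_fuse1024/brick/.glusterfs/changelogs/CHANGELOG.*

# 4. Re-enable changelogs and restart the session.
gluster volume set MASTERVOL changelog.changelog on
gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL start
```

Is that what you did, or did you skip the stop/disable steps?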
Regards,
Felix
[2020-06-25 07:51:53.795430] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH: SSH
connection between master and slave established. duration=1.2341
[2020-06-25 07:51:53.795639] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER: Mounting
gluster volume locally...
[2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor] Monitor:
worker died in startup phase brick=/gluster/vg01/dispersed_fuse1024/brick
[2020-06-25 07:51:54.535809] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
[2020-06-25 07:51:54.882143] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER: Mounted
gluster volume duration=1.0864
[2020-06-25 07:51:54.882388] I [subcmds(worker
/gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] <top>: Worker
spawn successful. Acknowledging back to monitor
[2020-06-25 07:51:56.911412] E [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117,
in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 40, in register
return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 46, in cl_register
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 30, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2020-06-25 07:51:56.912056] E [repce(worker
/gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient: call
failed call=75086:140098349655872:1593071514.91 method=register
error=ChangelogException
[2020-06-25 07:51:56.912396] E [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1286:service_loop] GLUSTER:
Changelog register failed error=[Errno 2] No such file or directory
[2020-06-25 07:51:56.928031] I [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):96:service_loop] RepceServer:
terminating on reaching EOF.
[2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor] Monitor:
worker died in startup phase brick=/gluster/vg00/dispersed_fuse1024/brick
[2020-06-25 07:51:57.895920] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
[2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):287:set_passive] GeorepStatus:
Worker Status Change status=Passive
[2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
/gluster/vg01/dispersed_fuse1024/brick):287:set_passive] GeorepStatus:
Worker Status Change status=Passive
[2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):281:set_active] GeorepStatus:
Worker Status Change status=Active
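The ENOENT at register time looks like the changelog agent not finding its
htime file on the brick. A quick check I ran (brick path taken from the log
above; the volume name is my guess from the path, and the htime location is
the usual one for the changelog translator, so verify on your build):

```shell
# Brick path from the log above; adjust for your volume.
BRICK=/gluster/vg00/dispersed_fuse1024/brick

# If the htime directory is empty or gone, register() fails with Errno 2.
ls -l "$BRICK/.glusterfs/changelogs/htime/" \
  || echo "no htime file -- changelog register will fail with ENOENT"

# Also confirm the option is actually enabled on the volume:
gluster volume get dispersed_fuse1024 changelog.changelog
```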
On 25/06/2020 09:15, [email protected] wrote:
Hi All,
We’ve got two six node RHEL 7.8 clusters and geo-replication would
appear to be completely broken between them. I’ve deleted the session,
removed & recreated pem files, old changelogs/htime (after removing
relevant options from volume) and completely set up geo-rep from
scratch, but the new session comes up as Initializing, then goes
faulty, and starts looping. Volume (on both sides) is a 4 x 2
disperse, running Gluster v6 (RH latest). Gsyncd reports:
[2020-06-25 07:07:14.701423] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Initializing...
[2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor] Monitor:
starting gsyncd worker brick=/rhgs/brick20/brick
slave_node=bxts470194.eu.rabonet.com
[2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor] Monitor:
Worker would mount volume privately
[2020-06-25 07:07:14.757181] I [gsyncd(agent
/rhgs/brick20/brick):318:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
[2020-06-25 07:07:14.758126] D [subcmds(agent
/rhgs/brick20/brick):107:subcmd_agent] <top>: RPC FD
rpc_fd='5,12,11,10'
[2020-06-25 07:07:14.758627] I [changelogagent(agent
/rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent listining...
[2020-06-25 07:07:14.764234] I [gsyncd(worker
/rhgs/brick20/brick):318:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
[2020-06-25 07:07:14.779409] I [resource(worker
/rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH
connection between master and slave...
[2020-06-25 07:07:14.841793] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068834.84 __repce_version__() ...
[2020-06-25 07:07:16.148725] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068834.84 __repce_version__ -> 1.0
[2020-06-25 07:07:16.148911] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068836.15 version() ...
[2020-06-25 07:07:16.149574] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068836.15 version -> 1.0
[2020-06-25 07:07:16.149735] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068836.15 pid() ...
[2020-06-25 07:07:16.150588] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068836.15 pid -> 30703
[2020-06-25 07:07:16.150747] I [resource(worker
/rhgs/brick20/brick):1435:connect_remote] SSH: SSH connection between
master and slave established. duration=1.3712
[2020-06-25 07:07:16.150819] I [resource(worker
/rhgs/brick20/brick):1105:connect] GLUSTER: Mounting gluster volume
locally...
[2020-06-25 07:07:16.265860] D [resource(worker
/rhgs/brick20/brick):879:inhibit] DirectMounter: auxiliary glusterfs
mount in place
[2020-06-25 07:07:17.272511] D [resource(worker
/rhgs/brick20/brick):953:inhibit] DirectMounter: auxiliary glusterfs
mount prepared
[2020-06-25 07:07:17.272708] I [resource(worker
/rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster
volume duration=1.1218
[2020-06-25 07:07:17.272794] I [subcmds(worker
/rhgs/brick20/brick):84:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2020-06-25 07:07:17.272973] D [master(worker
/rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change
detection mode mode=xsync
[2020-06-25 07:07:17.273063] D [monitor(monitor):273:monitor] Monitor:
worker(/rhgs/brick20/brick) connected
[2020-06-25 07:07:17.273678] D [master(worker
/rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change
detection mode mode=changelog
[2020-06-25 07:07:17.274224] D [master(worker
/rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change
detection mode mode=changeloghistory
[2020-06-25 07:07:17.276484] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068837.28 version() ...
[2020-06-25 07:07:17.276916] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068837.28 version -> 1.0
[2020-06-25 07:07:17.277009] D [master(worker
/rhgs/brick20/brick):777:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick
[2020-06-25 07:07:17.277098] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068837.28 init() ...
[2020-06-25 07:07:17.292944] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068837.28 init -> None
[2020-06-25 07:07:17.293097] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068837.29 register('/rhgs/brick20/brick',
'/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick',
'/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log',
8, 5) ...
[2020-06-25 07:07:19.296294] E [repce(agent
/rhgs/brick20/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117,
in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 40, in register
return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level,
retries)
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 46, in cl_register
cls.raise_changelog_err()
File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 30, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2020-06-25 07:07:19.297161] E [repce(worker
/rhgs/brick20/brick):213:__call__] RepceClient: call failed
call=6799:140380783982400:1593068837.29 method=register
error=ChangelogException
[2020-06-25 07:07:19.297338] E [resource(worker
/rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog register
failed error=[Errno 2] No such file or directory
[2020-06-25 07:07:19.315074] I [repce(agent
/rhgs/brick20/brick):96:service_loop] RepceServer: terminating on
reaching EOF.
[2020-06-25 07:07:20.275701] I [monitor(monitor):280:monitor] Monitor:
worker died in startup phase brick=/rhgs/brick20/brick
[2020-06-25 07:07:20.277383] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
We’ve done everything we can think of, including an “strace -f” on the
pid, and we can’t really find anything. I’m about to lose the last of
my hair over this, so does anyone have any ideas at all? We’ve even
removed the entire slave vol and rebuilt it.
Thanks
Rob
*Rob Quagliozzi*
*Specialised Application Support*
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users