Thanks Deepu. I will look into this. Could you summarize the steps needed to reproduce the issue?
/sunny

On Fri, Nov 29, 2019 at 7:29 AM deepu srinivasan <[email protected]> wrote:
>
> Hi Sunny
> The issue seems to be a bug.
> It was resolved when I restarted the glusterd daemon on the slave
> machines. The logs on the slave end reported that the mount-broker
> folder was not in the vol file, so restarting glusterd fixed it.
> This might be some race condition.
>
> On Thu, Nov 28, 2019 at 9:00 PM deepu srinivasan <[email protected]> wrote:
>>
>> Hi Sunny
>> I also got this error on the slave end:
>>>
>>> [2019-11-28 15:30:12.520461] I [resource(slave 192.168.185.89/home/sas/gluster/data/code-misc):1105:connect] GLUSTER: Mounting gluster volume locally...
>>>
>>> [2019-11-28 15:30:12.649425] E [resource(slave 192.168.185.89/home/sas/gluster/data/code-misc):1013:handle_mounter] MountbrokerMounter: glusterd answered mnt=
>>>
>>> [2019-11-28 15:30:12.650573] E [syncdutils(slave 192.168.185.89/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.118_code-misc/mnt-192.168.185.89-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>>>
>>> [2019-11-28 15:30:12.650742] E [syncdutils(slave 192.168.185.89/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
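The "glusterd answered mnt=" line above means glusterd returned an empty mount path for the mountbroker request, which fits a mountbroker entry missing from the volfile. A quick way to inspect the mountbroker wiring on each slave node (a sketch based on the upstream non-root geo-rep setup, using the user/volume names from this thread; file paths are distro defaults):

    # Show the mountbroker root and the user/volume mappings glusterd knows:
    gluster-mountbroker status
    # The same information lives in glusterd's own volfile:
    grep -i mountbroker /etc/glusterfs/glusterd.vol
    # Expected (for user "sas" and volume "code-misc"):
    #   option mountbroker-root /var/mountbroker-root
    #   option mountbroker-geo-replication.sas code-misc
    # If the mapping is missing, re-add it and restart glusterd:
    gluster-mountbroker add code-misc sas
    systemctl restart glusterd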
>>
>> On Thu, Nov 28, 2019 at 6:45 PM deepu srinivasan <[email protected]> wrote:
>>>
>>> [email protected]/var/log/glusterfs# ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 [email protected] "sudo gluster volume status"
>>>
>>> **************************************************************************************************************************
>>>
>>> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>
>>> By continuing past this point, you expressly consent to this monitoring
>>>
>>> **************************************************************************************************************************
>>>
>>> invoking sudo in restricted SSH session is not allowed
>>>
>>> On Thu, Nov 28, 2019 at 6:04 PM Sunny Kumar <[email protected]> wrote:
>>>>
>>>> Hi Deepu,
>>>>
>>>> Can you try this:
>>>>
>>>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 [email protected] "sudo gluster volume status"
>>>>
>>>> /sunny
>>>>
>>>> On Thu, Nov 28, 2019 at 12:14 PM deepu srinivasan <[email protected]> wrote:
>>>> >>
>>>> >> MASTER NODE        MASTER VOL    MASTER BRICK                        SLAVE USER    SLAVE                                  SLAVE NODE         STATUS     CRAWL STATUS    LAST_SYNCED
>>>> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> >> 192.168.185.89     code-misc     /home/sas/gluster/data/code-misc    sas           [email protected]::code-misc     N/A                Faulty     N/A             N/A
>>>> >> 192.168.185.101    code-misc     /home/sas/gluster/data/code-misc    sas           [email protected]::code-misc     192.168.185.118    Passive    N/A             N/A
>>>> >> 192.168.185.93     code-misc     /home/sas/gluster/data/code-misc    sas           [email protected]::code-misc     N/A                Faulty     N/A             N/A
>>>> >
>>>> > On Thu, Nov 28, 2019 at 5:43 PM deepu srinivasan <[email protected]> wrote:
>>>> >>
>>>> >> I think it is configured properly. Should I check something else?
>>>> >>
>>>> >> [email protected]/var/log/glusterfs# ssh [email protected] "sudo gluster volume info"
>>>> >>
>>>> >> **************************************************************************************************************************
>>>> >>
>>>> >> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>
>>>> >> By continuing past this point, you expressly consent to this monitoring.-
>>>> >>
>>>> >> **************************************************************************************************************************
>>>> >>
>>>> >> Volume Name: code-misc
>>>> >> Type: Replicate
>>>> >> Volume ID: e9b6fbed-fcd0-42a9-ab11-02ec39c2ee07
>>>> >> Status: Started
>>>> >> Snapshot Count: 0
>>>> >> Number of Bricks: 1 x 3 = 3
>>>> >> Transport-type: tcp
>>>> >> Bricks:
>>>> >> Brick1: 192.168.185.118:/home/sas/gluster/data/code-misc
>>>> >> Brick2: 192.168.185.45:/home/sas/gluster/data/code-misc
>>>> >> Brick3: 192.168.185.84:/home/sas/gluster/data/code-misc
>>>> >> Options Reconfigured:
>>>> >> features.read-only: enable
>>>> >> transport.address-family: inet
>>>> >> nfs.disable: on
>>>> >> performance.client-io-threads: off
>>>> >>
>>>> >> On Thu, Nov 28, 2019 at 5:40 PM Sunny Kumar <[email protected]> wrote:
>>>> >>>
>>>> >>> Hi Deepu,
>>>> >>>
>>>> >>> This looks like an error generated by ssh restrictions.
>>>> >>> Can you please check and confirm that ssh is properly configured?
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to this monitoring.- ZOHO Corporation
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>>
>>>> >>> /sunny
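Since gsyncd's ssh transport must carry nothing but the remote command's own output, any banner text or a restricted shell rejecting the command will break the connection exactly as in the Popen errors above. A minimal check, reusing the geo-rep key and the slave host from the logs (a sketch, not an official diagnostic):

    ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
        -i /var/lib/glusterd/geo-replication/secret.pem -p 22 \
        [email protected] "echo ok"
    # Expect exactly "ok" and exit status 0; any extra lines here (the
    # WARNING banner, "invoking sudo ... is not allowed") will also be
    # injected into the gsyncd stream and kill the worker.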
>>>> >>>
>>>> >>> On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan <[email protected]> wrote:
>>>> >>> >
>>>> >>> > ---------- Forwarded message ---------
>>>> >>> > From: deepu srinivasan <[email protected]>
>>>> >>> > Date: Thu, Nov 28, 2019 at 5:32 PM
>>>> >>> > Subject: Geo-Replication Issue while upgrading
>>>> >>> > To: gluster-users <[email protected]>
>>>> >>> >
>>>> >>> > Hi Users/Developers,
>>>> >>> > I hope you remember the last issue we faced, where geo-replication
>>>> >>> > went to the Faulty state while stopping and starting the session:
>>>> >>> >>
>>>> >>> >> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change status=Active
>>>> >>> >> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
>>>> >>> >> [2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl turns=1 stime=(1573924576, 0) entry_stime=(1573924576, 0) etime=1573925383
>>>> >>> >> [2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time stime=(1573924576, 0)
>>>> >>> >> [2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore, take out entry retry_count=1 entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>>>> >>> >> [2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch retry_count=1
>>>> >>> >> [2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
>>>> >>> >> [2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed call=151945:140353273153344:1573925384.78 method=entry_ops error=OSError
>>>> >>> >> [2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
>>>> >>> >> Traceback (most recent call last):
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
>>>> >>> >>     func(args)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
>>>> >>> >>     local.service_loop(remote)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
>>>> >>> >>     g3.crawlwrap(oneshot=True)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
>>>> >>> >>     self.crawl()
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
>>>> >>> >>     self.changelogs_batch_process(changes)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
>>>> >>> >>     self.process(batch)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
>>>> >>> >>     self.process_change(change, done, retry)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
>>>> >>> >>     failures = self.slave.server.entry_ops(entries)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
>>>> >>> >>     return self.ins(self.meth, *a)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
>>>> >>> >>     raise res
>>>> >>> >> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>>>> >>> >> [2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>> >> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc6
>>>> >>> >> [2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
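The OSError above is the slave-side entry_ops call failing with EACCES on a .glusterfs gfid entry, i.e. the sync user cannot access that path on the brick. A minimal check on the node that owns that brick path (a sketch; assumes the sync user is sas, as elsewhere in this thread):

    # Can the unprivileged geo-rep user reach the gfid entry?
    sudo -u sas stat /home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb
    # Check ownership and permissions along the path:
    ls -ld /home/sas/gluster/data/code-misc6/.glusterfs/6a/90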
>>>> >>> >
>>>> >>> > Now, after upgrading from version 5.6 to 7.0, we got an error in
>>>> >>> > geo-replication.
>>>> >>> >
>>>> >>> > Scenario:
>>>> >>> >
>>>> >>> > We had a 1x3 replicate volume in each DC. Both volumes were started,
>>>> >>> > a geo-replication session was set up between them, and the files
>>>> >>> > were synced. The geo-replication session was then deleted.
>>>> >>> > We started upgrading each server to 7.0, beginning from the slave
>>>> >>> > end, following this link -->
>>>> >>> > https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
>>>> >>> > After glusterd came back up, I created the geo-replication session
>>>> >>> > again, but it ends up in a Faulty state. Please find the logs:
>>>> >>> >
>>>> >>> >> [2019-11-28 11:59:12.370255] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
>>>> >>> >> [2019-11-28 11:59:12.370615] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/home/sas/gluster/data/code-misc slave_node=192.168.185.84
>>>> >>> >> [2019-11-28 11:59:12.445581] I [gsyncd(agent /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>>> >>> >> [2019-11-28 11:59:12.448383] I [changelogagent(agent /home/sas/gluster/data/code-misc):72:__init__] ChangelogAgent: Agent listining...
>>>> >>> >> [2019-11-28 11:59:12.453881] I [gsyncd(worker /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>>> >>> >> [2019-11-28 11:59:12.472862] I [resource(worker /home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>>>> >>> >> [2019-11-28 11:59:12.933346] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>>>> >>> >> [2019-11-28 11:59:12.934117] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-tKcFQe/5697733f424862ab9d57e019de78aca6.sock [email protected] /usr/libexec/glusterfs/gsyncd slave code-misc [email protected]::code-misc --master-node 192.168.185.89 --master-node-id a7a9688e-700c-4452-9cd6-e10d6eed5335 --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.84 --local-node-id cbafeca3-650b-4c9e-8ea6-2451ea9265dd --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 3 error=1
>>>> >>> >> [2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>> >> [2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>> >> [2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to this monitoring.- ZOHO Corporation
>>>> >>> >> [2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>> >> [2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>> >> [2019-11-28 11:59:12.944722] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>>> >>> >> [2019-11-28 11:59:12.947575] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
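For reference, recreating a non-root geo-rep session after such an upgrade roughly follows the steps below (a sketch assembled from the upstream geo-rep docs, using the volume, user, and host names from the logs above; adjust to your setup):

    # On a master node: regenerate and collect the geo-rep ssh keys:
    gluster system:: execute gsec_create
    gluster volume geo-replication code-misc [email protected]::code-misc create push-pem
    # On the slave node named in the create: install the pem keys for the
    # mountbroker user:
    /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh sas code-misc code-misc
    # Back on the master: start the session and watch its status:
    gluster volume geo-replication code-misc [email protected]::code-misc start
    gluster volume geo-replication code-misc [email protected]::code-misc status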
