[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Khodayar Doustar
Does your ping work or not? On Sun, May 24, 2020 at 6:53 AM Amudhan P wrote: > Yes, I have applied the setting on the switch side also. > > On Sat 23 May, 2020, 6:47 PM Khodayar Doustar, > wrote: > >> The problem should be with the network. When you change the MTU it should be changed >> all over the network; any

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Amudhan P
Yes, I have applied the setting on the switch side also. On Sat 23 May, 2020, 6:47 PM Khodayar Doustar, wrote: > The problem should be with the network. When you change the MTU it should be changed > all over the network; any single hop on your network should speak and > accept 9000 MTU packets. You can check it on

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Suresh Rama
Hi, It should be ping -M do -s 8972 IP_ADDRESS. You can't ping with a 9000-byte payload. If you can't ping with an 8972-byte payload, then the MTU config is wrong somewhere in the path. Regards, Suresh On Sat, May 23, 2020, 1:35 PM apely agamakou wrote: > Hi, > > Please check your MTU limit at the switch level, and
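For reference, a minimal version of that check (the target address is a placeholder; 8972 bytes = 9000 minus the 20-byte IP header and the 8-byte ICMP header):

    # Send four unfragmentable pings with an 8972-byte payload; any hop
    # with an MTU below 9000 will make this fail.
    ping -M do -s 8972 -c 4 10.0.0.2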

[ceph-users] Re: PGS INCONSISTENT - read_error - replace disk or pg repair then replace disk

2020-05-23 Thread Anthony D'Atri
Historically I have often, but not always, found that removing/destroying the affected OSD would clear the inconsistent PG. At one point the logged message was clear about who reported and who was the perp, but a later release broke that. I'm not sure what recent releases say, since with Lumi
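For anyone wanting to see which OSD holds the bad shard before choosing between repair and rebuild, a sketch using a placeholder PG ID:

    # List the inconsistent objects in PG 2.1f and which shard reported
    # the read_error, as JSON
    rados list-inconsistent-obj 2.1f --format=json-pretty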

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread apely agamakou
Hi, Please check your MTU limit at the switch level, and check other resources with ICMP ping. Try adding 14 bytes for the Ethernet header at your switch level, meaning an MTU of 9014. Are you using Juniper? Example: ping -D -s 9 other_ip On Sat, May 23, 2020 at 15:18, Khodayar Doustar wrote
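The arithmetic being referenced, for anyone sizing the switch side (the last line applies only on 802.1Q-tagged ports):

    host IP MTU:                     9000 bytes
    + Ethernet header (14 bytes):    9014-byte frame on the wire
    + 802.1Q VLAN tag (4 bytes):     9018-byte frame, tagged ports only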

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
Hi Ashley, The command to reset the flag for ALL OSDs is: ceph config set osd bluefs_preextend_wal_files false. And for just an individual OSD: ceph config set osd.5 bluefs_preextend_wal_files false. And to remove it from an individual one (so you just have the global one left):
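The preview cuts off before the removal command; on Octopus the matching call would presumably be:

    # Drop the per-OSD override so osd.5 falls back to the global value
    ceph config rm osd.5 bluefs_preextend_wal_files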

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Ashley Merrick
Hello, Great news. Can you confirm the exact command you used to inject the value, so I can replicate your exact steps? I will do that and then leave it a good couple of days before trying a reboot, to make sure the WAL is completely flushed. Thanks, Ashley On Sat, 23 May 2020 23:20:45 +0800 chri

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
Status update: We seem to have success. I followed the steps below. Only one more OSD (on node3) failed to restart, showing the same WAL corruption messages. After replacing that and backfilling, I could then restart it. So we have a healthy cluster with restartable OSDs again, with bluefs_preexte

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Khodayar Doustar
The problem should be with the network. When you change the MTU it should be changed all over the network; any single hop on your network should speak and accept 9000 MTU packets. You can check it on your hosts with the "ifconfig" command, and there are also equivalent commands for other network/security devices. I
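A minimal sketch of that host-side check (eth0 is a placeholder interface name):

    # Show the interface MTU with modern tooling...
    ip link show eth0 | grep -o 'mtu [0-9]*'
    # ...or with the ifconfig command mentioned above
    ifconfig eth0 | grep -i mtu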

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread si...@turka.nl
Can the servers/nodes ping each other using large packet sizes? I guess not. Sinan Polat > On 23 May 2020 at 14:21, Amudhan P wrote the following: > > In OSD logs: "heartbeat_check: no reply from OSD" > >> On Sat, May 23, 2020 at 5:44 PM Amudhan P wrote: >> >> Hi, >> >> I have set

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Amudhan P
In the OSD logs: "heartbeat_check: no reply from OSD". On Sat, May 23, 2020 at 5:44 PM Amudhan P wrote: > Hi, > > I have set the network switch to an MTU of 9000 and also in my netplan > configuration. > > What else needs to be checked? > > > On Sat, May 23, 2020 at 3:39 PM Wido den Hollander wrote: > >

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Amudhan P
Hi, I have set the network switch to an MTU of 9000 and also in my netplan configuration. What else needs to be checked? On Sat, May 23, 2020 at 3:39 PM Wido den Hollander wrote: > > > On 5/23/20 12:02 PM, Amudhan P wrote: > > Hi, > > > > I am using Ceph Nautilus on Ubuntu 18.04, working fine with
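A netplan fragment along the lines of what is being described, assuming a placeholder interface name (eno1):

    network:
      version: 2
      ethernets:
        eno1:
          mtu: 9000    # host-side jumbo frames
    # apply with: sudo netplan apply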

[ceph-users] Re: PGS INCONSISTENT - read_error - replace disk or pg repair then replace disk

2020-05-23 Thread Massimo Sgaravatto
When I see this problem, usually: - I run pg repair - I remove the OSD from the cluster - I replace the disk - I recreate the OSD on the new disk. Cheers, Massimo On Wed, May 20, 2020 at 9:41 PM Peter Lewis wrote: > Hello, > > I came across a section of the documentation that I don't quite > un
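A command-level sketch of that sequence, with placeholder IDs and device path (pg 2.1f, osd.7, /dev/sdX):

    ceph pg repair 2.1f                           # repair the inconsistent PG
    ceph osd out osd.7                            # start draining the failing OSD
    ceph osd purge osd.7 --yes-i-really-mean-it   # remove it once data has moved
    ceph-volume lvm create --data /dev/sdX        # recreate the OSD on the new disk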

[ceph-users] Re: Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Wido den Hollander
On 5/23/20 12:02 PM, Amudhan P wrote: > Hi, > > I am using Ceph Nautilus on Ubuntu 18.04, working fine with an MTU of 1500 > (the default); recently I tried to update the MTU to 9000. > After enabling jumbo frames, running ceph -s times out. Ceph can run just fine with an MTU of 9000. But there is p

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
Hi Ashley, Setting bluefs_preextend_wal_files to false should stop any further corruption of the WAL (subject to the small risk of doing this while the OSD is active). Over time WAL blocks will be recycled and overwritten with new good blocks, so the extent of the corruption may decrease or ev
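One way to confirm what a given daemon will actually use after the change (osd.5 as a placeholder):

    # Report the effective value for a single OSD
    ceph config get osd.5 bluefs_preextend_wal_files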

[ceph-users] Ceph Nautilus not working after setting MTU 9000

2020-05-23 Thread Amudhan P
Hi, I am using Ceph Nautilus on Ubuntu 18.04, working fine with an MTU of 1500 (the default); recently I tried to update the MTU to 9000. After enabling jumbo frames, running ceph -s times out. Regards, Amudhan P

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Ashley Merrick
Hello Chris, Great to hear. A few questions: once you have injected bluefs_preextend_wal_files as false, are you just rebuilding the OSDs that failed? Or are you going through and rebuilding every OSD, even the working ones? Or does setting the bluefs_preextend_wal_files value to fals

[ceph-users] Re: question on ceph node count

2020-05-23 Thread tim taler
Thanks Anthony. On a 5-node cluster, replica 5, 5 MONs, failure domain is host: > > any two nodes might fail without leaving the cluster > > unresponsive? > > No. Assuming again that your failure domain is “host”, some PGs will have > two copies on these two nodes, so they will be undersized until
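For reference, the min_size arithmetic that usually governs this kind of question, assuming the pool default (min_size = size - size/2, integer division):

    size = 5, hosts = 5, failure domain = host
    min_size = 5 - 5/2 = 3
    2 hosts down -> 3 replicas left -> PGs active+undersized, I/O continues
    3 hosts down -> 2 replicas left -> below min_size, I/O pauses
    monitors: 3 of 5 up -> quorum holds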

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
Hi Ashley, Igor has done a great job of tracking down the problem, and we have finally shown evidence of the type of corruption it would produce in one of my WALs. Our feeling at the moment is that the problem can be worked around by setting bluefs_preextend_wal_files to false on affected OSDs whil