[ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-03 Thread shadow_lin
Hi list, During my testing of ceph, I find that sometimes the whole ceph cluster is blocked and the reason was one non-functional osd. Ceph can heal itself if some osd is down, but it seems that if some osd is half dead (it has a heartbeat but can't handle requests) then all the requests which are directed to that os
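
A minimal sketch of the usual workaround for a half-dead OSD, assuming the suspect daemon is osd.12 (a hypothetical id): force it down so clients stop queuing requests on it, and mark it out if it keeps flapping back in.

    # osd.12 is a hypothetical id standing in for the half-dead daemon
    ceph osd down 12        # clients stop sending requests to it
    ceph osd out 12         # PGs remap and recover onto other OSDs
    # on the OSD's host, stop the daemon so it cannot rejoin while broken
    systemctl stop ceph-osd@12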

Re: [ceph-users] Slow requests troubleshooting in Luminous - details missing

2018-03-03 Thread Alex Gorbachev
On Fri, Mar 2, 2018 at 9:56 AM, Alex Gorbachev wrote: > > On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote: >> >> On 2018-03-02 07:54, Alex Gorbachev wrote: >> >> On Thu, Mar 1, 2018 at 10:57 PM, David Turner >> wrote: >> >> Blocked requests and slow requests are synonyms in ceph. They are 2 n
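
For anyone digging into the same problem, a short sketch of the inspection commands the thread is about, assuming the affected daemon is osd.12 (hypothetical id); the daemon commands must be run on the host that carries that OSD.

    ceph health detail                         # which OSDs report slow/blocked requests
    ceph daemon osd.12 dump_ops_in_flight      # ops currently stuck in the OSD
    ceph daemon osd.12 dump_historic_ops       # recently completed slow ops with timings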

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Yan, Zheng
On Sat, Mar 3, 2018 at 6:17 PM, Jan Pekař - Imatic wrote: > On 3.3.2018 11:12, Yan, Zheng wrote: >> >> On Tue, Feb 27, 2018 at 2:29 PM, Jan Pekař - Imatic >> wrote: >>> >>> I think I hit the same issue. >>> I have corrupted data on cephfs and I don't remember the same issue >>> before >>> Luminou

[ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-03 Thread Jakub Jaszewski
Hi Ceph Admins, Last night our ceph cluster got all pools 100% full. This happened after osd.56 (95% used) reached the OSD_FULL state. ceph versions 12.2.2 Logs 2018-03-03 17:15:22.560710 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224452 : cluster [ERR] overall HEALTH_ERR noscrub,nodeep-scrub flag(s)
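
A rough sketch of the stopgap usually applied in this situation; the ratio values are illustrative only, and raising them is no substitute for rebalancing or adding capacity.

    ceph osd dump | grep -i ratio         # current full/backfillfull/nearfull ratios
    ceph osd df tree                      # locate the overfull OSD (osd.56 here)
    ceph osd set-full-ratio 0.97          # temporarily raise the full ratio (Luminous mon command)
    ceph osd reweight 56 0.9              # push data off the overfull OSD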

[ceph-users] BlueStore questions

2018-03-03 Thread Frank Ritchie
Hi all, I have a few questions on using BlueStore. With FileStore it is not uncommon to see one NVMe device being used as the journal device for up to 12 OSDs. Can an adequately sized NVMe device also be used as the WAL/DB device for up to 12 OSDs? Are there any rules of thumb for sizing the WAL/DB?
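
Not an authoritative answer, but for reference, a sketch of how a shared NVMe DB partition is typically attached when creating a BlueStore OSD; the device paths are hypothetical, and the WAL lives inside the DB partition unless a separate --block.wal is given.

    # /dev/sdc = data HDD, /dev/nvme0n1p3 = partition reserved for this OSD's RocksDB
    ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p3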

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-03 Thread Stefan Priebe - Profihost AG
Hi, Am 02.03.2018 um 21:21 schrieb Stefan Priebe - Profihost AG: > Hi, > > Am 02.03.2018 um 14:29 schrieb Dan van der Ster: >> On Fri, Mar 2, 2018 at 10:12 AM, Stefan Priebe - Profihost AG >> wrote: >>> Thanks! Your patch works great! >> >> Cool! I plan to add one more feature to allow operators
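
For context, a sketch of the basic mgr balancer workflow being discussed; "myplan" is an arbitrary plan name.

    ceph mgr module enable balancer
    ceph balancer mode upmap          # requires luminous-or-newer clients
    ceph balancer eval                # score the current distribution (lower is better)
    ceph balancer optimize myplan
    ceph balancer show myplan         # review the proposed remappings
    ceph balancer execute myplan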

Re: [ceph-users] how is iops from ceph -s client io section caculated?

2018-03-03 Thread shadow_lin
If it is because of replication, then the iops in ceph status should always be relatively stable and equal to the fio iops multiplied by the replication size of the pool. From what I have seen, the iops in ceph status keep increasing over time until they become relatively stable. 2018-03-04 lin.yunfan From: David
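
If the extra iops really do come from replication, the expected relationship is a simple multiplication; a rough check with hypothetical numbers:

    # illustrative values only
    fio_iops=2500      # write iops reported by fio on the client
    pool_size=3        # replication factor of the pool
    echo $(( fio_iops * pool_size ))   # ~7500 backend write iops expected in ceph status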

Re: [ceph-users] how is iops from ceph -s client io section caculated?

2018-03-03 Thread David Turner
I would guess that the higher iops in ceph status are from iops calculated from replication. fio isn't aware of the backend replication iops, only what it's doing to the rbd On Fri, Mar 2, 2018, 11:53 PM shadow_lin wrote: > Hi list, > There is a client io section from the result of ceph -s. I fo
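
For reproducing this kind of comparison, a sketch of driving the rbd image directly with fio's rbd engine; the pool and image names are hypothetical and fio must be built with rbd support.

    fio --name=rbdtest --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=testimg \
        --rw=randwrite --bs=4k --iodepth=32 --direct=1 \
        --runtime=60 --time_based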

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-03-03 Thread Willem Jan Withagen
On 23/02/2018 14:27, Caspar Smit wrote: Hi All, What would be the proper way to preventively replace a DB/WAL SSD (when it is nearing its DWPD/TBW limit and has not failed yet)? It hosts DB partitions for 5 OSDs. Maybe something like: 1) ceph osd reweight the 5 OSDs to 0 2) let backfilling compl
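
A sketch of what steps 1) and 2) above could look like in practice; OSD ids 10-14 are hypothetical stand-ins for the five OSDs sharing the SSD.

    for id in 10 11 12 13 14; do
        ceph osd reweight "$id" 0     # drain the OSDs that use the worn SSD
    done
    watch ceph -s                     # proceed only once all PGs are active+clean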

[ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-03 Thread Jan Pekař - Imatic
Hi all, I have a few problems on my cluster that are maybe linked together and have now caused an OSD to go down during pg repair. First, a few notes about my cluster: 4 nodes, 15 OSDs, installed on Luminous (no upgrade). Replicated pools, with 1 pool (pool 6) cached by ssd disks. I don't detect any hardware fai
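
For readers following along, a sketch of the usual inspection sequence around a pg repair like the one that triggered the crash; the PG id 6.2f is hypothetical.

    ceph health detail                                      # lists the inconsistent PGs
    rados list-inconsistent-obj 6.2f --format=json-pretty   # what the scrub found
    ceph pg repair 6.2f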

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Jan Pekař - Imatic
On 3.3.2018 11:12, Yan, Zheng wrote: On Tue, Feb 27, 2018 at 2:29 PM, Jan Pekař - Imatic wrote: I think I hit the same issue. I have corrupted data on cephfs and I don't remember the same issue before Luminous (I did the same tests before). It is on my test 1-node cluster with lower memory the

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Yan, Zheng
On Tue, Feb 27, 2018 at 2:29 PM, Jan Pekař - Imatic wrote: > I think I hit the same issue. > I have corrupted data on cephfs and I don't remember the same issue before > Luminous (I did the same tests before). > > It is on my test 1-node cluster with lower memory than recommended (so > server is s

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Jan Pekař - Imatic
Hi all, thank you for the replies. I will answer your questions, try to reproduce it and, if I succeed, start a new thread. It can take a while, I'm quite busy. My cluster was upgraded from Hammer or Jewel. The Luminous cluster was healthy when I started my test. It could happen that load temporarily cau