[ceph-users] Bluestore wal / block db size

2017-07-28 Thread Tobias Rehn
Hey, I am just playing around with the luminous RC. As far as I can see it works nicely. Reading around, I found the following discussion about wal and block db size: http://marc.info/?l=ceph-devel&m=149978799900866&w=2 Creating an osd with the following command: ceph-deploy osd create --bluestore -
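For readers following along, a minimal sketch of where those sizes are usually pinned down, assuming they are set in ceph.conf on the deploy node before the OSD is created (the byte values below are placeholders rather than recommendations, and HOST:DEVICE stands in for the real target of the ceph-deploy call):
[osd]
bluestore_block_db_size = 10737418240
bluestore_block_wal_size = 1073741824
#ceph-deploy osd create --bluestore HOST:DEVICE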

[ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
Hi, I'm trying to find the reason for strange recovery issues I'm seeing on our cluster.. it's a mostly idle, 4-node cluster with 26 OSDs evenly distributed across the nodes, jewel 10.2.9. The problem is that after some disk replacements and data moves, recovery is progressing extremely slowly.. pgs seem to be
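Some read-only commands that are commonly used to narrow down where recovery is stuck before touching anything (none of these change cluster state):
#ceph -s
#ceph health detail
#ceph pg dump_stuck unclean
#ceph osd df tree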

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
I forgot to add that the OSD daemons really seem to be idle, no disk activity, no CPU usage.. it just looks to me like some kind of deadlock, as if they were waiting for each other.. and so I've been trying to get the last 1.5% of misplaced / degraded PGs for almost a week.. On Fri, Jul 28, 2017 at 10:56:02AM +

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread linghucongsong
It looks like the OSDs in your cluster are not all the same size. Can you show the ceph osd df output? At 2017-07-28 17:24:29, "Nikola Ciprich" wrote: >I forgot to add that OSD daemons really seem to be idle, no disk >activity, no CPU usage.. it just looks to me like some kind of >deadlock, as they

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote: > > > It look like the osd in your cluster is not all the same size. > > can you show ceph osd df output? you're right, they're not.. here's the output: [root@v1b ~]# ceph osd df tree ID WEIGHT REWEIGHT SIZE USE AVAIL %U
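The paste is cut off above; for reference, ceph osd df tree prints one row per OSD with roughly the following columns, and the sample row here is made up for illustration rather than taken from Nikola's cluster:
ID WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR  PGS TYPE NAME
 0 0.90999  1.00000 931G 420G 511G  45.12 1.05 180     osd.0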

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread linghucongsong
Do you have two crush rules, one for ssd and the other for hdd? Can you show the output of ceph osd dump|grep pool and ceph osd crush dump? At 2017-07-28 17:47:48, "Nikola Ciprich" wrote: > >On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote: >> >> >> It look like the osd in your cluster is not all th
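For reference, the equivalent read-only checks (rule and pool names will be whatever your cluster reports):
#ceph osd dump | grep pool
#ceph osd crush rule ls
#ceph osd crush rule dump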

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
On Fri, Jul 28, 2017 at 05:52:29PM +0800, linghucongsong wrote: > > > > You have two crush rule? One is ssd the other is hdd? yes, exactly.. > > Can you show ceph osd dump|grep pool > pool 3 'vm' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread linghucongsong
1. You have a size 3 pool, so I do not know why you set min_size 1. It is too dangerous. 2. You had better use the same size and the same number of OSDs on each host for crush. For now you can try the ceph osd reweight-by-utilization command, when there is no user activity in your cluster. And I will go home. At 2017-07-28 17
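Spelled out as commands, and assuming the pool from the earlier dump is 'vm', the two suggestions look roughly like this; the test- variant only reports what it would change, so it is a safe dry run first:
#ceph osd pool set vm min_size 2
#ceph osd test-reweight-by-utilization
#ceph osd reweight-by-utilization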

[ceph-users] Unable to remove osd from crush map. - leads remapped pg's v11.2.0

2017-07-28 Thread nokia ceph
Hello, Recently we got an underlying issue with osd.10 which mapped to /dev/sde . So we tried to remove it from the crush map === #systemctl stop ceph-osd@10.service #for x in {10..10}; do ceph osd out $x;ceph osd crush remove osd.$x;ceph auth del osd.$x;ceph osd rm osd.$x ;done marked out osd.10.
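After a removal sequence like that, a couple of read-only checks confirm the OSD is really gone from the map and show how much data is still remapped (exact output wording varies by release):
#ceph osd tree
#ceph -s
#ceph health detail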

[ceph-users] ceph osd safe to remove

2017-07-28 Thread Dan van der Ster
Hi all, We are trying to outsource the disk replacement process for our ceph clusters to some non-expert sysadmins. We could really use a tool that reports if a Ceph OSD *would* or *would not* be safe to stop, e.g. # ceph-osd-safe-to-stop osd.X Yes it would be OK to stop osd.X (which of course m
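Later Ceph releases (newer than the versions discussed in this thread) added a built-in check along exactly these lines; on a version that has it, the query is simply:
#ceph osd ok-to-stop osd.X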

Re: [ceph-users] Networking/naming doubt

2017-07-28 Thread Oscar Segarra
Hi David, Thanks a lot for your comments! I just want to use a different network than the public one (where DNS resolves the name) for ceph-deploy and client connections. For example with 3 NICs: Nic1: Public (internet access) Nic2: Ceph-mon (clients and ceph-deploy) Nic3: Ceph-osd Thanks a
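For that kind of split, the usual knobs are the public and cluster network settings in ceph.conf; in Ceph's terms the "public network" is what clients and MONs use (Nic2 here) and the "cluster network" carries OSD replication and heartbeats (Nic3). A minimal sketch, with made-up subnets standing in for those NICs:
[global]
public network = 192.168.2.0/24
cluster network = 192.168.3.0/24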

[ceph-users] CRC mismatch detection on read (XFS OSD)

2017-07-28 Thread Дмитрий Глушенок
Hi! Just found a strange thing while testing deep-scrub on 10.2.7. 1. Stop OSD 2. Change primary copy's contents (using vi) 3. Start OSD Then 'rados get' returns "No such file or directory". No error messages are seen in the OSD log, cluster status is "HEALTH_OK". 4. ceph pg repair Then 'rados get' works
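Spelled out as commands, the reproduction looks roughly like this; POOL, OBJECT and PGID are placeholders, and the on-disk copy being edited lives somewhere under the OSD's current/ directory on a filestore OSD:
#systemctl stop ceph-osd@0
(edit the object's file under /var/lib/ceph/osd/ceph-0/current/ with vi)
#systemctl start ceph-osd@0
#rados -p POOL get OBJECT /tmp/out
#ceph pg repair PGID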

Re: [ceph-users] ceph osd safe to remove

2017-07-28 Thread Alexandre Germain
Hello Dan, Something like this maybe? https://github.com/CanonicalLtd/ceph_safe_disk Cheers, Alex 2017-07-28 9:36 GMT-04:00 Dan van der Ster : > Hi all, > > We are trying to outsource the disk replacement process for our ceph > clusters to some non-expert sysadmins. > We could really use a to

Re: [ceph-users] ceph osd safe to remove

2017-07-28 Thread Peter Maloney
Hello Dan, Based on what I know and what people told me on IRC, this basically means the condition that the osd is not acting nor up for any pg. And one person (fusl on IRC) said there was an unfound objects bug when he had size = 1; he also said that if reweight (and I assume crush weight) is 0
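A minimal sketch of that condition as a script, assuming a jewel-era CLI where ceph pg ls-by-osd prints a header line followed by one line per PG (output formats vary by release, so treat this as illustrative rather than as the tool Dan asked for):
#!/bin/bash
# hypothetical check: is osd.$1 still in the up or acting set of any PG?
ID=$1
PGS=$(ceph pg ls-by-osd "osd.${ID}" 2>/dev/null | grep -c '^[0-9]')
if [ "$PGS" -eq 0 ]; then
  echo "osd.${ID} holds no PGs (not up or acting anywhere); it looks safe to stop and remove"
else
  echo "osd.${ID} is still up or acting for ${PGS} PGs; not safe yet"
fi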

Re: [ceph-users] CRC mismatch detection on read (XFS OSD)

2017-07-28 Thread Gregory Farnum
On Fri, Jul 28, 2017 at 8:16 AM Дмитрий Глушенок wrote: > Hi! > > Just found strange thing while testing deep-scrub on 10.2.7. > 1. Stop OSD > 2. Change primary copy's contents (using vi) > 3. Start OSD > > Then 'rados get' returns "No such file or directory". No error messages > seen in OSD log,

[ceph-users] ask about "recovery optimazation: recovery what is really modified"

2017-07-28 Thread donglifec...@gmail.com
yaoning, haomai, Json, what about the "recovery what is really modified" feature? I didn't see any recent update on github; will it be further developed? https://github.com/ceph/ceph/pull/3837 (PG:: recovery optimazation: recovery what is really modified) Thanks a lot. donglifec...@gmail.co