Re: [ceph-users] Random checksum errors (bluestore on Luminous)

2017-12-17 Thread Martin Preuss
Hi, is there a way to find out which files on CephFS are using a given PG? I'd like to check whether those files are corrupted... Also, how do I translate a bluestore error like this: 2017-12-17 03:04:29.512839 7f86c6347700 -1 bl
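[Editor's note] One rough sketch for the first question (the pool name, PG id, inode and mount point below are placeholders, and scanning a large pool this way is slow): find the objects that map to the suspect PG, then convert the hex inode prefix of each object name back to a path with find -inum.

  # List objects in the CephFS data pool and keep the ones mapping to the suspect PG
  rados -p cephfs_data ls | while read obj; do
      ceph osd map cephfs_data "$obj" | grep -q "(2.1f)" && echo "$obj"
  done

  # CephFS data objects are named <inode-hex>.<offset-hex>; convert the hex inode
  # to decimal and look the file up on a mounted filesystem (bash arithmetic)
  find /mnt/cephfs -inum $((16#10000000abc)) 2>/dev/null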

Re: [ceph-users] Random checksum errors (bluestore on Luminous)

2017-12-17 Thread Martin Preuss
BTW: Ceph version is 12.2.2 (the cluster was set up with 12.2.1, then updated to 12.2.2 on Debian 9). services: mon: 3 daemons, quorum ceph1,ceph2,ceph3 mgr: ceph1(active), standbys: ceph2 mds: cephfs-1/1/1 up {0=ceph1=up:active}, 2 up:standby osd: 10 osds: 10 up, 10 in data:

[ceph-users] Adding new host

2017-12-17 Thread Karun Josy
Hi, We have a live cluster with 8 OSD nodes, all having 5-6 disks each. We would like to add a new host and expand the cluster. We have 4 pools: 3 replicated pools with replication factors of 5 and 3, and 1 erasure-coded pool with k=5, m=3. So my concern is: are there any precautions that are needed to

[ceph-users] [Luminous 12.2.2] Cluster performance drops after a certain point in time

2017-12-17 Thread shadow_lin
Hi All, I am testing luminous 12.2.2 and found a strange behavior in my cluster. I was testing my cluster's throughput by using fio on a mounted rbd with the following fio parameters: fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio -size=200G -group_reporting -bs=1m -i
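[Editor's note] The fio command line is cut off in the preview above; a complete invocation along those lines might look roughly like the sketch below. The iodepth, numjobs, runtime and job name are guesses for illustration, not the poster's actual values.

  fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio \
      -size=200G -group_reporting -bs=1m -iodepth=32 -numjobs=1 \
      -name=rbd-seq-write -runtime=600 -time_based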

Re: [ceph-users] Adding new host

2017-12-17 Thread David Turner
I like to avoid adding disks from more than 1 failure domain at a time in case some of the new disks are bad. In your example of only adding 1 new node, I would say that adding all of the disks at the same time is the better way to do it. Adding only 1 disk in the new node at a time would actually
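[Editor's note] One common pattern when bringing in a whole node of new disks (a sketch, not the only way to do it) is to pause data movement while the OSDs are created, then release it in one controlled step:

  # Pause backfill/rebalance while the new OSDs on the new host are created
  ceph osd set norebalance
  ceph osd set nobackfill

  # ... create and start the OSDs on the new host ...

  # Once they are up and in, let data movement start and watch progress
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph -s
  ceph osd df tree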

Re: [ceph-users] Adding new host

2017-12-17 Thread Karun Josy
Hi David, Thank you for your response. The failure domain for the EC profile is 'host'. So I guess it is okay to add a node and activate 5 disks at a time? $ ceph osd erasure-code-profile get profile5by3 crush-device-class= crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=fals
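[Editor's note] With crush-failure-domain=host and k=5, m=3, every PG in that pool needs 8 distinct hosts, so going from 8 to 9 hosts actually gives the EC pool some slack. A quick sanity check (a sketch):

  # k+m = 8 chunks, one per host with failure-domain=host
  ceph osd erasure-code-profile get profile5by3

  # Count the hosts CRUSH currently knows about; it should be >= k+m
  ceph osd tree | awk '$3 == "host"' | wc -l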

Re: [ceph-users] RGW Logging pool

2017-12-17 Thread Ansgar Jazdzewski
Hi, it is possible to configure the RGW logging to a unix socket; with this you are able to consume a JSON stream. In a POC we put events into a Redis cache to do async processing. Sadly I can't find the needed config lines at the moment. Hope it helps, Ansgar
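[Editor's note] The config lines in question are most likely the RGW ops-log options; a sketch follows (the section name and socket path are placeholders, and the option names should be double-checked against your version):

  [client.rgw.gateway-1]
      # emit one JSON record per request to a unix domain socket
      rgw enable ops log = true
      rgw ops log socket path = /var/run/ceph/rgw-ops.sock
      # bytes of log data to buffer if the consumer falls behind
      rgw ops log data backlog = 5242880

The stream can then be read with something as simple as socat or nc -U on that socket, or a small script that pushes each JSON line into Redis, as in the POC described above.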

Re: [ceph-users] [Luminous 12.2.2] Cluster performance drops after a certain point in time

2017-12-17 Thread Denes Dolhay
Hi, This is just a tip, and I do not know if it actually applies to you, but some SSDs deliberately decrease their write throughput so they do not wear out the cells before the warranty period is over. Denes. On 12/17/2017 06:45 PM, shadow_lin wrote: Hi All, I am testing luminous 12.2.

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi John, thanks for your answer. In normal conditions, I can run "ceph mds fail" before a reboot. But if the host reboots by itself for some reason, I can do nothing! If this happens, data will be lost. So, is there any other way to stop data from being lost? thanks 13605702...@163.com Fr

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 9:24 AM, 13605702...@163.com <13605702...@163.com> wrote: > hi John > > thanks for your answer. > > in normal conditions, i can run "ceph mds fail" before a reboot. > but if the host reboots by itself for some reason, i can do nothing! > if this happens, data will be lost. >
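[Editor's note] For the unplanned-reboot case, the ~15 second stall observed later in this thread matches the default mds_beacon_grace of 15 seconds. A hedged sketch of shortening failover (the values are examples only; lower values trade faster takeover for more risk of spurious failovers on a busy or laggy MDS):

  # ceph.conf on the monitors / MDS nodes
  [global]
      # mons declare the MDS failed after this many seconds without a beacon
      mds_beacon_grace = 10
      # how often the MDS sends beacons (default 4s)
      mds_beacon_interval = 2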

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan 1. run "ceph mds fail" before rebooting the host 2. the host reboots by itself for some reason. You mean no data gets lost in BOTH conditions? In my test, I echo the date string every second into a file under the cephfs dir; when I reboot the master mds, 15 lines got lost. thanks 136

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 10:10 AM, 13605702...@163.com <13605702...@163.com> wrote: > hi Yan > > 1. run "ceph mds fail" before rebooting the host > 2. the host reboots by itself for some reason > Was the cephfs client also on the rebooted host? > you mean no data gets lost in BOTH conditions? > > in my te

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan cephfs client was also on the rebooted host? NO, the cephfs client is an independent VM 13605702...@163.com From: Yan, Zheng Date: 2017-12-18 10:36 To: 13605702...@163.com CC: John Spray; ceph-users Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting On Mo

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 10:10 AM, 13605702...@163.com <13605702...@163.com> wrote: > hi Yan > > 1. run "ceph mds fail" before rebooting the host > 2. the host reboots by itself for some reason > > you mean no data gets lost in BOTH conditions? > > in my test, i echo the date string per second into the

Re: [ceph-users] PG active+clean+remapped status

2017-12-17 Thread Karun Josy
Tried restarting all osds. Still no luck. Will adding a new disk to any of the servers force a rebalance and fix it? Karun Josy On Sun, Dec 17, 2017 at 12:22 PM, Cary wrote: > Karun, > > Could you paste in the output from "ceph health detail"? Which OSD > was just added? > > Cary > -Dynamic >

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Wei Jin
On Fri, Dec 15, 2017 at 6:08 PM, John Spray wrote: > On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com > <13605702...@163.com> wrote: >> hi >> >> i used 3 nodes to deploy mds (each node also has mon on it) >> >> my config: >> [mds.ceph-node-10-101-4-17] >> mds_standby_replay = true >> mds_stand
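[Editor's note] The quoted configuration is cut off after mds_stand...; on Luminous a standby-replay block typically looks like the sketch below. Which option actually follows in the original mail is a guess, and the rank/name values are placeholders.

  [mds.ceph-node-10-101-4-17]
      mds_standby_replay = true
      # one of the following, depending on how the standby is pinned:
      mds_standby_for_rank = 0
      # mds_standby_for_name = <name of the active MDS>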

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan

my test script:

  #!/bin/sh
  rm -f /root/cephfs/time.txt
  while true
  do
      echo `date` >> /root/cephfs/time.txt
      sync
      sleep 1
  done

I run this script and then reboot the master MDS. From the file /root/cephfs/time.txt, I can see that more than 15 lines were lost: Mon Dec 18 03:07:

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 11:11 AM, 13605702...@163.com <13605702...@163.com> wrote: > hi Yan > > my test script: > > #!/bin/sh > > rm -f /root/cephfs/time.txt > > while true > do > echo `date` >> /root/cephfs/time.txt > sync > sleep 1 > done > > i run this script and then reboot the master

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan > Mon Dec 18 03:07:47 UTC 2017 <-- reboot > Mon Dec 18 03:08:05 UTC 2017 <-- mds failover works This gap is caused by the write stall, but the data below got lost; is this normal? Mon Dec 18 03:07:48 UTC 2017 Mon Dec 18 03:07:49 UTC 2017 Mon Dec 18 03:07:50 UTC 2017 Mon Dec 18 03:07:51 UTC 2017

Re: [ceph-users] PG active+clean+remapped status

2017-12-17 Thread David Turner
Maybe try outing the disk that should have a copy of the PG, but doesn't. Then mark it back in. It might check that it has everything properly and pull a copy of the data it's missing. I dunno. On Sun, Dec 17, 2017, 10:00 PM Karun Josy wrote: > Tried restarting all osds. Still no luck. > > Will
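[Editor's note] For reference, a sketch of that out/in cycle (the pgid and osd id are placeholders):

  # See which OSDs the PG maps to and which are actually acting
  ceph pg map 3.7a

  # Out the OSD that should hold the missing copy, let peering/backfill react,
  # then bring it back in and check the PG state
  ceph osd out 12
  sleep 60
  ceph osd in 12
  ceph pg 3.7a query | head -40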

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 11:34 AM, 13605702...@163.com <13605702...@163.com> wrote: > hi Yan > >> Mon Dec 18 03:07:47 UTC 2017 <-- reboot >> Mon Dec 18 03:08:05 UTC 2017 <-- mds failover works > > this is caused by write stall > > but the data below got lost, is this normal? your script never wri

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread David Turner
The lines might not be in the file, but did the thing writing to the file say it succeeded to write, or did it fail to write? I'm guessing the latter, which means you should check that the write was successful and not just assume it was before continuing on. On Sun, Dec 17, 2017, 10:07 PM Wei Jin
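[Editor's note] A minimal variant of the test script from earlier in the thread that records write failures instead of assuming success (the failure-log path is just an example):

  #!/bin/sh
  # Same loop as before, but note explicitly when a write fails
  # instead of silently assuming it worked.
  rm -f /root/cephfs/time.txt
  while true
  do
      if ! echo "$(date)" >> /root/cephfs/time.txt; then
          echo "write failed at $(date)" >> /root/failures.log
      fi
      sync
      sleep 1
  done

In practice the echo is more likely to block during the failover than to return an error, which is consistent with Yan's point later in the thread that the "missing" lines were simply never written.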

Re: [ceph-users] ceph directory not accessible

2017-12-17 Thread gjprabu
Hi Yan, Sorry for the late reply; it is the kernel client and Ceph version 10.2.3. It's not reproducible in other mounts. Regards Prabu GJ On Thu, 14 Dec 2017 12:18:52 +0530 Yan, Zheng wrote On Thu, Dec 14, 2017 at 2:14 PM, gjprabu

Re: [ceph-users] [Luminous 12.2.2] Cluster performance drops after a certain point in time

2017-12-17 Thread Konstantin Shalygin
I am testing luminous 12.2.2 and find a strange behavior of my cluster. Check your block.db usage. Luminous 12.2.2 is affected: http://tracker.ceph.com/issues/22264 [root@ceph-osd0]# ceph daemon osd.46 perf dump | jq '.bluefs' | grep -E '(db|slow)'   "db_total_bytes": 30064762880,   "db_used_
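[Editor's note] A quick way to check every OSD on a host for block.db spillover (a sketch; assumes the default admin-socket location and that jq is installed):

  # Any non-zero slow_used_bytes means RocksDB has spilled from block.db
  # onto the slow device.
  for sock in /var/run/ceph/ceph-osd.*.asok; do
      id=$(echo "$sock" | sed 's/.*osd\.\([0-9]*\)\.asok/\1/')
      echo "osd.$id:"
      ceph daemon osd."$id" perf dump | \
          jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
  done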

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan, you are right, the data didn't get lost; it was caused by the write stall. thanks 13605702...@163.com From: Yan, Zheng Date: 2017-12-18 12:01 To: 13605702...@163.com CC: John Spray; ceph-users Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting On Mon, Dec 18,