Re: [ceph-users] osd not removed from crush map after ceph osd crush remove

2016-02-23 Thread Dimitar Boichev
Hello,
Thank you Bryan.

I was just trying to upgrade to hammer or newer, but before that I wanted to 
get the cluster into a healthy state.
Do you think it is safe to upgrade now, first to the latest firefly and then to hammer?


Regards.

Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Phone: +359 889 22 55 42
Skype: dimitar.boichev.axsmarine
E-mail: dimitar.boic...@axsmarine.com

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Stillwell, Bryan
Sent: Tuesday, February 23, 2016 1:51 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] osd not removed from crush map after ceph osd crush 
remove

Dimitar,

I'm not sure why those PGs would be stuck in the stale+active+clean state.  
Maybe try upgrading to the 0.80.11 release to see if it's a bug that was fixed 
already?  You can use the 'ceph tell osd.* version' command after the upgrade 
to make sure all OSDs are running the new version.  Also since firefly (0.80.x) 
is near its EOL, you should consider upgrading to hammer (0.94.x).

As for why osd.4 didn't get fully removed, the last command you ran isn't 
correct.  It should be 'ceph osd rm 4'.  Trying to remember when to use the 
CRUSH name (osd.4) versus the OSD number (4) can be a pain.
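
For reference, a minimal sketch of the whole removal sequence with the
corrected last step (substitute your own failed OSD's id):

ceph osd out osd.4
ceph osd crush remove osd.4
ceph auth del osd.4
ceph osd rm 4        # numeric id here, not the CRUSH name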

Bryan

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Dimitar Boichev <dimitar.boic...@axsmarine.com>
Date: Monday, February 22, 2016 at 1:10 AM
To: Dimitar Boichev <dimitar.boic...@axsmarine.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] osd not removed from crush map after ceph osd crush remove

Anyone ?

Regards.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Dimitar Boichev
Sent: Thursday, February 18, 2016 5:06 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] osd not removed from crush map after ceph osd crush remove

Hello,
I am running a tiny cluster of 2 nodes.
ceph -v
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

One osd died and I added a new osd (not replacing the old one).
After that I wanted to remove the failed osd completely from the cluster.
Here is what I did:
ceph osd reweight osd.4 0.0
ceph osd crush reweight osd.4 0.0
ceph osd out osd.4
ceph osd crush remove osd.4
ceph auth del osd.4
ceph osd rm osd.4


But after the rebalancing I ended up with 155 PGs in stale+active+clean  state.

@storage1:/tmp# ceph -s
cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
 health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are 
blocked > 32 sec; nodeep-scrub flag(s) set
 monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch 1, 
quorum 0 storage1
 osdmap e1064: 6 osds: 6 up, 6 in
flags nodeep-scrub
  pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
1209 GB used, 14210 GB / 15419 GB avail
 155 stale+active+clean
 557 active+clean
  client io 91925 B/s wr, 5 op/s

I know about the single-monitor problem; I just want to get the cluster to a healthy 
state first, then I will add the third storage node and go up to 3 monitors.

The problem is as follows:
@storage1:/tmp# ceph pg map 2.3a
osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]
@storage1:/tmp# ceph pg 2.3a query
Error ENOENT: i don't have pgid 2.3a


@storage1:/tmp# ceph health detail
HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 
sec; 1 osds have slow requests; nodeep-scrub flag(s) set
pg 7.2a is stuck stale for 8887559.656879, current state stale+active+clean, 
last acting [4]
pg 5.28 is stuck stale for 8887559.656886, current state stale+active+clean, 
last acting [4]
pg 7.2b is stuck stale for 8887559.656889, current state stale+active+clean, 
last acting [4]
pg 7.2c is stuck stale for 8887559.656892, current state stale+active+clean, 
last acting [4]
pg 0.2b is stuck stale for 8887559.656893, current state stale+active+clean, 
last acting [4]
pg 6.2c is stuck stale for 8887559.656894, current state stale+active+clean, 
last acting [4]
pg 6.2f is stuck stale for 8887559.656893, current state stale+active+clean, 
last acting [4]
pg 2.2b is stuck stale for 8887559.656896, current state stale+active+clean, 
last acting [4]
pg 2.25 is stuck stale for 8887559.656896, current state stale+active+clean, 
last acting [4]
pg 6.20 is stuck stale for 8887559.656898, current state stale+active+clean, 
last acting [4]
pg 5.21 is stuck stale for 8887559.656898, current state stale+active+clean, 
last acting [4]
pg 0.24 is stuck stale for 8887559.656904, current state stale+active+clean, 
last acting [4]
pg 2.21 is stuck stale for 8887559.656904, current state stale+active+clean, 
last acting [4]
pg 5.27 is stuck stale for 8887559.656906, current state stale+active+clean, 
last acting [4]
pg 2.23 is stuck stale for 8887559.656908, current state stale+active+clean, 
last acting 

[ceph-users] Why my cluster performance is so bad?

2016-02-23 Thread yang
My ceph cluster config:
7 nodes (including 3 mons, 3 mds).
9 SATA HDDs in every node, each HDD used as an OSD with its journal on the same 
disk (deployed by ceph-deploy).
CPU:  32core
Mem: 64GB
public network: 1Gbx2 bond0,
cluster network: 1Gbx2 bond0.

The bandwidth is 109910KB/s for 1M reads and 34329KB/s for 1M writes.
Why is it so bad?
Can anyone give me some suggestions?


fio jobfile:
[global]
direct=1
thread
ioengine=psync
size=10G
runtime=300
time_based
iodepth=10
group_reporting
stonewall
filename=/mnt/rbd/data

[read1M]
bs=1M
rw=read
numjobs=1
name=read1M

[write1M]
bs=1M
rw=write
numjobs=1
name=write1M

[read4k-seq]
bs=4k
rw=read
numjobs=8
name=read4k-seq

[read4k-rand]
bs=4k
rw=randread
numjobs=8
name=read4k-rand

[write4k-seq]
bs=4k
rw=write
numjobs=8
name=write4k-seq

[write4k-rand]
bs=4k
rw=randwrite
numjobs=8
name=write4k-rand


and the fio result is as follows:

read1M: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=10
write1M: (g=1): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=10
read4k-seq: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=10
...
read4k-rand: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, 
iodepth=10
...
write4k-seq: (g=4): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=10
...
write4k-rand: (g=5): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, 
iodepth=10
...
fio-2.3
Starting 34 threads
read1M: Laying out IO file(s) (1 file(s) / 10240MB)
Jobs: 8 (f=8): [_(26),w(8)] [18.8% done] [0KB/1112KB/0KB /s] [0/278/0 iops] 
[eta 02h:10m:00s] 
read1M: (groupid=0, jobs=1): err= 0: pid=17606: Tue Feb 23 14:28:45 2016
  read : io=32201MB, bw=109910KB/s, iops=107, runt=37msec
clat (msec): min=1, max=74, avg= 9.31, stdev= 2.78
 lat (msec): min=1, max=74, avg= 9.31, stdev= 2.78
clat percentiles (usec):
 |  1.00th=[ 1448],  5.00th=[ 2040], 10.00th=[ 3952], 20.00th=[ 9792],
 | 30.00th=[ 9920], 40.00th=[ 9920], 50.00th=[ 9920], 60.00th=[10048],
 | 70.00th=[10176], 80.00th=[10304], 90.00th=[10688], 95.00th=[10944],
 | 99.00th=[11968], 99.50th=[19072], 99.90th=[27008], 99.95th=[29568],
 | 99.99th=[38144]
bw (KB  /s): min=93646, max=139912, per=100.00%, avg=110022.09, 
stdev=7759.48
lat (msec) : 2=4.20%, 4=5.98%, 10=43.37%, 20=46.00%, 50=0.45%
lat (msec) : 100=0.01%
  cpu  : usr=0.05%, sys=0.81%, ctx=32209, majf=0, minf=1055
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=32201/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=10
write1M: (groupid=1, jobs=1): err= 0: pid=23779: Tue Feb 23 14:28:45 2016
  write: io=10058MB, bw=34329KB/s, iops=33, runt=300018msec
clat (msec): min=20, max=565, avg=29.80, stdev= 8.84
 lat (msec): min=20, max=565, avg=29.83, stdev= 8.84
clat percentiles (msec):
 |  1.00th=[   22],  5.00th=[   22], 10.00th=[   23], 20.00th=[   30],
 | 30.00th=[   31], 40.00th=[   31], 50.00th=[   31], 60.00th=[   31],
 | 70.00th=[   31], 80.00th=[   32], 90.00th=[   32], 95.00th=[   33],
 | 99.00th=[   35], 99.50th=[   38], 99.90th=[  118], 99.95th=[  219],
 | 99.99th=[  322]
bw (KB  /s): min= 3842, max=40474, per=100.00%, avg=34408.82, stdev=2751.05
lat (msec) : 50=99.83%, 100=0.06%, 250=0.06%, 500=0.04%, 750=0.01%
  cpu  : usr=0.11%, sys=0.22%, ctx=10101, majf=0, minf=1050
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=10058/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=10
read4k-seq: (groupid=2, jobs=8): err= 0: pid=27771: Tue Feb 23 14:28:45 2016
  read : io=12892MB, bw=44003KB/s, iops=11000, runt=32msec
clat (usec): min=143, max=38808, avg=725.61, stdev=457.02
 lat (usec): min=143, max=38808, avg=725.75, stdev=457.03
clat percentiles (usec):
 |  1.00th=[  270],  5.00th=[  358], 10.00th=[  398], 20.00th=[  462],
 | 30.00th=[  510], 40.00th=[  548], 50.00th=[  588], 60.00th=[  652],
 | 70.00th=[  732], 80.00th=[  876], 90.00th=[ 1176], 95.00th=[ 1576],
 | 99.00th=[ 2640], 99.50th=[ 3024], 99.90th=[ 4128], 99.95th=[ 4448],
 | 99.99th=[ 4960]
bw (KB  /s): min=  958, max=12784, per=12.51%, avg=5505.10, stdev=2094.64
lat (usec) : 250=0.27%, 500=27.64%, 750=44.00%, 1000=13.45%
lat (msec) : 2=11.65%, 4=2.88%, 10=0.12%, 20=0.01%, 50=0.01%
  cpu  : usr=0.44%, sys=1.64%, ctx=3300370, majf=0, minf=237
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=

Re: [ceph-users] Why my cluster performance is so bad?

2016-02-23 Thread Christian Balzer

Hello,

This is sort of a FAQ; Google is your friend.

For example, see the recent thread "Performance Testing of CEPH on ARM
MicroServer" in this ML, which addresses some points pertinent to your query.
Read it; I will reference things from it below.

On Tue, 23 Feb 2016 19:55:22 +0800 yang wrote:

> My ceph cluster config:
Kernel, OS, Ceph version.

> 7 nodes(including 3 mons, 3 mds).
> 9 SATA HDD in every node and each HDD as an OSD&journal(deployed by
What replication, default of 3?

That would give the theoretical IOPS of 21 HDDs, but your slow (more
precisely high latency) network and lack of SSD journals mean it will be
even lower than that.

> ceph-deploy). CPU:  32core
> Mem: 64GB
> public network: 1Gbx2 bond0,
> cluster network: 1Gbx2 bond0.
Latency in that kind of network will slow you down, especially when doing
small I/Os.

> 
As always, atop is a very nice tool to find where the bottlenecks and
hotspots are. You will have to run it preferably on all storage nodes with
nice large terminal windows to get the most out of it, though.
 
> The read bw is 109910KB/s for 1M-read, and 34329KB/s for 1M-write.
> Why is it so bad?

Because your testing is flawed.

> Anyone who can give me some suggestion?
>
For starters, to get a good baseline, do rados bench tests (see that thread)
with the default block size (4MB) and with 4KB.
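
A minimal sketch of such a baseline, assuming you create a throwaway pool
called "bench" and delete it again afterwards:

ceph osd pool create bench 128
rados bench -p bench 60 write -t 32 --no-cleanup    # default 4MB objects
rados bench -p bench 60 seq -t 32
rados bench -p bench 60 write -b 4096 -t 32         # 4KB writes
ceph osd pool delete bench bench --yes-i-really-really-mean-it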

> 
> fio jobfile:
> [global]
> direct=1
> thread
Not sure how this affects things versus the default of fork.

> ioengine=psync
I've definitely never used this; either use libaio or the rbd engine in newer
fio versions.

> size=10G
> runtime=300
> time_based
> iodepth=10
This is your main problem: Ceph/RBD does not do well with a low number of
threads, simply because you're likely to hit just a single OSD for a prolonged
time and thus get more or less single-disk speeds.

See more about this in the results below.
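
A sketch of a job file that would actually keep requests in flight (untested
here, adjust size/filename to your setup):

[global]
direct=1
ioengine=libaio
iodepth=32
size=10G
runtime=300
time_based
group_reporting
filename=/mnt/rbd/data

[write4k-rand]
bs=4k
rw=randwrite
numjobs=8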

> group_reporting
> stonewall
> filename=/mnt/rbd/data

Are we to assume that this is mounted via the kernel RBD module?
Where? On a different client node that's not part of the cluster?
Which FS?

> 
> [read1M]
> bs=1M
> rw=read
> numjobs=1
> name=read1M
> 
> [write1M]
> bs=1M
> rw=write
> numjobs=1
> name=write1M
> 
> [read4k-seq]
> bs=4k
> rw=read
> numjobs=8
> name=read4k-seq
> 
> [read4k-rand]
> bs=4k
> rw=randread
> numjobs=8
> name=read4k-rand
> 
> [write4k-seq]
> bs=4k
> rw=write
> numjobs=8
> name=write4k-seq
> 
> [write4k-rand]
> bs=4k
> rw=randwrite
> numjobs=8
> name=write4k-rand
> 
> 
> and the fio result is as follows:
> 
> read1M: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=10
> write1M: (g=1): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=psync,
> iodepth=10 read4k-seq: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K,
> ioengine=psync, iodepth=10 ...
> read4k-rand: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync,
> iodepth=10 ...
> write4k-seq: (g=4): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=psync,
> iodepth=10 ...
> write4k-rand: (g=5): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync,
> iodepth=10 ...
> fio-2.3
> Starting 34 threads
> read1M: Laying out IO file(s) (1 file(s) / 10240MB)
> Jobs: 8 (f=8): [_(26),w(8)] [18.8% done] [0KB/1112KB/0KB /s] [0/278/0
> iops] [eta 02h:10m:00s] read1M: (groupid=0, jobs=1): err= 0: pid=17606:
> Tue Feb 23 14:28:45 2016 read : io=32201MB, bw=109910KB/s, iops=107,
> runt=37msec clat (msec): min=1, max=74, avg= 9.31, stdev= 2.78
>  lat (msec): min=1, max=74, avg= 9.31, stdev= 2.78
> clat percentiles (usec):
>  |  1.00th=[ 1448],  5.00th=[ 2040], 10.00th=[ 3952],
> 20.00th=[ 9792], | 30.00th=[ 9920], 40.00th=[ 9920], 50.00th=[ 9920],
> 60.00th=[10048], | 70.00th=[10176], 80.00th=[10304], 90.00th=[10688],
> 95.00th=[10944], | 99.00th=[11968], 99.50th=[19072], 99.90th=[27008],
> 99.95th=[29568], | 99.99th=[38144]
> bw (KB  /s): min=93646, max=139912, per=100.00%, avg=110022.09,
> stdev=7759.48 lat (msec) : 2=4.20%, 4=5.98%, 10=43.37%, 20=46.00%,
> 50=0.45% lat (msec) : 100=0.01%
>   cpu  : usr=0.05%, sys=0.81%, ctx=32209, majf=0, minf=1055
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,

According to this output, the IO depth was actually 1, not 10, probably
caused by the choice of your engine or the threads option.
And this explains a LOT of your results.

Regards,

Christian
> >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >64=0.0%, >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0% issued: total=r=32201/w=0/d=0, short=r=0/w=0/d=0,
> >drop=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=10
> write1M: (groupid=1, jobs=1): err= 0: pid=23779: Tue Feb 23 14:28:45 2016
>   write: io=10058MB, bw=34329KB/s, iops=33, runt=300018msec
> clat (msec): min=20, max=565, avg=29.80, stdev= 8.84
>  lat (msec): min=20, max=565, avg=29.83, stdev= 8.84
> clat percentiles (msec):
>  |  1.00th=[   22],  5.00th=[   22], 10.00th=[   23],
> 20.00th=[   30], | 30.00th=[   31], 40.00th

Re: [ceph-users] Why my cluster performance is so bad?

2016-02-23 Thread Christian Balzer

Hello,

On Tue, 23 Feb 2016 22:49:44 +0900 Christian Balzer wrote:

[snip]
> > 7 nodes(including 3 mons, 3 mds).
> > 9 SATA HDD in every node and each HDD as an OSD&journal(deployed by
> What replication, default of 3?
> 
> That would give the theoretical IOPS of 21 HDDs, but your slow (more
> precisely high latency) network and lack of SSD journals mean it will be
> even lower than that.
> 
[snap]

And when I said 21, I meant of course 10, since w/o SSD journals you're
halving the IOPS of each OSD HDD (and that is being generous).
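
Spelling the arithmetic out, and assuming a rough 100 write IOPS per 7200rpm
SATA drive: 7 nodes x 9 HDDs = 63 OSDs; divided by 3 replicas that is 21 disks'
worth of writes; halved again for the co-located journal it is roughly 10
disks' worth, i.e. on the order of 1000 small random write IOPS for the whole
cluster.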

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Vickey Singh
Hello Guys

I am getting weird output from osd map. The object does not exist in the pool,
but osd map still shows a PG and OSD on which it is supposedly stored.

So I have an RBD device coming from pool 'gold'; this image has an object
'rb.0.10f61.238e1f29.2ac5'

The commands below verify this

*[root@ceph-node1 ~]# rados -p gold ls | grep -i
rb.0.10f61.238e1f29.2ac5*
*rb.0.10f61.238e1f29.2ac5*
*[root@ceph-node1 ~]#*

This object lives in pool gold on OSDs 38,0,20, which is correct

*[root@ceph-node1 ~]# ceph osd map gold rb.0.10f61.238e1f29.2ac5*
*osdmap e1357 pool 'gold' (1) object 'rb.0.10f61.238e1f29.2ac5' ->
pg 1.11692600 (1.0) -> up ([38,0,20], p38) acting ([38,0,20], p38)*
*[root@ceph-node1 ~]#*


Since I don't have the object 'rb.0.10f61.238e1f29.2ac5' in the data and
rbd pools, rados ls will not list it there. Which is expected.

*[root@ceph-node1 ~]# rados -p data ls | grep -i
rb.0.10f61.238e1f29.2ac5*
*[root@ceph-node1 ~]# rados -p rbd ls | grep -i
rb.0.10f61.238e1f29.2ac5*


But how come the object shows up in the osd map output for the data and rbd pools?

*[root@ceph-node1 ~]# ceph osd map data rb.0.10f61.238e1f29.2ac5*
*osdmap e1357 pool 'data' (2) object 'rb.0.10f61.238e1f29.2ac5' ->
pg 2.11692600 (2.0) -> up ([3,51,29], p3) acting ([3,51,29], p3)*
*[root@ceph-node1 ~]#*

*[root@ceph-node1 ~]# ceph osd map rbd rb.0.10f61.238e1f29.2ac5*
*osdmap e1357 pool 'rbd' (0) object 'rb.0.10f61.238e1f29.2ac5' ->
pg 0.11692600 (0.0) -> up ([41,20,3], p41) acting ([41,20,3], p41)*
*[root@ceph-node1 ~]#*


In Ceph, an object is unique and belongs to only one pool. So why does it
show up in every pool's osd map output?

Is this some kind of bug in Ceph?

Ceph Hammer 0.94.5
CentOS 7.2
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.94.6 Hammer released

2016-02-23 Thread Sage Weil
This Hammer point release fixes a range of bugs, most notably a fix for 
unbounded growth of the monitor’s leveldb store, and a workaround in the 
OSD to keep most xattrs small enough to be stored inline in XFS inodes.

We recommend that all hammer v0.94.x users upgrade.

For more detailed information, see the complete changelog:

  http://docs.ceph.com/docs/master/_downloads/v0.94.6.txt

Notable Changes
---

* build/ops: Ceph daemon failed to start, because the service name was already 
used. (#13474, Chuanhong Wang)
* build/ops: LTTng-UST tracing should be dynamically enabled (#13274, Jason 
Dillaman)
* build/ops: ceph upstart script rbdmap.conf incorrectly processes parameters 
(#13214, Sage Weil)
* build/ops: ceph.spec.in License line does not reflect COPYING (#12935, Nathan 
Cutler)
* build/ops: ceph.spec.in libcephfs_jni1 has no %post and %postun  (#12927, 
Owen Synge)
* build/ops: configure.ac: no use to add "+" before ac_ext=c (#14330, Kefu 
Chai, Robin H. Johnson)
* build/ops: deb: strip tracepoint libraries from Wheezy/Precise builds 
(#14801, Jason Dillaman)
* build/ops: init script reload doesn't work on EL7 (#13709, Hervé Rousseau)
* build/ops: init-rbdmap uses distro-specific functions (#12415, Boris Ranto)
* build/ops: logrotate reload error on Ubuntu 14.04 (#11330, Sage Weil)
* build/ops: miscellaneous spec file fixes (#12931, #12994, #12924, #12360, 
Boris Ranto, Nathan Cutler, Owen Synge, Travis Rhoden, Ken Dreyer)
* build/ops: pass tcmalloc env through to ceph-os (#14802, Sage Weil)
* build/ops: rbd-replay-* moved from ceph-test-dbg to ceph-common-dbg as well 
(#13785, Loic Dachary)
* build/ops: unknown argument --quiet in udevadm settle (#13560, Jason Dillaman)
* common: Objecter: pool op callback may hang forever. (#13642, xie xingguo)
* common: Objecter: potential null pointer access when do pool_snap_list. 
(#13639, xie xingguo)
* common: ThreadPool add/remove work queue methods not thread safe (#12662, 
Jason Dillaman)
* common: auth/cephx: large amounts of log are produced by osd (#13610, Qiankun 
Zheng)
* common: client nonce collision due to unshared pid namespaces (#13032, Josh 
Durgin)
* common: common/Thread:pthread_attr_destroy(thread_attr) when done with it 
(#12570, Piotr Dałek)
* common: log: Log.cc: Assign LOG_DEBUG priority to syslog calls (#13993, Brad 
Hubbard)
* common: objecter: cancellation bugs (#13071, Jianpeng Ma)
* common: pure virtual method called (#13636, Jason Dillaman)
* common: small probability sigabrt when setting rados_osd_op_timeout (#13208, 
Ruifeng Yang)
* common: wrong conditional for boolean function KeyServer::get_auth() (#9756, 
#13424, Nathan Cutler)
* crush: crash if we see CRUSH_ITEM_NONE in early rule step (#13477, Sage Weil)
* doc: man: document listwatchers cmd in "rados" manpage (#14556, Kefu Chai)
* doc: regenerate man pages, add orphans commands to radosgw-admin(8) (#14637, 
Ken Dreyer)
* fs: CephFS restriction on removing cache tiers is overly strict (#11504, John 
Spray)
* fs: fsstress.sh fails (#12710, Yan, Zheng)
* librados: LibRadosWatchNotify.WatchNotify2Timeout (#13114, Sage Weil)
* librbd: ImageWatcher shouldn't block the notification thread (#14373, Jason 
Dillaman)
* librbd: diff_iterate needs to handle holes in parent images (#12885, Jason 
Dillaman)
* librbd: fix merge-diff for >2GB diff-files (#14030, Jason Dillaman)
* librbd: invalidate object map on error even w/o holding lock (#13372, Jason 
Dillaman)
* librbd: reads larger than cache size hang (#13164, Lu Shi)
* mds: ceph mds add_data_pool check for EC pool is wrong (#12426, John Spray)
* mon: MonitorDBStore: get_next_key() only if prefix matches (#11786, Joao 
Eduardo Luis)
* mon: OSDMonitor: do not assume a session exists in send_incremental() 
(#14236, Joao Eduardo Luis)
* mon: check for store writeablility before participating in election (#13089, 
Sage Weil)
* mon: compact full epochs also (#14537, Kefu Chai)
* mon: include min_last_epoch_clean as part of PGMap::print_summary and 
PGMap::dump (#13198, Guang Yang)
* mon: map_cache can become inaccurate if osd does not receive the osdmaps 
(#10930, Kefu Chai)
* mon: should not set isvalid = true when cephx_verify_authorizer return false 
(#13525, Ruifeng Yang)
* osd: Ceph Pools' MAX AVAIL is 0 if some OSDs' weight is 0 (#13840, Chengyuan 
Li)
* osd: FileStore calls syncfs(2) even it is not supported (#12512, Kefu Chai)
* osd: FileStore: potential memory leak if getattrs fails. (#13597, xie xingguo)
* osd: IO error on kvm/rbd with an erasure coded pool tier (#12012, Kefu Chai)
* osd: OSD::build_past_intervals_parallel() shall reset primary and up_primary 
when begin a new past_interval. (#13471, xiexingguo)
* osd: ReplicatedBackend: populate recovery_info.size for clone (bug symptom is 
size mismatch on replicated backend on a clone in scrub) (#12828, Samuel Just)
* osd: ReplicatedPG: wrong result code checking logic during sparse_read 
(#14151, xie xingguo)
* osd: ReplicatedPG::hit_set_trim osd/ReplicatedP

Re: [ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Gregory Farnum
This is not a bug. The map command just says which PG/OSD an object maps
to; it does not go out and query the osd to see if there actually is such
an object.
-Greg

On Tuesday, February 23, 2016, Vickey Singh 
wrote:

> Hello Guys
>
> I am getting wired output from osd map. The object does not exists on pool
> but osd map still shows its PG and OSD on which its stored.
>
> So i have rbd device coming from pool 'gold' , this image has an object
> 'rb.0.10f61.238e1f29.2ac5'
>
> The below commands verifies this
>
> *[root@ceph-node1 ~]# rados -p gold ls | grep -i
> rb.0.10f61.238e1f29.2ac5*
> *rb.0.10f61.238e1f29.2ac5*
> *[root@ceph-node1 ~]#*
>
> This object lives on pool gold and OSD 38,0,20 , which is correct
>
> *[root@ceph-node1 ~]# ceph osd map gold rb.0.10f61.238e1f29.2ac5*
> *osdmap e1357 pool 'gold' (1) object 'rb.0.10f61.238e1f29.2ac5' ->
> pg 1.11692600 (1.0) -> up ([38,0,20], p38) acting ([38,0,20], p38)*
> *[root@ceph-node1 ~]#*
>
>
> Since i don't have object 'rb.0.10f61.238e1f29.2ac5' in data and
> rbd pools , rados ls will not list it. Which is expected.
>
> *[root@ceph-node1 ~]# rados -p data ls | grep -i
> rb.0.10f61.238e1f29.2ac5*
> *[root@ceph-node1 ~]# rados -p rbd ls | grep -i
> rb.0.10f61.238e1f29.2ac5*
>
>
> But , how come the object is showing in osd map of pool data and rbd.
>
> *[root@ceph-node1 ~]# ceph osd map data rb.0.10f61.238e1f29.2ac5*
> *osdmap e1357 pool 'data' (2) object 'rb.0.10f61.238e1f29.2ac5' ->
> pg 2.11692600 (2.0) -> up ([3,51,29], p3) acting ([3,51,29], p3)*
> *[root@ceph-node1 ~]#*
>
> *[root@ceph-node1 ~]# ceph osd map rbd rb.0.10f61.238e1f29.2ac5*
> *osdmap e1357 pool 'rbd' (0) object 'rb.0.10f61.238e1f29.2ac5' ->
> pg 0.11692600 (0.0) -> up ([41,20,3], p41) acting ([41,20,3], p41)*
> *[root@ceph-node1 ~]#*
>
>
> In ceph, object is unique and belongs to only one pool. So why does it
> shows up in all pool's osd map.
>
> Is this some kind of BUG in Ceph
>
> Ceph Hammer 0.94.5
> CentOS 7.2
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd not removed from crush map after ceph osd crush remove

2016-02-23 Thread Stillwell, Bryan
Dimitar,

I would agree with you that getting the cluster into a healthy state first
is probably the better idea.  Based on your pg query, it appears like
you're using only 1 replica.  Any ideas why that would be?

The output should look like this (with 3 replicas):

osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]
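
You can double-check the replica count of each pool directly; a quick sketch
(substitute your own pool names):

ceph osd pool get rbd size
ceph osd dump | grep 'replicated size'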

Bryan

From:  Dimitar Boichev 
Date:  Tuesday, February 23, 2016 at 1:08 AM
To:  CTG User , "ceph-users@lists.ceph.com"

Subject:  RE: [ceph-users] osd not removed from crush map after ceph osd
crush remove


>Hello,
>Thank you Bryan.
>
>I was just trying to upgrade to hammer or upper but before that I was
>wanting to get the cluster in Healthy state.
>Do you think it is safe to upgrade now first to latest firefly then to
>Hammer ?
>
>
>Regards.
>
>Dimitar Boichev
>SysAdmin Team Lead
>AXSMarine Sofia
>Phone: +359 889 22 55 42
>Skype: dimitar.boichev.axsmarine
>E-mail:
>dimitar.boic...@axsmarine.com
>
>
>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>On Behalf Of Stillwell, Bryan
>Sent: Tuesday, February 23, 2016 1:51 AM
>To: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] osd not removed from crush map after ceph osd
>crush remove
>
>
>
>Dimitar,
>
>
>
>I'm not sure why those PGs would be stuck in the stale+active+clean
>state.  Maybe try upgrading to the 0.80.11 release to see if it's a bug
>that was fixed already?  You can use the 'ceph tell osd.*
> version' command after the upgrade to make sure all OSDs are running the
>new version.  Also since firefly (0.80.x) is near its EOL, you should
>consider upgrading to hammer (0.94.x).
>
>
>
>As for why osd.4 didn't get fully removed, the last command you ran isn't
>correct.  It should be 'ceph osd rm 4'.  Trying to remember when to use
>the CRUSH name (osd.4) versus the OSD number (4)
> can be a pain.
>
>
>
>Bryan
>
>
>
>From: ceph-users  on behalf of Dimitar
>Boichev 
>Date: Monday, February 22, 2016 at 1:10 AM
>To: Dimitar Boichev ,
>"ceph-users@lists.ceph.com" 
>Subject: Re: [ceph-users] osd not removed from crush map after ceph osd
>crush remove
>
>
>
>>Anyone ?
>>
>>Regards.
>>
>>
>>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>>On Behalf Of Dimitar Boichev
>>Sent: Thursday, February 18, 2016 5:06 PM
>>To: ceph-users@lists.ceph.com
>>Subject: [ceph-users] osd not removed from crush map after ceph osd
>>crush remove
>>
>>
>>
>>Hello,
>>I am running a tiny cluster of 2 nodes.
>>ceph -v
>>ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>>One osd died and I added a new osd (not replacing the old one).
>>After that I wanted to remove the failed osd completely from the cluster.
>>Here is what I did:
>>ceph osd reweight osd.4 0.0
>>ceph osd crush reweight osd.4 0.0
>>ceph osd out osd.4
>>ceph osd crush remove osd.4
>>ceph auth del osd.4
>>ceph osd rm osd.4
>>
>>
>>But after the rebalancing I ended up with 155 PGs in stale+active+clean
>>state.
>>
>>@storage1:/tmp# ceph -s
>>cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
>> health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests
>>are blocked > 32 sec; nodeep-scrub flag(s) set
>> monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch
>>1, quorum 0 storage1
>> osdmap e1064: 6 osds: 6 up, 6 in
>>flags nodeep-scrub
>>  pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
>>1209 GB used, 14210 GB / 15419 GB avail
>> 155 stale+active+clean
>> 557 active+clean
>>  client io 91925 B/s wr, 5 op/s
>>
>>I know about the 1 monitor problem I just want to fix the cluster to
>>healthy state then I will add the third storage node and go up to 3
>>monitors.
>>
>>The problem is as follows:
>>@storage1:/tmp# ceph pg map 2.3a
>>osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]
>>@storage1:/tmp# ceph pg 2.3a query
>>Error ENOENT: i don't have pgid 2.3a
>>
>>
>>@storage1:/tmp# ceph health detail
>>HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked >
>>32 sec; 1 osds have slow requests; nodeep-scrub flag(s) set
>>pg 7.2a is stuck stale for 8887559.656879, current state
>>stale+active+clean, last acting [4]
>>pg 5.28 is stuck stale for 8887559.656886, current state
>>stale+active+clean, last acting [4]
>>pg 7.2b is stuck stale for 8887559.656889, current state
>>stale+active+clean, last acting [4]
>>pg 7.2c is stuck stale for 8887559.656892, current state
>>stale+active+clean, last acting [4]
>>pg 0.2b is stuck stale for 8887559.656893, current state
>>stale+active+clean, last acting [4]
>>pg 6.2c is stuck stale for 8887559.656894, current state
>>stale+active+clean, last acting [4]
>>pg 6.2f is stuck stale for 8887559.656893, current state
>>stale+active+clean, last acting [4]
>>pg 2.2b is stuck stale for 8887559.656896, current state
>>stale+active+clean, last acting [4]
>>pg 2.25 is stuck stale for 8887559.656896, current state
>>stale+active+clean, last acting [4]
>>pg 6.20 is stuck stale for 8887559.656898, current state
>>st

[ceph-users] Old MDS resurrected after update

2016-02-23 Thread Scottix
I had a weird thing happen when I was testing an upgrade in a dev
environment where I had removed an MDS from a machine a while back.

I upgraded to 0.94.6 and, lo and behold, the mds daemon started up on the
machine again. I know the /var/lib/ceph/mds folder was removed because I had
renamed it /var/lib/ceph/mds-removed, and I have definitely restarted this
machine several times before without the mds starting.

The only thing I noticed was that the auth keys were still in place. I am
assuming the upgrade recreated the folder and, since it still had access, the
daemon started back up.

I am guessing we have to add one more step to the MDS removal procedure from
this post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045649.html

 1 Stop the old MDS
 2 Run "ceph mds fail 0"
 3 Run "ceph auth del mds."
I am a little wary of command 2, since there is no clear explanation of what
the 0 is. Is this command better, since it is more explicit: "ceph mds rm 0
mds."
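
For the record, a sketch of the fuller sequence I would now use, with a
made-up daemon name "mds.devnode" standing in for the real one:

# stop the ceph-mds daemon on that node first (upstart/sysvinit/systemd as appropriate)
ceph mds fail 0
ceph auth del mds.devnode
mv /var/lib/ceph/mds/ceph-devnode /var/lib/ceph/mds-removed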

Is there anything else that could possibly resurrect it?

Best,
Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librados: how to get notified when a certain object is created

2016-02-23 Thread Gregory Farnum
On Saturday, February 20, 2016, Sorin Manolache  wrote:

> Hello,
>
> I can set a watch on an object in librados. Does this object have to exist
> already at the moment I'm setting the watch on it? What happens if the
> object does not exist? Is my watcher valid? Will I get notified when
> someone else creates the missing object that I'm watching and sends a
> notification?


I believe a watch implicitly creates the object, but you could run it on a
non-existent object and check. ;) but...


>
> If the watch is not valid if the object has not yet been created then how
> can I get notified when the object is created? (I can imagine a
> work-around: there's an additional object, a kind of object registry object
> (the equivalent of a directory in a file system), that contains the list of
> created objects. I'm watching for modifications of the object registry
> object. Whenever a new object is created, the agent that creates the object
> also updates the object registry object.)


This is probably a better idea. Watches have a bit of overhead, and while
you can have a few per client without much trouble, if you're trying to
watch a whole bunch of potential objects you probably want a stronger
side-channel communication system anyway, so using a set of communication
objects and pushing the relevant data through them is likely to work better.
-Greg



>
> Thank you,
> Sorin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Vickey Singh
Thanks Greg,

Do you mean the ceph osd map command is not displaying accurate information?

I guess one of these things is happening with my cluster:
- ceph osd map is not printing true information
- object-to-PG mapping is not correct (one object is mapped to multiple
PGs)

This is happening for several objects, but the cluster is healthy.

Need expert suggestion.


On Tue, Feb 23, 2016 at 7:20 PM, Gregory Farnum  wrote:

> This is not a bug. The map command just says which PG/OSD an object maps
> to; it does not go out and query the osd to see if there actually is such
> an object.
> -Greg
>
>
> On Tuesday, February 23, 2016, Vickey Singh 
> wrote:
>
>> Hello Guys
>>
>> I am getting wired output from osd map. The object does not exists on
>> pool but osd map still shows its PG and OSD on which its stored.
>>
>> So i have rbd device coming from pool 'gold' , this image has an object
>> 'rb.0.10f61.238e1f29.2ac5'
>>
>> The below commands verifies this
>>
>> *[root@ceph-node1 ~]# rados -p gold ls | grep -i
>> rb.0.10f61.238e1f29.2ac5*
>> *rb.0.10f61.238e1f29.2ac5*
>> *[root@ceph-node1 ~]#*
>>
>> This object lives on pool gold and OSD 38,0,20 , which is correct
>>
>> *[root@ceph-node1 ~]# ceph osd map gold rb.0.10f61.238e1f29.2ac5*
>> *osdmap e1357 pool 'gold' (1) object 'rb.0.10f61.238e1f29.2ac5'
>> -> pg 1.11692600 (1.0) -> up ([38,0,20], p38) acting ([38,0,20], p38)*
>> *[root@ceph-node1 ~]#*
>>
>>
>> Since i don't have object 'rb.0.10f61.238e1f29.2ac5' in data and
>> rbd pools , rados ls will not list it. Which is expected.
>>
>> *[root@ceph-node1 ~]# rados -p data ls | grep -i
>> rb.0.10f61.238e1f29.2ac5*
>> *[root@ceph-node1 ~]# rados -p rbd ls | grep -i
>> rb.0.10f61.238e1f29.2ac5*
>>
>>
>> But , how come the object is showing in osd map of pool data and rbd.
>>
>> *[root@ceph-node1 ~]# ceph osd map data rb.0.10f61.238e1f29.2ac5*
>> *osdmap e1357 pool 'data' (2) object 'rb.0.10f61.238e1f29.2ac5'
>> -> pg 2.11692600 (2.0) -> up ([3,51,29], p3) acting ([3,51,29], p3)*
>> *[root@ceph-node1 ~]#*
>>
>> *[root@ceph-node1 ~]# ceph osd map rbd rb.0.10f61.238e1f29.2ac5*
>> *osdmap e1357 pool 'rbd' (0) object 'rb.0.10f61.238e1f29.2ac5' ->
>> pg 0.11692600 (0.0) -> up ([41,20,3], p41) acting ([41,20,3], p41)*
>> *[root@ceph-node1 ~]#*
>>
>>
>> In ceph, object is unique and belongs to only one pool. So why does it
>> shows up in all pool's osd map.
>>
>> Is this some kind of BUG in Ceph
>>
>> Ceph Hammer 0.94.5
>> CentOS 7.2
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Gregory Farnum
On Tuesday, February 23, 2016, Vickey Singh 
wrote:

> Thanks Greg,
>
> Do you mean ceph osd map command is not displaying accurate information ?
>
> I guess, either of these things are happening with my cluster
> - ceph osd map is not printing true information
> - Object to PG mapping is not correct ( one object is mapped to multiple
> PG's )
>
> This is happening for several objects , but the cluster is Healthy.
>

No, you're looking for the map command to do something it was not designed
for. If you want to see if an object exists, you will need to use a RADOS
client to fetch the object and see if it's there. "map" is a mapping
command: given an object name, which PG/OSD does CRUSH map that name to?
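
For example, a quick way to check existence from the command line, reusing
your object name:

rados -p gold stat rb.0.10f61.238e1f29.2ac5    # prints mtime/size because it exists
rados -p data stat rb.0.10f61.238e1f29.2ac5    # errors out because it does not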



>
> Need expert suggestion.
>
>
> On Tue, Feb 23, 2016 at 7:20 PM, Gregory Farnum  > wrote:
>
>> This is not a bug. The map command just says which PG/OSD an object maps
>> to; it does not go out and query the osd to see if there actually is such
>> an object.
>> -Greg
>>
>>
>> On Tuesday, February 23, 2016, Vickey Singh > > wrote:
>>
>>> Hello Guys
>>>
>>> I am getting wired output from osd map. The object does not exists on
>>> pool but osd map still shows its PG and OSD on which its stored.
>>>
>>> So i have rbd device coming from pool 'gold' , this image has an object
>>> 'rb.0.10f61.238e1f29.2ac5'
>>>
>>> The below commands verifies this
>>>
>>> *[root@ceph-node1 ~]# rados -p gold ls | grep -i
>>> rb.0.10f61.238e1f29.2ac5*
>>> *rb.0.10f61.238e1f29.2ac5*
>>> *[root@ceph-node1 ~]#*
>>>
>>> This object lives on pool gold and OSD 38,0,20 , which is correct
>>>
>>> *[root@ceph-node1 ~]# ceph osd map gold rb.0.10f61.238e1f29.2ac5*
>>> *osdmap e1357 pool 'gold' (1) object 'rb.0.10f61.238e1f29.2ac5'
>>> -> pg 1.11692600 (1.0) -> up ([38,0,20], p38) acting ([38,0,20], p38)*
>>> *[root@ceph-node1 ~]#*
>>>
>>>
>>> Since i don't have object 'rb.0.10f61.238e1f29.2ac5' in data and
>>> rbd pools , rados ls will not list it. Which is expected.
>>>
>>> *[root@ceph-node1 ~]# rados -p data ls | grep -i
>>> rb.0.10f61.238e1f29.2ac5*
>>> *[root@ceph-node1 ~]# rados -p rbd ls | grep -i
>>> rb.0.10f61.238e1f29.2ac5*
>>>
>>>
>>> But , how come the object is showing in osd map of pool data and rbd.
>>>
>>> *[root@ceph-node1 ~]# ceph osd map data rb.0.10f61.238e1f29.2ac5*
>>> *osdmap e1357 pool 'data' (2) object 'rb.0.10f61.238e1f29.2ac5'
>>> -> pg 2.11692600 (2.0) -> up ([3,51,29], p3) acting ([3,51,29], p3)*
>>> *[root@ceph-node1 ~]#*
>>>
>>> *[root@ceph-node1 ~]# ceph osd map rbd rb.0.10f61.238e1f29.2ac5*
>>> *osdmap e1357 pool 'rbd' (0) object 'rb.0.10f61.238e1f29.2ac5'
>>> -> pg 0.11692600 (0.0) -> up ([41,20,3], p41) acting ([41,20,3], p41)*
>>> *[root@ceph-node1 ~]#*
>>>
>>>
>>> In ceph, object is unique and belongs to only one pool. So why does it
>>> shows up in all pool's osd map.
>>>
>>> Is this some kind of BUG in Ceph
>>>
>>> Ceph Hammer 0.94.5
>>> CentOS 7.2
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rack weight imbalance

2016-02-23 Thread George Mihaiescu
Thank you Greg, much appreciated.

I'll test with the crush tool to see if it complains about this new layout.
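
Something along these lines is what I have in mind (a sketch, not yet run
against our map; rule 0 is just a placeholder for whatever rule the rgw pool
actually uses):

ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-utilization
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings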

George

On Mon, Feb 22, 2016 at 3:19 PM, Gregory Farnum  wrote:

> On Mon, Feb 22, 2016 at 9:29 AM, George Mihaiescu 
> wrote:
> > Hi,
> >
> > We have a fairly large Ceph cluster (3.2 PB) that we want to expand and
> we
> > would like to get your input on this.
> >
> > The current cluster has around 700 OSDs (4 TB and 6 TB) in three racks
> with
> > the largest pool being rgw and using a replica 3.
> > For non-technical reasons (budgetary, etc) we are considering getting
> three
> > more racks, but initially adding only two storage nodes with 36 x 8 TB
> > drives in each, which will basically cause the rack weights to be
> imbalanced
> > (three racks with weight around a 1000 and 288 OSDs, and three racks with
> > weight around 500 but only 72 OSDs)
> >
> > The one replica per rack CRUSH rule will cause existing data to be
> > re-balanced among all six racks, with OSDs in the new racks getting only
> a
> > proportionate amount of replicas.
> >
> > Do you see any possible problems with this approach? Should Ceph be able
> to
> > properly rebalance the existing data among racks with imbalanced weights?
> >
> > Thank you for your input and please let me know if you need additional
> info.
>
> This should be okay; you have multiple racks in each size and aren't
> trying to replicate a full copy to each rack individually. You can
> test it ahead of time with the crush tool, though:
> http://docs.ceph.com/docs/master/man/8/crushtool/
> It may turn out you're using old tunables and want to update them
> first or something.
> -Greg
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] xfs corruption

2016-02-23 Thread fangchen sun
Dear all:

I have a Ceph object storage cluster with 143 OSDs and 7 radosgw instances,
with XFS as the underlying file system.
I recently ran into a problem where an OSD is sometimes marked down when the
function "chain_setxattr()" returns -117. So far I just unmount the disk and
repair it with "xfs_repair".

os: centos 6.5
kernel version: 2.6.32
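
My current manual recovery is roughly the following (a sketch from memory,
with the OSD id and device made up for the example):

service ceph stop osd.12
umount /var/lib/ceph/osd/ceph-12
xfs_repair -n /dev/sdi1      # dry run first
xfs_repair /dev/sdi1
mount /dev/sdi1 /var/lib/ceph/osd/ceph-12
service ceph start osd.12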

the log for dmesg command:
[41796028.532225] Pid: 1438740, comm: ceph-osd Not tainted
2.6.32-925.431.23.3.letv.el6.x86_64 #1
[41796028.532227] Call Trace:
[41796028.532255]  [] ? xfs_error_report+0x3f/0x50 [xfs]
[41796028.532276]  [] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[41796028.532296]  [] ? xfs_corruption_error+0x5e/0x90
[xfs]
[41796028.532316]  [] ? xfs_da_do_buf+0x6cc/0x770 [xfs]
[41796028.532335]  [] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[41796028.532359]  [] ? kmem_zone_alloc+0x77/0xf0 [xfs]
[41796028.532380]  [] ? xfs_da_read_buf+0x2a/0x30 [xfs]
[41796028.532399]  [] ? xfs_attr_leaf_addname+0x61/0x3d0
[xfs]
[41796028.532426]  [] ? xfs_attr_leaf_addname+0x61/0x3d0
[xfs]
[41796028.532455]  [] ? xfs_trans_add_item+0x57/0x70 [xfs]
[41796028.532476]  [] ? xfs_bmbt_get_all+0x18/0x20 [xfs]
[41796028.532495]  [] ? xfs_attr_set_int+0x3c4/0x510 [xfs]
[41796028.532517]  [] ? xfs_da_do_buf+0x6db/0x770 [xfs]
[41796028.532536]  [] ? xfs_attr_set+0x81/0x90 [xfs]
[41796028.532560]  [] ? __xfs_xattr_set+0x43/0x60 [xfs]
[41796028.532584]  [] ? xfs_xattr_user_set+0x11/0x20 [xfs]
[41796028.532592]  [] ? generic_setxattr+0xa2/0xb0
[41796028.532596]  [] ? __vfs_setxattr_noperm+0x4e/0x160
[41796028.532600]  [] ? inode_permission+0xa7/0x100
[41796028.532604]  [] ? vfs_setxattr+0xbc/0xc0
[41796028.532607]  [] ? setxattr+0xd0/0x150
[41796028.532612]  [] ? __dequeue_entity+0x30/0x50
[41796028.532617]  [] ? __switch_to+0x26e/0x320
[41796028.532621]  [] ? __sb_start_write+0x80/0x120
[41796028.532626]  [] ? thread_return+0x4e/0x760
[41796028.532630]  [] ? sys_fsetxattr+0xad/0xd0
[41796028.532633]  [] ? system_call_fastpath+0x16/0x1b
[41796028.532636] XFS (sdi1): Corruption detected. Unmount and run
xfs_repair

Any comments will be much appreciated!

Best Regards!
sunspot
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and its failures

2016-02-23 Thread Nmz
>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>> 
>> Ceph contains
>>  MON: 3
>>  OSD: 3
>>
> For completeness sake, the OSDs are on 3 different hosts, right?

It is a single machine. I'm only doing tests.

>> File system: ZFS
> That is the odd one out, very few people I'm aware of use it, support for
> it is marginal at best.
> And some of its features may of course obscure things.

I have been using ZFS on Linux for a long time and I'm happy with it.


> Exact specification please, as in how is ZFS configured (single disk,
> raid-z, etc)?

2 disks in mirror mode.

>> Kernel: 4.2.6
>> 
> While probably not related, I vaguely remember 4.3 being recommended for
> use with Ceph.

At this time I can only run this kernel. But IF I decide to use Ceph (only if 
Ceph satisfies the requirements) I can use any other kernel.

>> 3. Does Ceph have auto heal option?
> No. 
> And neither is the repair function a good idea w/o checking the data on
> disk first.
> This is my biggest pet peeve with Ceph and you will find it mentioned
> frequently in this ML, just a few days ago this thread for example:
> "pg repair behavior? (Was: Re: getting rid of misplaced objects)"

It is very strange to recover data manually without knowing which copy is good.
If I have 3 copies of the data and 2 of them are corrupted, then I cannot recover 
the bad ones.


--

I did some new tests. Now the 3 new OSDs are on different systems. The FS is ext3.

Same start as before.

# grep "a" * -R
Binary file 
osd/nmz-5/current/17.17_head/rbd\udata.1bef77ac761fb.0001__head_FB98F317__11
 matches
Binary file osd/nmz-5-journal/journal matches

# ceph pg dump | grep 17.17
dumped all in format plain
17.17   1   0   0   0   0   40961   1   
active+clean2016-02-23 16:14:32.234638  291'1   309:44  [5,4,3] 5   
[5,4,3] 5   0'0 2016-02-22 20:30:04.255301  0'0 2016-02-22 
20:30:04.255301

# md5sum rbd\\udata.1bef77ac761fb.0001__head_FB98F317__11 
\c2642965410d118c7fe40589a34d2463  
rbd\\udata.1bef77ac761fb.0001__head_FB98F317__11

# sed -i -r 's/aa/ab/g' 
rbd\\udata.1bef77ac761fb.0001__head_FB98F317__11


# ceph pg deep-scrub 17.17

7fbd99e6c700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub starts
7fbd97667700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub ok

-- restarting OSD.5

# ceph pg deep-scrub 17.17

7f00f40b8700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub starts
7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 shard 5: soid 
17/fb98f317/rbd_data.1bef77ac761fb.0001/head data_digest 0x389d90f6 
!= known data_digest 0x4f18a4a5 from auth shard 3, missing attr _, missing attr 
snapset
7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 deep-scrub 0 missing, 1 
inconsistent objects
7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 deep-scrub 1 errors


Ceph 9.2.0 bug ?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd not removed from crush map after ceph osd crush remove

2016-02-23 Thread Karan Singh
Dimitar

Is it fixed?

- is your cluster pool size 2?
- you can consider running ceph pg repair {pgid} or ceph osd lost 4 (the latter 
is a somewhat dangerous command)
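
A sketch with the pg id from your mail (only run "osd lost" if you are sure
the data that was on osd.4 is gone for good):

ceph pg repair 2.3a
ceph osd lost 4 --yes-i-really-mean-it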


Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


> On 22 Feb 2016, at 10:10, Dimitar Boichev  
> wrote:
> 
> Anyone ?
>  
> Regards.
>  
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Dimitar Boichev
> Sent: Thursday, February 18, 2016 5:06 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] osd not removed from crush map after ceph osd crush 
> remove
>  
> Hello,
> I am running a tiny cluster of 2 nodes.
> ceph -v
> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  
> One osd died and I added a new osd (not replacing the old one).
> After that I wanted to remove the failed osd completely from the cluster.
> Here is what I did:
> ceph osd reweight osd.4 0.0
> ceph osd crush reweight osd.4 0.0
> ceph osd out osd.4
> ceph osd crush remove osd.4
> ceph auth del osd.4
> ceph osd rm osd.4
>  
>  
> But after the rebalancing I ended up with 155 PGs in stale+active+clean  
> state.
>  
> @storage1:/tmp# ceph -s
> cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
>  health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are 
> blocked > 32 sec; nodeep-scrub flag(s) set
>  monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch 1, 
> quorum 0 storage1
>  osdmap e1064: 6 osds: 6 up, 6 in
> flags nodeep-scrub
>   pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
> 1209 GB used, 14210 GB / 15419 GB avail
>  155 stale+active+clean
>  557 active+clean
>   client io 91925 B/s wr, 5 op/s
>  
> I know about the 1 monitor problem I just want to fix the cluster to healthy 
> state then I will add the third storage node and go up to 3 monitors.
>  
> The problem is as follows:
> @storage1:/tmp# ceph pg map 2.3a
> osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]
> @storage1:/tmp# ceph pg 2.3a query
> Error ENOENT: i don't have pgid 2.3a
>  
>  
> @storage1:/tmp# ceph health detail
> HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 
> sec; 1 osds have slow requests; nodeep-scrub flag(s) set
> pg 7.2a is stuck stale for 8887559.656879, current state stale+active+clean, 
> last acting [4]
> pg 5.28 is stuck stale for 8887559.656886, current state stale+active+clean, 
> last acting [4]
> pg 7.2b is stuck stale for 8887559.656889, current state stale+active+clean, 
> last acting [4]
> pg 7.2c is stuck stale for 8887559.656892, current state stale+active+clean, 
> last acting [4]
> pg 0.2b is stuck stale for 8887559.656893, current state stale+active+clean, 
> last acting [4]
> pg 6.2c is stuck stale for 8887559.656894, current state stale+active+clean, 
> last acting [4]
> pg 6.2f is stuck stale for 8887559.656893, current state stale+active+clean, 
> last acting [4]
> pg 2.2b is stuck stale for 8887559.656896, current state stale+active+clean, 
> last acting [4]
> pg 2.25 is stuck stale for 8887559.656896, current state stale+active+clean, 
> last acting [4]
> pg 6.20 is stuck stale for 8887559.656898, current state stale+active+clean, 
> last acting [4]
> pg 5.21 is stuck stale for 8887559.656898, current state stale+active+clean, 
> last acting [4]
> pg 0.24 is stuck stale for 8887559.656904, current state stale+active+clean, 
> last acting [4]
> pg 2.21 is stuck stale for 8887559.656904, current state stale+active+clean, 
> last acting [4]
> pg 5.27 is stuck stale for 8887559.656906, current state stale+active+clean, 
> last acting [4]
> pg 2.23 is stuck stale for 8887559.656908, current state stale+active+clean, 
> last acting [4]
> pg 6.26 is stuck stale for 8887559.656909, current state stale+active+clean, 
> last acting [4]
> pg 7.27 is stuck stale for 8887559.656913, current state stale+active+clean, 
> last acting [4]
> pg 7.18 is stuck stale for 8887559.656914, current state stale+active+clean, 
> last acting [4]
> pg 0.1e is stuck stale for 8887559.656914, current state stale+active+clean, 
> last acting [4]
> pg 6.18 is stuck stale for 8887559.656919, current state stale+active+clean, 
> last acting [4]
> pg 2.1f is stuck stale for 8887559.656919, current state stale+active+clean, 
> last acting [4]
> pg 7.1b is stuck stale for 8887559.656922, current state stale+active+clean, 
> last acting [4]
> pg 0.1b is stuck stale for 8887559.656919, current state stale+active+clean, 
> last acting [4]
> pg 6.1d is stuck stale for 8887559.656925, current state stale+active+clean, 
> last acting [4]
> pg 2.18 is stuck stale for 8887559.656920, current state stale+active+clean, 
> last acting 

Re: [ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Vickey Singh
Adding community for further help on this.

On Tue, Feb 23, 2016 at 10:57 PM, Vickey Singh 
wrote:

>
>
> On Tue, Feb 23, 2016 at 9:53 PM, Gregory Farnum 
> wrote:
>
>>
>>
>> On Tuesday, February 23, 2016, Vickey Singh 
>> wrote:
>>
>>> Thanks Greg,
>>>
>>> Do you mean ceph osd map command is not displaying accurate information ?
>>>
>>> I guess, either of these things are happening with my cluster
>>> - ceph osd map is not printing true information
>>> - Object to PG mapping is not correct ( one object is mapped to multiple
>>> PG's )
>>>
>>> This is happening for several objects , but the cluster is Healthy.
>>>
>>
>> No, you're looking for the map command to do something it was not
>> designed for. If you want to see if an object exists, you will need to use
>> a RADOS client to fetch the object and see if it's there. "map" is a
>> mapping command: given an object name, which PG/OSD does CRUSH map that
>> name to?
>>
>
> well your 6th sense is amazing :)
>
> This is exactly i want to achieve , i wan to see my PG/OSD mapping for
> objects. ( basically i have changed my crush hierarchy , now i want to
> verify that no 2 objects should go to a single host / chassis / rack ) so
> to verify them i was using ceph osd map command.
>
> Is there a smarter way to achieve this ?
>
>
>
>
>
>>
>>
>>>
>>> Need expert suggestion.
>>>
>>>
>>> On Tue, Feb 23, 2016 at 7:20 PM, Gregory Farnum 
>>> wrote:
>>>
 This is not a bug. The map command just says which PG/OSD an object
 maps to; it does not go out and query the osd to see if there actually is
 such an object.
 -Greg


 On Tuesday, February 23, 2016, Vickey Singh <
 vickey.singh22...@gmail.com> wrote:

> Hello Guys
>
> I am getting wired output from osd map. The object does not exists on
> pool but osd map still shows its PG and OSD on which its stored.
>
> So i have rbd device coming from pool 'gold' , this image has an
> object 'rb.0.10f61.238e1f29.2ac5'
>
> The below commands verifies this
>
> *[root@ceph-node1 ~]# rados -p gold ls | grep -i
> rb.0.10f61.238e1f29.2ac5*
> *rb.0.10f61.238e1f29.2ac5*
> *[root@ceph-node1 ~]#*
>
> This object lives on pool gold and OSD 38,0,20 , which is correct
>
> *[root@ceph-node1 ~]# ceph osd map gold
> rb.0.10f61.238e1f29.2ac5*
> *osdmap e1357 pool 'gold' (1) object
> 'rb.0.10f61.238e1f29.2ac5' -> pg 1.11692600 (1.0) -> up 
> ([38,0,20],
> p38) acting ([38,0,20], p38)*
> *[root@ceph-node1 ~]#*
>
>
> Since i don't have object 'rb.0.10f61.238e1f29.2ac5' in data
> and rbd pools , rados ls will not list it. Which is expected.
>
> *[root@ceph-node1 ~]# rados -p data ls | grep -i
> rb.0.10f61.238e1f29.2ac5*
> *[root@ceph-node1 ~]# rados -p rbd ls | grep -i
> rb.0.10f61.238e1f29.2ac5*
>
>
> But , how come the object is showing in osd map of pool data and rbd.
>
> *[root@ceph-node1 ~]# ceph osd map data
> rb.0.10f61.238e1f29.2ac5*
> *osdmap e1357 pool 'data' (2) object
> 'rb.0.10f61.238e1f29.2ac5' -> pg 2.11692600 (2.0) -> up 
> ([3,51,29],
> p3) acting ([3,51,29], p3)*
> *[root@ceph-node1 ~]#*
>
> *[root@ceph-node1 ~]# ceph osd map rbd
> rb.0.10f61.238e1f29.2ac5*
> *osdmap e1357 pool 'rbd' (0) object 'rb.0.10f61.238e1f29.2ac5'
> -> pg 0.11692600 (0.0) -> up ([41,20,3], p41) acting ([41,20,3], p41)*
> *[root@ceph-node1 ~]#*
>
>
> In ceph, object is unique and belongs to only one pool. So why does it
> shows up in all pool's osd map.
>
> Is this some kind of BUG in Ceph
>
> Ceph Hammer 0.94.5
> CentOS 7.2
>

>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to properly deal with NEAR FULL OSD

2016-02-23 Thread Vlad Blando
The problem is now solved; the cluster is backfilling/recovering normally, and
there are no more NEAR FULL OSDs.

It turns out that I had RBD objects that should have been deleted a long time
ago but were still there. OpenStack Glance did not remove them; I think it's an
issue with snapshots, since an RBD image can't be deleted unless its snapshots
are purged. So I compared all my Glance images with their RBD counterparts,
identified the images that no longer exist in Glance, and deleted them.
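
Roughly, for each orphaned image this is what I ran (a sketch, image ids
shortened; the Glance snapshot is usually protected, hence the unprotect):

rbd -p images ls
rbd -p images snap ls IMAGE_ID
rbd -p images snap unprotect IMAGE_ID@snap
rbd -p images snap purge IMAGE_ID
rbd -p images rm IMAGE_ID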

So from 81% utilization I am down to 61%.

---
[root@controller-node opt]# ceph df
GLOBAL:
SIZEAVAIL  RAW USED %RAW USED
100553G 39118G 61435G   61.10
POOLS:
NAMEID USED   %USED OBJECTS
images  4  1764G  1.76  225978
volumes 5  18533G 18.43 4762609
[root@controller-node opt]#
---





On Sat, Feb 20, 2016 at 5:38 AM, Lionel Bouton <
lionel-subscript...@bouton.name> wrote:

On 19/02/2016 17:17, Don Laursen wrote:
>
> Thanks. To summarize
>
> Your data, images+volumes = 27.15% space used
>
> Raw used = 81.71% used
>
>
>
> This is a big difference that I can’t account for? Can anyone? So is your
> cluster actually full?
>
>
> I believe this is the pool size being accounted for and it is harmless: 3
> x 27.15 = 81.45 which is awfully close to 81.71.
> We have the same behavior on our Ceph cluster.
>
>
>
> I had the same problem with my small cluster. Raw used was about 85% and
> actual data, with replication, was about 30%. My OSDs were also BRTFS.
> BRTFS was causing its own problems. I fixed my problem by removing each OSD
> one at a time and re-adding as the default XFS filesystem. Doing so brought
> the percentages used to be about the same and it’s good now.
>
>
> That's odd : AFAIK we had the same behaviour with XFS before migrating to
> BTRFS.
>
> Best regards,
>
> Lionel
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librados: how to get notified when a certain object is created

2016-02-23 Thread Brad Hubbard

- Original Message -
> From: "Sorin Manolache" 
> To: ceph-users@lists.ceph.com
> Sent: Sunday, 21 February, 2016 8:20:13 AM
> Subject: [ceph-users] librados: how to get notified when a certain object is  
> created
> 
> Hello,
> 
> I can set a watch on an object in librados. Does this object have to
> exist already at the moment I'm setting the watch on it? What happens if
> the object does not exist? Is my watcher valid? Will I get notified when
> someone else creates the missing object that I'm watching and sends a
> notification?
> 
> If the watch is not valid when the object has not yet been created, then
> how can I get notified when the object is created? (I can imagine a
> work-around: there's an additional object, a kind of object registry
> object (the equivalent of a directory in a file system), that contains
> the list of created objects. I'm watching for modifications of the
> object registry object. Whenever a new object is created, the agent that
> creates the object also updates the object registry object.)

Could an object class be the right solution here?

https://github.com/ceph/ceph/blob/master/src/cls/hello/cls_hello.cc#L78

Cheers,
Brad
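
For quick experiments with the registry-object workaround described in the
question, the rados CLI can also be used, assuming the build at hand has the
create/watch/notify subcommands (check rados --help); the pool and object
names below are made up:

---
# Create the registry object up front: a watch on an object that does not
# exist yet is rejected with ENOENT, which is the problem described above.
rados -p rbd create registry-obj

# Reader side: block and print notifications sent to the registry object.
rados -p rbd watch registry-obj

# Writer side: after creating a new object, ping the watchers.
rados -p rbd put new-object /etc/hosts
rados -p rbd notify registry-obj "created new-object"

# Sanity check: list the registered watchers.
rados -p rbd listwatchers registry-obj
---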

> 
> Thank you,
> Sorin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs are crashing during PG replication

2016-02-23 Thread Alexander Gubanov
Hi,

Every time, 2 of my 18 OSDs crash. I think it happens when PG replication
runs, because only those 2 OSDs crash and they are the same ones every time.

0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In
function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread
7fd994825700 time 2016-02-24 04:51:45.870995
osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)

 ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
 1: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
ceph::buffer::list::iterator&, OSDOp&,
std::tr1::shared_ptr&, bool)+0xffc) [0x7c1f7c]
 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector >&)+0x4171) [0x809f21]
 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62)
[0x814622]
 4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
 5: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3dd4) [0x81a3f4]
 6: (ReplicatedPG::do_request(std::tr1::shared_ptr,
ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
 7: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
 8: (OSD::OpWQ::_process(boost::intrusive_ptr,
ThreadPool::TPHandle&)+0x203) [0x61cba3]
 9: (ThreadPool::WorkQueueVal,
std::tr1::shared_ptr >, boost::intrusive_ptr
>::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
 12: (()+0x7dc5) [0x7fd9ad03edc5]
 13: (clone()+0x6d) [0x7fd9abd2828d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2016-02-24 04:51:45.97 7fd994825700 -1 *** Caught signal (Aborted) **
 in thread 7fd994825700

 ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
 1: /usr/bin/ceph-osd() [0x9a24f6]
 2: (()+0xf100) [0x7fd9ad046100]
 3: (gsignal()+0x37) [0x7fd9abc675f7]
 4: (abort()+0x148) [0x7fd9abc68ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
 6: (()+0x5e946) [0x7fd9ac569946]
 7: (()+0x5e973) [0x7fd9ac569973]
 8: (()+0x5eb93) [0x7fd9ac569b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1ef) [0xa8d9df]
 10: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*,
ceph::buffer::list::iterator&, OSDOp&,
std::tr1::shared_ptr&, bool)+0xffc) [0x7c1f7c]
 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector >&)+0x4171) [0x809f21]
 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62)
[0x814622]
 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
 14: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x3dd4)
[0x81a3f4]
 15: (ReplicatedPG::do_request(std::tr1::shared_ptr,
ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
 16: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
 17: (OSD::OpWQ::_process(boost::intrusive_ptr,
ThreadPool::TPHandle&)+0x203) [0x61cba3]
 18: (ThreadPool::WorkQueueVal,
std::tr1::shared_ptr >, boost::intrusive_ptr
>::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
 20: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
 21: (()+0x7dc5) [0x7fd9ad03edc5]
 22: (clone()+0x6d) [0x7fd9abd2828d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- begin dump of recent events ---
-5> 2016-02-24 04:51:45.904559 7fd995026700  5 -- op tracker -- , seq:
19230, time: 2016-02-24 04:51:45.904559, event: started, request:
osd_op(osd.13.12097:806246 rb.0.218d6.238e1f29.00010db3@snapdir
[list-snaps] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone
e13252) v4
-4> 2016-02-24 04:51:45.904598 7fd995026700  1 -- 172.16.0.1:6801/419703
--> 172.16.0.3:6844/12260 -- osd_op_reply(806246
rb.0.218d6.238e1f29.00010db3 [list-snaps] v0'0 uv27683057 ondisk = 0)
v6 -- ?+0 0x9f90800 con 0x1b7838c0
-3> 2016-02-24 04:51:45.904616 7fd995026700  5 -- op tracker -- , seq:
19230, time: 2016-02-24 04:51

Re: [ceph-users] Ceph and its failures

2016-02-23 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

You probably haven't written to any objects after fixing the problem.
Do some client I/O on the cluster and the PG will show fixed again. I
had this happen to me as well.
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.3.5
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWzSv5CRDmVDuy+mK58QAABe4P/jJ4Vtp9qsV6T49/17FW
qgoZlxIfTLDXNnsTUUFju3c20hDHTET8uMCsaCrLb02ZujbGV0a1LcW/ffJe
hjWx1ExyyrN0bTdwBe+RRycKriHTFH19Fx3zVoRQvDaWoTAbjTFZkvQAxftN
vqKonYxsWyvITYLCFMtX0aPEljo+kQ8BNK4vJoPA2hw6cc0TKIKHSsbt9a0Q
6eCjuSPB76cGDRfbxnZbTXT79UgPD4m5ztNo3stXjvfzRMq0/6YLov8rBXTJ
y5bnlheBOHfwcS/9P1Vdi+LDDy+iaZb5/gEwXPPzV2uGr/z8RTgGMk0dKyk3
fzZHWU7FhUIl3OVDF3IqQe2tZtWTs59fithHRme7T7+tmQaG0VOd1noMYlNz
n3bCQOJutfcyWvU4naQSkgAPfvTH0GwNp16ETAZlB6pADKtH3oXMOPW3CH5H
HyY5+H9w7ELbYiuJlGwMRyko/sNIiVEoj2dZB/ta+61G8+nlYR2GsjLceXOM
HP9Wi3MrVJtXDLFrnQRglB2dfFWvBlrlBTj3uG7Ebn5DO6glxPEAvzrOgsJ2
O8D5+AMvooc41T74aUcWQK8NHNrrN+eL18yhRfjCgyadA2VYvWeu6K7sIUFo
NKFE66ahsxrNKZUrLjeCo69iP4Zf5+AgY7rCau81vzQNtmFUPjzUKyOzgpsb
Y2fQ
=TGcG
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
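
For reference, two simple ways to generate the client I/O Robert mentions
against the affected pool (the pool name 'rbd' is only a placeholder; use
whichever pool the PG belongs to):

---
# Short write benchmark; rados bench removes its benchmark objects when done.
rados -p rbd bench 30 write -t 4

# Or just write and remove a scratch object.
rados -p rbd put scratch-obj /etc/hosts
rados -p rbd rm scratch-obj
---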


On Tue, Feb 23, 2016 at 2:08 PM, Nmz  wrote:
>>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>>
>>> Ceph contains
>>>  MON: 3
>>>  OSD: 3
>>>
>> For completeness sake, the OSDs are on 3 different hosts, right?
>
> It is a single machine. I'm only doing tests.
>
>>> File system: ZFS
>> That is the odd one out, very few people I'm aware of use it, support for
>> it is marginal at best.
>> And some of its features may of course obscure things.
>
> I have been using ZFS on Linux for a long time and I'm happy with it.
>
>
>> Exact specification please, as in how is ZFS configured (single disk,
>> raid-z, etc)?
>
> 2 disks in mirror mode.
>
>>> Kernel: 4.2.6
>>>
>> While probably not related, I vaguely remember 4.3 being recommended for
>> use with Ceph.
>
> At this time I can run only this kernel. But if I decide to use Ceph (only if
> Ceph satisfies the requirements) I can use any other kernel.
>
>>> 3. Does Ceph have auto heal option?
>> No.
>> And neither is the repair function a good idea w/o checking the data on
>> disk first.
>> This is my biggest pet peeve with Ceph and you will find it mentioned
>> frequently in this ML, just a few days ago this thread for example:
>> "pg repair behavior? (Was: Re: getting rid of misplaced objects)"
>
> It is very strange to recover data manually without knowing which copy is good.
> If I have 3 copies of the data and 2 of them are corrupted, then I can't recover
> the bad one.
>
>
> --
>
> Did some new test. Now new 3 OSD are in different systems. FS is ext3
>
> Same start as before.
>
> # grep "a" * -R
> Binary file osd/nmz-5/current/17.17_head/rbd\udata.1bef77ac761fb.0001__head_FB98F317__11 matches
> Binary file osd/nmz-5-journal/journal matches
>
> # ceph pg dump | grep 17.17
> dumped all in format plain
> 17.17   1   0   0   0   0   40961   1   active+clean   2016-02-23 16:14:32.234638   291'1   309:44   [5,4,3]   5   [5,4,3]   5   0'0   2016-02-22 20:30:04.255301   0'0   2016-02-22 20:30:04.255301
>
> # md5sum rbd\\udata.1bef77ac761fb.0001__head_FB98F317__11
> \c2642965410d118c7fe40589a34d2463  rbd\\udata.1bef77ac761fb.0001__head_FB98F317__11
>
> # sed -i -r 's/aa/ab/g' rbd\\udata.1bef77ac761fb.0001__head_FB98F317__11
>
>
> # ceph pg deep-scrub 17.17
>
> 7fbd99e6c700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub starts
> 7fbd97667700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub ok
>
> -- restartind OSD.5
>
> # ceph pg deep-scrub 17.17
>
> 7f00f40b8700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub starts
> 7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 shard 5: soid 
> 17/fb98f317/rbd_data.1bef77ac761fb.0001/head data_digest 
> 0x389d90f6 != known data_digest 0x4f18a4a5 from auth shard 3, missing attr _, 
> missing attr snapset
> 7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 deep-scrub 0 missing, 
> 1 inconsistent objects
> 7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 deep-scrub 1 errors
>
>
> Ceph 9.2.0 bug?
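
Before reaching for ceph pg repair on an inconsistent PG like this one, the
advice quoted earlier in the thread is to check the on-disk copies first. A
minimal sketch for this PG, using the up set [5,4,3] from the pg dump above;
the hostnames and the FileStore path are assumptions to adapt to your layout:

---
# Confirm which OSDs currently hold the PG.
ceph pg map 17.17

# Compare the on-disk replicas of the corrupted object on each OSD host.
for osd in 5 4 3; do
    ssh "osd-host-$osd" \
        "md5sum /var/lib/ceph/osd/ceph-$osd/current/17.17_head/rbd*data.1bef77ac761fb*"
done

# Only once the bad replica has been identified:
# ceph pg repair 17.17
---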
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

ceph pg dump

Since all objects map to a PG, as long as you can verify that no PG has
more than one replica on the same host/chassis/rack, you are good.
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.3.5
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWzS0NCRDmVDuy+mK58QAANfQP/19WHCUa2wPK6cHwx6zC
msfy+zipJ86qvqTgAh5azy0VRIk5lo1GknwMJhulox5vk5M+GQo0ermR/yfw
MbXKXy1f81NeZgQSqDX+GD3V19c/mb1WYuA0SLatPKkvv6L5BxPzHoGm6HYE
1hr3VSMYixCE2JZubQxj8EA+RnrJXYPue+e9aRXGbFymXIGHNdW5A3wU/vlp
IJ18E3vTIrAdmpyKlLFYhI6w2sMPUSwGllqfBpuo+OxVE+9Wa+AptZIClNXB
CI2Ozs02V9aRwUiCf6qPIBUAIPUE6/uDqzcS3mId8KUs4IxGi0pCr/t2irr5
jdc3u4WLtmZISo7RC/yyftvFFWvUkH0+2tr3lLQXHaDc+RaJPdlj5v5tylJp
j5HTywmzz/vIPKFnn9OmVimMHfFJyWinShixVWI4ORKnPFD0gT0Qlg0yC2Hx
PmtFE/OxUvYYM65WKONhAUTrjOlLAjbibFHDwhuXfQ/1Pxuh28YWkAyX/wdE
cFZxoq6E6DePuKNO3xw1EqBUVsncW3+PltN7b+CWVOawEp+me42Ovetq7OqU
B8aQhqQB0/T8bRYeIzINkkB60k6gSvrF5TO2Kq+x7UiYUQ82KyHE+zlTryXW
0BEj2bK9s4NtAItkx3F7bcmnusOOlb1AMMJFssMQV/LmjDOR9xJUYiuqXxrb
6AB3
=hv6I
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
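
A minimal sketch of the check Robert describes, flagging any PG whose up set
puts two OSDs on the same host. It shells out to the ceph CLI with JSON
output; the JSON layout of 'ceph pg dump pgs_brief' and the crush_location
field of 'ceph osd find' differ between releases, so treat the field names
as assumptions and verify them with a quick manual run first:

---
#!/bin/bash
# Requires bash 4 and jq. Build an OSD-id -> host map, then scan each PG.
declare -A host_of
while read -r id; do
    host_of[$id]=$(ceph osd find "$id" -f json | jq -r '.crush_location.host')
done < <(ceph osd ls)

ceph pg dump pgs_brief -f json 2>/dev/null |
jq -r '.[] | "\(.pgid) \(.up | map(tostring) | join(" "))"' |
while read -r pgid osds; do
    hosts=$(for o in $osds; do echo "${host_of[$o]}"; done | sort)
    dup=$(echo "$hosts" | uniq -d)
    [ -n "$dup" ] && echo "PG $pgid has more than one replica on: $dup"
done
---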


On Tue, Feb 23, 2016 at 3:33 PM, Vickey Singh
 wrote:
> Adding community for further help on this.
>
> On Tue, Feb 23, 2016 at 10:57 PM, Vickey Singh 
> wrote:
>>
>>
>>
>> On Tue, Feb 23, 2016 at 9:53 PM, Gregory Farnum 
>> wrote:
>>>
>>>
>>>
>>> On Tuesday, February 23, 2016, Vickey Singh 
>>> wrote:

 Thanks Greg,

 Do you mean the ceph osd map command is not displaying accurate
 information?

 I guess one of these things is happening with my cluster:
 - ceph osd map is not printing true information
 - object-to-PG mapping is not correct (one object is mapped to multiple
 PGs)

 This is happening for several objects, but the cluster is healthy.
>>>
>>>
>>> No, you're looking for the map command to do something it was not
>>> designed for. If you want to see if an object exists, you will need to use a
>>> RADOS client to fetch the object and see if it's there. "map" is a mapping
>>> command: given an object name, which PG/OSD does CRUSH map that name to?
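
A direct way to do the existence check Greg describes is rados stat, which
prints the object's mtime and size when it exists and an ENOENT error when
it does not (object and pool names are the ones from this thread):

---
# Exists in 'gold', so this prints mtime and size.
rados -p gold stat rb.0.10f61.238e1f29.2ac5

# Not present in 'data' or 'rbd', so these return "No such file or directory".
rados -p data stat rb.0.10f61.238e1f29.2ac5
rados -p rbd stat rb.0.10f61.238e1f29.2ac5
---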
>>
>>
>> Well, your 6th sense is amazing :)
>>
>> This is exactly what I want to achieve; I want to see the PG/OSD mapping
>> for my objects. (Basically I have changed my CRUSH hierarchy and now want
>> to verify that no two copies of an object end up on a single host /
>> chassis / rack.) So to verify that I was using the ceph osd map command.
>>
>> Is there a smarter way to achieve this?
>>
>>
>>
>>
>>>
>>>


 Need expert suggestion.


 On Tue, Feb 23, 2016 at 7:20 PM, Gregory Farnum 
 wrote:
>
> This is not a bug. The map command just says which PG/OSD an object
> maps to; it does not go out and query the osd to see if there actually is
> such an object.
> -Greg
>
>
> On Tuesday, February 23, 2016, Vickey Singh
>  wrote:
>>
>> Hello Guys
>>
>> I am getting weird output from osd map. The object does not exist in the
>> pool, but osd map still shows the PG and OSDs on which it would be stored.
>>
>> So I have an RBD device coming from pool 'gold'; this image has an
>> object 'rb.0.10f61.238e1f29.2ac5'
>>
>> The commands below verify this
>>
>> [root@ceph-node1 ~]# rados -p gold ls | grep -i
>> rb.0.10f61.238e1f29.2ac5
>> rb.0.10f61.238e1f29.2ac5
>> [root@ceph-node1 ~]#
>>
>> This object lives in pool 'gold' on OSDs 38,0,20, which is correct
>>
>> [root@ceph-node1 ~]# ceph osd map gold
>> rb.0.10f61.238e1f29.2ac5
>> osdmap e1357 pool 'gold' (1) object 'rb.0.10f61.238e1f29.2ac5'
>> -> pg 1.11692600 (1.0) -> up ([38,0,20], p38) acting ([38,0,20], p38)
>> [root@ceph-node1 ~]#
>>
>>
>> Since I don't have the object 'rb.0.10f61.238e1f29.2ac5' in the data
>> and rbd pools, rados ls does not list it there, which is expected.
>>
>> [root@ceph-node1 ~]# rados -p data ls | grep -i
>> rb.0.10f61.238e1f29.2ac5
>> [root@ceph-node1 ~]# rados -p rbd ls | grep -i
>> rb.0.10f61.238e1f29.2ac5
>>
>>
>> But then how come the object shows up in the osd map output for the data
>> and rbd pools?
>>
>> [root@ceph-node1 ~]# ceph osd map data
>> rb.0.10f61.238e1f29.2ac5
>> osdmap e1357 pool 'data' (2) object 'rb.0.10f61.238e1f29.2ac5'
>> -> pg 2.11692600 (2.0) -> up ([3,51,29], p3) acting ([3,51,29], p3)
>> [root@ceph-node1 ~]#
>>
>> [root@ceph-node1 ~]# ceph osd map rbd rb.0.10f61.238e1f29.2ac5
>> osdmap e1357 pool 'rbd' (0) object 'rb.0.10f61.238e1f29.2ac5'
>> -> pg 0.11692600 (0.0) -> up ([41,20,3], p41) acting ([41,20,3], p41)
>> [root@ceph-node1 ~]#
>>
>>
>> In Ceph an object is unique and belongs to only one pool, so why does it
>> show up in every pool's osd map?
>>
>> Is this some kind of bug in Ceph?
>>
>>

[ceph-users] Ceph stable release team: call for participation

2016-02-23 Thread Loic Dachary
Hi Ceph,

TL;DR: If you have one day a week to work on the next Ceph stable releases [1] 
your help would be most welcome.

The Ceph "Long Term Stable" (LTS) releases - currently hammer[2] - are used by 
individuals, non-profits, government agencies and companies for their 
production Ceph clusters. They are also used when Ceph is integrated into 
larger products, such as hardware appliances. Ceph packages for a range of 
supported distributions are available at http://ceph.com/. Before the packages 
for a new stable release are published, they are carefully tested for potential 
regressions or upgrade problems. The Ceph project makes every effort to ensure 
the packages published at http://ceph.com/ can be used and upgraded in 
production.

The Stable release team[3] plays an essential role in the making of each Ceph 
stable release. In addition to maintaining an inventory of bugfixes that are in 
various stages of backporting[4], in most cases we do the actual backporting 
ourselves[5]. We also run integration tests involving hundreds of machines[6] 
and analyze the test results when they fail[7]. The developers of the bugfixes 
only hear from us when we're stuck or to make the final decision whether to 
merge a backport into the stable branch. Our process is well documented[8] and 
participating is a relaxing experience (IMHO ;-). Every month or so we have the 
satisfaction of seeing a new stable release published.

There are no prerequisites to participate. Over time, it is an opportunity to 
learn how the code base is organized. When trying to figure out which commit 
does not cherry-pick cleanly, you will learn some of the logic of the Ceph 
internals. Last but not least, running the integration tests[6] and analyzing 
failures is a great way to know precisely what Ceph is capable of.

Nathan Cutler (SUSE) drives the next Hammer release[9] and Abhishek Varshney 
(Flipkart) drives the next Infernalis release[10]. Abhishek Lekshmanan (SUSE) 
helps on all releases and M Ranga Swami Reddy (Reliance Jio Infocomm Ltd.) 
learns the workflow by helping with Hammer. Loic Dachary (Red Hat), one of the 
Ceph core developers, oversees the process and provides help and advice when 
necessary. After these two releases are published (which should happen in the 
next few weeks), the roles will change and we would like to invite you to 
participate. If you're employed by a company using Ceph or doing business with 
it, maybe your manager could agree to give back to the Ceph community in this 
way. You can join at any time and you will be mentored while the ongoing 
releases complete. When the time comes (and if you feel ready), you will be 
offered a seat to drive the next release.

Cheers

[1] Ceph Releases timeline http://ceph.com/docs/master/releases/
[2] Hammer v0.94.6 http://ceph.com/docs/master/release-notes/#v0-94-6-hammer
[3] Stable release team http://tracker.ceph.com/projects/ceph-releases
[4] Hammer backports http://tracker.ceph.com/projects/ceph/issues?query_id=78
[5] Backporting commits 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_backport_commits
[6] Integration tests 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_run_integration_and_upgrade_tests
[7] Forensic analysis of integration tests 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_forensic_analysis_of_integration_and_upgrade_tests
[8] Ceph Stable releases home page 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO
[9] Hammer v0.94.7 http://tracker.ceph.com/issues/14692
[10] Infernalis v9.2.1 http://tracker.ceph.com/issues/13750

-- 
Loïc Dachary, Artisan Logiciel Libre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com