Re: [ceph-users] KVM problems when rebalance occurs

2016-01-08 Thread nick
Hi,
benchmarking is done via fio and different blocksizes. I compared with 
benchmarks I did before the ceph.conf change and encountered very similar 
numbers. 

Thanks for the hint with mysql benchmarking. I will try it out.

Cheers
Nick

On Friday, January 08, 2016 06:59:13 AM Josef Johansson wrote:
> Hi,
> 
> How did you benchmark?
> 
> I would recommend to have a lot of mysql with a lot of innodb tables that
> are utilised heavily. During a recover you should see the latency rise at
> least. Maybe using one of the tools here
> https://dev.mysql.com/downloads/benchmarks.html
> 
> Regards,
> Josef
> 
> On 7 Jan 2016 16:36, "Robert LeBlanc"  wrote:
> > With these min,max settings, we didn't have any problem going to more
> > backfills.
> > 
> > Robert LeBlanc
> > 
> > Sent from a mobile device please excuse any typos.
> > 
> > On Jan 7, 2016 8:30 AM, "nick"  wrote:
> >> Heya,
> >> thank you for your answers. We will try to set 16/32 as values for
> >> osd_backfill_scan_[min|max]. I also set the debug logging config. Here is
> >> an
> >> excerpt of our new ceph.conf:
> >> 
> >> """
> >> [osd]
> >> osd max backfills = 1
> >> osd backfill scan max = 32
> >> osd backfill scan min = 16
> >> osd recovery max active = 1
> >> osd recovery op priority = 1
> >> osd op threads = 8
> >> 
> >> [global]
> >> debug optracker = 0/0
> >> debug asok = 0/0
> >> debug hadoop = 0/0
> >> debug mds migrator = 0/0
> >> debug objclass = 0/0
> >> debug paxos = 0/0
> >> debug context = 0/0
> >> debug objecter = 0/0
> >> debug mds balancer = 0/0
> >> debug finisher = 0/0
> >> debug auth = 0/0
> >> debug buffer = 0/0
> >> debug lockdep = 0/0
> >> debug mds log = 0/0
> >> debug heartbeatmap = 0/0
> >> debug journaler = 0/0
> >> debug mon = 0/0
> >> debug client = 0/0
> >> debug mds = 0/0
> >> debug throttle = 0/0
> >> debug journal = 0/0
> >> debug crush = 0/0
> >> debug objectcacher = 0/0
> >> debug filer = 0/0
> >> debug perfcounter = 0/0
> >> debug filestore = 0/0
> >> debug rgw = 0/0
> >> debug monc = 0/0
> >> debug rbd = 0/0
> >> debug tp = 0/0
> >> debug osd = 0/0
> >> debug ms = 0/0
> >> debug mds locker = 0/0
> >> debug timer = 0/0
> >> debug mds log expire = 0/0
> >> debug rados = 0/0
> >> debug striper = 0/0
> >> debug rbd replay = 0/0
> >> debug none = 0/0
> >> debug keyvaluestore = 0/0
> >> debug compressor = 0/0
> >> debug crypto = 0/0
> >> debug xio = 0/0
> >> debug civetweb = 0/0
> >> debug newstore = 0/0
> >> """
> >> 
> >> I already made a benchmark on our staging setup with the new config and
> >> fio, but
> >> did not really get different results than before.
> >> 
> >> For us it is hardly possible to reproduce the 'stalling' problems on the
> >> staging cluster so I will have to wait and test this in production.
> >> 
> >> Does anyone know if 'osd max backfills' > 1 could have an impact as well?
> >> The
> >> default seems to be 10...
> >> 
> >> Cheers
> >> Nick
> >> 
> >> On Wednesday, January 06, 2016 09:17:43 PM Josef Johansson wrote:
> >> > Hi,
> >> > 
> >> > Also make sure that you optimize the debug log config. There's a lot on
> >> 
> >> the
> >> 
> >> > ML on how to set them all to low values (0/0).
> >> > 
> >> > Not sure how it's in infernalis but it did a lot in previous versions.
> >> > 
> >> > Regards,
> >> > Josef
> >> > 
> >> > On 6 Jan 2016 18:16, "Robert LeBlanc"  wrote:
> >> > > -BEGIN PGP SIGNED MESSAGE-
> >> > > Hash: SHA256
> >> > > 
> >> > > There has been a lot of "discussion" about osd_backfill_scan[min,max]
> >> > > lately. My experience with hammer has been opposite that of what
> >> > > people have said before. Increasing those values for us has reduced
> >> > > the load of recovery and has prevented a lot of the disruption seen
> >> > > in
> >> > > our cluster caused by backfilling. It does increase the amount of
> >> > > time
> >> > > to do the recovery (a new node added to the cluster took about 3-4
> >> > > hours before, now takes about 24 hours).
> >> > > 
> >> > > We are currently using these values and seem to work well for us.
> >> > > osd_max_backfills = 1
> >> > > osd_backfill_scan_min = 16
> >> > > osd_recovery_max_active = 1
> >> > > osd_backfill_scan_max = 32
> >> > > 
> >> > > I would be interested in your results if you try these values.
> >> > > -BEGIN PGP SIGNATURE-
> >> > > Version: Mailvelope v1.3.2
> >> > > Comment: https://www.mailvelope.com
> >> > > 
> >> > > wsFcBAEBCAAQBQJWjUu/CRDmVDuy+mK58QAArdMQAI+0Er/sdN7TF7knGey2
> >> > > 5wJ6Ie81KJlrt/X9fIMpFdwkU2g5ET+sdU9R2hK4XcBpkonfGvwS8Ctha5Aq
> >> > > XOJPrN4bMMeDK9Z4angK86ioLJevTH7tzp3FZL0U4Kbt1s9ZpwF6t+wlvkKl
> >> > > mt6Tkj4VKr0917TuXqk58AYiZTYcEjGAb0QUe/gC24yFwZYrPO0vUVb4gmTQ
> >> > > klNKAdTinGSn4Ynj+lBsEstWGVlTJiL3FA6xRBTz1BSjb4vtb2SoIFwHlAp+
> >> > > GO+bKSh19YIasXCZfRqC/J2XcNauOIVfb4l4viV23JN2fYavEnLCnJSglYjF
> >> > > Rjxr0wK+6NhRl7naJ1yGNtdMkw+h+nu/xsbYhNqT0EVq1d0nhgzh6ZjAhW1w
> >> > > oRiHYA4KNn2uWiUgigpISFi4hJSP4CEPToO8jbhXhARs0H6v33oWrR8RYKxO
> >> > > dFz+Lxx969rpDkk+1nRks9hTeIF+oFnW7eezSiR6TIL

[ceph-users] can rbd block_name_prefix be changed?

2016-01-08 Thread min fang
Hi, can rbd block_name_prefix be changed?  Is it constant for a rbd image?

thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Swift use Rados backend

2016-01-08 Thread Sam Huracan
Hi,

How could I use Ceph as Backend for Swift?
I follow these git:
https://github.com/stackforge/swift-ceph-backend
https://github.com/enovance/swiftceph-ansible

I try to install manually, but I am stucking in configuring entry for ring.
What device I use in 'swift-ring-builder account.builder add
z1-10.10.10.53:6002/*sdb1 *100' if I use Rados?

Thanks and regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs (ceph-fuse) and file-layout: "operation not supported" in a client Ubuntu Trusty

2016-01-08 Thread Francois Lafont
Hi @all,

I'm using ceph Infernalis (9.2.0) in the client and cluster side.
I have a Ubuntu Trusty client where cephfs is mounted via ceph-fuse
and I would like to put a sub-directory of cephfs in a specific pool
(a ssd pool).

In the cluster, I have:

~# ceph auth get client.cephfs
exported keyring for client.cephfs
[client.cephfs]
key = XX==
caps mds = "allow"
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx 
pool=cephfsdata, allow rwx pool=poolssd"

~# ceph fs ls
name: cephfs, metadata pool: cephfsmetadata, data pools: [cephfsdata poolssd ]

Now, in the Ubuntu Trusty client, I have installed the "attr" package
and I try this:

~# mkdir /mnt/cephfs/ssd

~# setfattr -n ceph.dir.layout.pool -v poolssd /mnt/cephfs/ssd/
setfattr: /mnt/cephfs/ssd/: Operation not supported

~# getfattr -n ceph.dir.layout /mnt/cephfs/
/mnt/cephfs/: ceph.dir.layout: Operation not supported

Here is my fstab line which mount the cephfs:

id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/data1 
/mnt/cephfs fuse.ceph noatime,defaults,_netdev 0 0

Where is my problem?
Thanks in advance for your help. ;)

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Intel P3700 PCI-e as journal drives?

2016-01-08 Thread Burkhard Linke

Hi,

I want to start another round of SSD discussion since we are about to 
buy some new servers for our ceph cluster. We plan to use hosts with 12x 
4TB drives and two SSD journals drives. I'm fancying Intel P3700 PCI-e 
drives, but Sebastien Han's blog does not contain performance data for 
these drives yet.


Is anyone able to share some benchmark results for Intel P3700 PCI-e drives?

Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd tree output

2016-01-08 Thread Wade Holler
That is not set as far as I can tell.  Actually it is strange that I don't
see that setting at all.

[root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep crush

[root@cpn1 ~]# grep update /etc/ceph/ceph.conf

[root@cpn1 ~]#

On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen  wrote:

>
>
> Hi,
>
> Do you have by any chance disabled automatic crushmap updates in your ceph
> config?
>
> osd crush update on start = false
>
> If this is the case, and you move disks around hosts, they won't update
> their position/host in the crushmap, even if the crushmap does not reflect
> reality.
>
> Regards,
>
> Mart
>
>
>
>
>
> On 01/08/2016 02:16 AM, Wade Holler wrote:
>
> Sure.  Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs per
> node,  but I will only include a sample:
>
> ceph osd tree | head -35
>
> ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>
>  -1 130.98450 root default
>
>  -2   5.82153 host cpn1
>
>   4   0.72769 osd.4  up  1.0  1.0
>
>  14   0.72769 osd.14 up  1.0  1.0
>
>   3   0.72769 osd.3  up  1.0  1.0
>
>  24   0.72769 osd.24 up  1.0  1.0
>
>   5   0.72769 osd.5  up  1.0  1.0
>
>   2   0.72769 osd.2  up  1.0  1.0
>
>  17   0.72769 osd.17 up  1.0  1.0
>
>  69   0.72769 osd.69 up  1.0  1.0
>
>  -3   6.54922 host cpn3
>
>   7   0.72769 osd.7  up  1.0  1.0
>
>   8   0.72769 osd.8  up  1.0  1.0
>
>   9   0.72769 osd.9  up  1.0  1.0
>
>   0   0.72769 osd.0  up  1.0  1.0
>
>  28   0.72769 osd.28 up  1.0  1.0
>
>  10   0.72769 osd.10 up  1.0  1.0
>
>   1   0.72769 osd.1  up  1.0  1.0
>
>   6   0.72769 osd.6  up  1.0  1.0
>
>  29   0.72769 osd.29 up  1.0  1.0
>
>  -4   2.91077 host cpn4
>
>
> Compared with the actual processes that are running:
>
>
> [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd
>
> ceph   92638   1 26 16:19 ?01:00:55 /usr/bin/ceph-osd -f
> --cluster ceph --id 6 --setuser ceph --setgroup ceph
>
> ceph   92667   1 20 16:19 ?00:48:04 /usr/bin/ceph-osd -f
> --cluster ceph --id 0 --setuser ceph --setgroup ceph
>
> ceph   92673   1 18 16:19 ?00:42:48 /usr/bin/ceph-osd -f
> --cluster ceph --id 8 --setuser ceph --setgroup ceph
>
> ceph   92681   1 19 16:19 ?00:45:52 /usr/bin/ceph-osd -f
> --cluster ceph --id 7 --setuser ceph --setgroup ceph
>
> ceph   92701   1 15 16:19 ?00:36:05 /usr/bin/ceph-osd -f
> --cluster ceph --id 12 --setuser ceph --setgroup ceph
>
> ceph   92748   1 14 16:19 ?00:34:07 /usr/bin/ceph-osd -f
> --cluster ceph --id 10 --setuser ceph --setgroup ceph
>
> ceph   92756   1 16 16:19 ?00:38:40 /usr/bin/ceph-osd -f
> --cluster ceph --id 9 --setuser ceph --setgroup ceph
>
> ceph   92758   1 17 16:19 ?00:39:28 /usr/bin/ceph-osd -f
> --cluster ceph --id 13 --setuser ceph --setgroup ceph
>
> ceph   92777   1 19 16:19 ?00:46:17 /usr/bin/ceph-osd -f
> --cluster ceph --id 1 --setuser ceph --setgroup ceph
>
> ceph   92988   1 18 16:19 ?00:42:47 /usr/bin/ceph-osd -f
> --cluster ceph --id 5 --setuser ceph --setgroup ceph
>
> ceph   93058   1 18 16:19 ?00:43:18 /usr/bin/ceph-osd -f
> --cluster ceph --id 11 --setuser ceph --setgroup ceph
>
> ceph   93078   1 17 16:19 ?00:41:38 /usr/bin/ceph-osd -f
> --cluster ceph --id 14 --setuser ceph --setgroup ceph
>
> ceph   93127   1 15 16:19 ?00:36:29 /usr/bin/ceph-osd -f
> --cluster ceph --id 4 --setuser ceph --setgroup ceph
>
> ceph   93130   1 17 16:19 ?00:40:44 /usr/bin/ceph-osd -f
> --cluster ceph --id 2 --setuser ceph --setgroup ceph
>
> ceph   93173   1 21 16:19 ?00:49:37 /usr/bin/ceph-osd -f
> --cluster ceph --id 3 --setuser ceph --setgroup ceph
>
> [root@cpx1 ~]# ssh cpn3 ps -ef | grep ceph\-osd
>
> ceph   82454   1 18 16:19 ?00:43:58 /usr/bin/ceph-osd -f
> --cluster ceph --id 25 --setuser ceph --setgroup ceph
>
> ceph   82464   1 24 16:19 ?00:55:40 /usr/bin/ceph-osd -f
> --cluster ceph --id 21 --setuser ceph --setgroup ceph
>
> ceph   82473   1 21 16:19 ?00:50:14 /usr/bin/ceph-osd -f
> --cluster ceph --id 17 --setuser ceph --setgroup ceph
>
> ceph   82612   1 19 16:19 ?00:45:25 /usr/bin/ceph-osd -f
> --cluster ceph --id 22 --setuser ceph --setgroup ceph
>
> ceph   82629   1 20 16:19 ?00:48:38 /usr/bin/cep

Re: [ceph-users] Intel P3700 PCI-e as journal drives?

2016-01-08 Thread Paweł Sadowski
Hi,

Quick results for 1/5/10 jobs:


# fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/373.2MB/0KB /s] [0/95.6K/0 iops]
[eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=99634: Fri Jan  8
13:51:53 2016
  write: io=21116MB, bw=360373KB/s, iops=90093, runt= 6msec
clat (usec): min=7, max=14738, avg=10.79, stdev=29.04
 lat (usec): min=7, max=14738, avg=10.84, stdev=29.04
clat percentiles (usec):
 |  1.00th=[8],  5.00th=[8], 10.00th=[8], 20.00th=[8],
 | 30.00th=[8], 40.00th=[8], 50.00th=[9], 60.00th=[9],
 | 70.00th=[9], 80.00th=[   12], 90.00th=[   18], 95.00th=[   22],
 | 99.00th=[   34], 99.50th=[   37], 99.90th=[   50], 99.95th=[   54],
 | 99.99th=[   72]
bw (KB  /s): min=192456, max=394392, per=99.97%, avg=360254.66,
stdev=46490.05
lat (usec) : 10=73.77%, 20=18.79%, 50=7.33%, 100=0.10%, 250=0.01%
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu  : usr=15.92%, sys=13.08%, ctx=5405192, majf=0, minf=27
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=0/w=5405592/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=21116MB, aggrb=360372KB/s, minb=360372KB/s, maxb=360372KB/s,
mint=6msec, maxt=6msec

Disk stats (read/write):
  nvme0n1: ios=0/5397207, merge=0/0, ticks=0/42596, in_queue=42596,
util=71.01%



# fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
...
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
fio-2.1.3
Starting 5 processes
Jobs: 5 (f=5): [W] [100.0% done] [0KB/1023MB/0KB /s] [0/262K/0 iops]
[eta 00m:00s]
journal-test: (groupid=0, jobs=5): err= 0: pid=99932: Fri Jan  8
13:57:07 2016
  write: io=57723MB, bw=985120KB/s, iops=246279, runt= 60001msec
clat (usec): min=7, max=23102, avg=20.00, stdev=78.26
 lat (usec): min=7, max=23102, avg=20.05, stdev=78.26
clat percentiles (usec):
 |  1.00th=[8],  5.00th=[9], 10.00th=[   10], 20.00th=[   12],
 | 30.00th=[   14], 40.00th=[   15], 50.00th=[   16], 60.00th=[   18],
 | 70.00th=[   21], 80.00th=[   25], 90.00th=[   29], 95.00th=[   36],
 | 99.00th=[   62], 99.50th=[   77], 99.90th=[  193], 99.95th=[  612],
 | 99.99th=[ 1816]
bw (KB  /s): min=139512, max=225144, per=19.99%, avg=196941.33,
stdev=20911.73
lat (usec) : 10=6.84%, 20=59.99%, 50=31.33%, 100=1.61%, 250=0.14%
lat (usec) : 500=0.03%, 750=0.02%, 1000=0.01%
lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu  : usr=8.79%, sys=7.32%, ctx=14776785, majf=0, minf=138
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=0/w=14777043/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=57723MB, aggrb=985119KB/s, minb=985119KB/s, maxb=985119KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  nvme0n1: ios=0/14754265, merge=0/0, ticks=0/253092, in_queue=254880,
util=100.00%




# fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
...
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
fio-2.1.3
Starting 10 processes
Jobs: 10 (f=10): [WW] [100.0% done] [0KB/1026MB/0KB /s]
[0/263K/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=10): err= 0: pid=14: Fri Jan  8
13:58:24 2016
  write: io=65679MB, bw=1094.7MB/s, iops=280224, runt= 60001msec
clat (usec): min=7, max=23679, avg=35.33, stdev=118.33
 lat (usec): min=7, max=23679, avg=35.39, stdev=118.34
clat percentiles (usec):
 |  1.00th=[8],  5.00th=[9], 10.00th=[   10], 20.00th=[   12],
 | 30.00th=[   14], 40.00th=[   17], 50.00th=[   22], 60.00th=[   27],
 | 70.00th=[   33], 80.00th=[   45], 90.00th=[   68], 95.00th=[   90],
 | 99.00th=[  167], 99.50th=[  231], 99.90th=[ 1064], 99.95th=[ 1528],
 | 99.99th=[ 2416]
bw (KB  /s): min=66600, max=141064, per=10.01%, avg=112165.00,
stdev=16560.67
lat (usec) : 10=6.54%, 20=38.42%, 50=37.34%, 100=1

[ceph-users] pg is stuck stale (osd.21 still removed)

2016-01-08 Thread Daniel Schwager
Hi,

we had a HW-problem with OSD.21 today. The OSD daemon was down and "smartctl" 
told me about some hardware errors.

I decided to remove the HDD:

  ceph osd out 21
  ceph osd crush remove osd.21
  ceph auth del osd.21
  ceph osd rm osd.21

But afterwards I saw that I have some stucked pg's for osd.21: 

root@ceph-admin:~# ceph -w
cluster c7b12656-15a6-41b0-963f-4f47c62497dc
 health HEALTH_WARN
  50 pgs stale
50 pgs stuck stale
 monmap e4: 3 mons at 
{ceph-mon1=192.168.135.31:6789/0,ceph-mon2=192.168.135.32:6789/0,ceph-mon3=192.168.135.33:6789/0}
  election epoch 404, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
 mdsmap e136: 1/1/1 up {0=ceph-mon1=up:active}
 osdmap e18259: 23 osds: 23 up, 23 in
  pgmap v47879105: 6656 pgs, 10 pools, 23481 GB data, 6072 kobjects
  54974 GB used, 30596 GB / 85571 GB avail
6605 active+clean
50 stale+active+clean
   1 active+clean+scrubbing+deep

root@ceph-admin:~# ceph health
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale

root@ceph-admin:~# ceph health detail
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale; noout flag(s) set
pg 34.225 is stuck stale for 98780.399254, current state 
stale+active+clean, last acting [21]
pg 34.186 is stuck stale for 98780.399195, current state 
stale+active+clean, last acting [21]
...

root@ceph-admin:~# ceph pg 34.225   query
Error ENOENT: i don't have pgid 34.225

root@ceph-admin:~# ceph pg 34.225  list_missing
Error ENOENT: i don't have pgid 34.225

root@ceph-admin:~# ceph osd lost 21  --yes-i-really-mean-it
osd.21 is not down or doesn't exist

# checking the crushmap
  ceph osd getcrushmap -o crush.map
  crushtool -d crush.map  -o crush.txt
root@ceph-admin:~# grep 21 crush.txt
-> nothing here


Of course, I cannot start OSD.21, because it's not available anymore - I 
removed it.

Is there a way to remap the stucked pg's to other OSD's than osd.21? How can I 
help my cluster (ceph 0.94.2)?

best regards
Danny


smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs (ceph-fuse) and file-layout: "operation not supported" in a client Ubuntu Trusty

2016-01-08 Thread Francois Lafont
Hi,

Some news...

On 08/01/2016 12:42, Francois Lafont wrote:

> ~# mkdir /mnt/cephfs/ssd
> 
> ~# setfattr -n ceph.dir.layout.pool -v poolssd /mnt/cephfs/ssd/
> setfattr: /mnt/cephfs/ssd/: Operation not supported
> 
> ~# getfattr -n ceph.dir.layout /mnt/cephfs/
> /mnt/cephfs/: ceph.dir.layout: Operation not supported
> 
> Here is my fstab line which mount the cephfs:
> 
> id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/data1
>  /mnt/cephfs fuse.ceph noatime,defaults,_netdev 0 0

In fact, I have retried the same thing without the "noatime" mount
option and after that it worked. Then I have retried _with_ the "noatime"
to be sure and... it worked too. Now, it just works with or witout the
option.

So I have 2 possible explanations:

1. The fact to remove noatime and mount just once has unblocked
something...

2. or I have another explanation terrible for me. Maybe during
my first attempt, the cephfs was just not mounted in fact. Indeed,
now I have a doubt on this point because few minutes after the attempt
I have seen that the cephfs was not mounted (and I don't know why).

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can rbd block_name_prefix be changed?

2016-01-08 Thread Jason Dillaman
It's constant for an RBD image and is tied to the image's internal unique ID.

-- 

Jason Dillaman 


- Original Message - 

> From: "min fang" 
> To: "ceph-users" 
> Sent: Friday, January 8, 2016 4:50:08 AM
> Subject: [ceph-users] can rbd block_name_prefix be changed?

> Hi, can rbd block_name_prefix be changed? Is it constant for a rbd image?

> thanks.

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-08 Thread hnuzhoulin

Hi,guyes
Recentlly,I am testing  cache-tier using writeback mode.but I found a  
strange things.

the performance  using rados bench degrade.Is it correct?
If so,how to explain.following some info about my test:

storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the other  
one used as OSD),four sata as OSD.


before using cache-tier:
root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup

Total time run: 301.236355
Total writes made:  6041
Write size: 4194304
Bandwidth (MB/sec): 80.216

Stddev Bandwidth:   10.5358
Max bandwidth (MB/sec): 104
Min bandwidth (MB/sec): 0
Average Latency:0.797838
Stddev Latency: 0.619098
Max latency:4.89823
Min latency:0.158543

root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
Total time run:133.563980
Total reads made: 6041
Read size:4194304
Bandwidth (MB/sec):180.917

Average Latency:   0.353559
Max latency:   1.83356
Min latency:   0.027878

after configure cache-tier:
root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage  
hotstorage

pool 'hotstorage' is now (or already was) a tier of 'coldstorage'

root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode  
hotstorage writeback

set cache-mode for pool 'hotstorage' to writeback

root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay  
coldstorage hotstorage

overlay for 'coldstorage' is now (or already was) 'hotstorage'

oot@ubuntu:~# ceph osd dump|grep storage
pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0  
object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 flags  
hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0
pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1  
object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags  
hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes  
1000 hit_set bloom{false_positive_probability: 0.05, target_size:  
0, seed: 0} 3600s x6 stripe_width 0

-
rados bench -p coldstorage 300 write --no-cleanup
Total time run: 302.207573
Total writes made: 4315
Write size: 4194304
Bandwidth (MB/sec): 57.113

Stddev Bandwidth: 23.9375
Max bandwidth (MB/sec): 104
Min bandwidth (MB/sec): 0
Average Latency: 1.1204
Stddev Latency: 0.717092
Max latency: 6.97288
Min latency: 0.158371

root@ubuntu:/# rados bench -p coldstorage 300 seq
Total time run: 153.869741
Total reads made: 4315
Read size: 4194304
Bandwidth (MB/sec): 112.173

Average Latency: 0.570487
Max latency: 1.75137
Min latency: 0.039635


ceph.conf:

[global]
fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831
mon_initial_members = ceph2, ceph3, ceph4
mon_host = 10.**.**.241,10.**.**.242,10.**.**.243
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 3
osd_pool_default_min_size = 1
auth_supported = cephx
osd_journal_size = 10240
osd_mkfs_type = xfs
osd crush update on start = false

[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = false
rbd_cache_size = 33554432
rbd_cache_max_dirty = 25165824
rbd_cache_target_dirty = 16777216
rbd_cache_max_dirty_age = 1
rbd_cache_block_writes_upfront = false
[osd]
filestore_omap_header_cache_size = 4
filestore_fd_cache_size = 4
filestore_fiemap = true
client_readahead_min = 2097152
client_readahead_max_bytes = 0
client_readahead_max_periods = 4
filestore_journal_writeahead = false
filestore_max_sync_interval = 10
filestore_queue_max_ops = 500
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_ops = 5000
filestore_queue_committing_max_bytes = 1048576000
keyvaluestore_queue_max_ops = 500
keyvaluestore_queue_max_bytes = 1048576000
journal_queue_max_ops = 3
journal_queue_max_bytes = 3355443200
osd_op_threads = 20
osd_disk_threads = 8
filestore_op_threads = 4
osd_mount_options_xfs = rw,noatime,nobarrier,inode64,logbsize=256k,delaylog

[mon]
mon_osd_allow_primary_affinity=true

--
使用Opera的电子邮件客户端:http://www.opera.com/mail/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd tree output

2016-01-08 Thread hnuzhoulin

Yeah,this setting can not see in asok config.
You just set it in ceph.conf and restart mon and osd service(sorry I  
forget if these restart is necessary)


what I use this config is when I changed crushmap manually,and I do not  
want the service init script to rebuild crushmap as default way.


maybe this is not siut for your problem.just have a try.

在 Fri, 08 Jan 2016 21:51:32 +0800,Wade Holler   
写道:


That is not set as far as I can tell.  Actually it is strange that I  
don't see that setting at all.



[root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep  
crush


[root@cpn1 ~]# grep update /etc/ceph/ceph.conf

[root@cpn1 ~]#
On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen  wrote:



Hi,

Do you have by any chance disabled automatic crushmap updates in your  
ceph config?


osd crush update on start = false

If this is the case, and you move disks around hosts, they won't update  
their position/host in the crushmap, even if the crushmap does not  
reflect reality.

Regards,

Mart





On 01/08/2016 02:16 AM, Wade Holler wrote:
Sure.  Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs  
per node,  but I will only include a sample:


ceph osd tree | head -35

ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 130.98450 root default
-2   5.82153 host cpn1   
 4   0.72769 osd.4  up  1.0  1.0

14   0.72769 osd.14 up  1.0  1.0
 3   0.72769 osd.3  up  1.0  1.0
24   0.72769 osd.24 up  1.0  1.0
 5   0.72769 osd.5  up  1.0  1.0
 2   0.72769 osd.2  up  1.0  1.0
17   0.72769 osd.17 up  1.0  1.0
69   0.72769 osd.69 up  1.0  1.0
-3   6.54922 host cpn3   
 7   0.72769 osd.7  up  1.0  1.0

 8   0.72769 osd.8  up  1.0  1.0
 9   0.72769 osd.9  up  1.0  1.0
 0   0.72769 osd.0  up  1.0  1.0
28   0.72769 osd.28 up  1.0  1.0
10   0.72769 osd.10 up  1.0  1.0
 1   0.72769 osd.1  up  1.0  1.0
 6   0.72769 osd.6  up  1.0  1.0
29   0.72769 osd.29 up  1.0  1.0
-4   2.91077 host cpn4   




Compared with the actual processes that are running:





[root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd


ceph   92638   1 26 16:19 ?01:00:55 /usr/bin/ceph-osd  
-f --cluster ceph --id 6 --setuser ceph --setgroup ceph


ceph   92667   1 20 16:19 ?00:48:04 /usr/bin/ceph-osd  
-f --cluster ceph --id 0 --setuser ceph --setgroup ceph


ceph   92673   1 18 16:19 ?00:42:48 /usr/bin/ceph-osd  
-f --cluster ceph --id 8 --setuser ceph --setgroup ceph


ceph   92681   1 19 16:19 ?00:45:52 /usr/bin/ceph-osd  
-f --cluster ceph --id 7 --setuser ceph --setgroup ceph


ceph   92701   1 15 16:19 ?00:36:05 /usr/bin/ceph-osd  
-f --cluster ceph --id 12 --setuser ceph --setgroup ceph


ceph   92748   1 14 16:19 ?00:34:07 /usr/bin/ceph-osd  
-f --cluster ceph --id 10 --setuser ceph --setgroup ceph


ceph   92756   1 16 16:19 ?00:38:40 /usr/bin/ceph-osd  
-f --cluster ceph --id 9 --setuser ceph --setgroup ceph


ceph   92758   1 17 16:19 ?00:39:28 /usr/bin/ceph-osd  
-f --cluster ceph --id 13 --setuser ceph --setgroup ceph


ceph   92777   1 19 16:19 ?00:46:17 /usr/bin/ceph-osd  
-f --cluster ceph --id 1 --setuser ceph --setgroup ceph


ceph   92988   1 18 16:19 ?00:42:47 /usr/bin/ceph-osd  
-f --cluster ceph --id 5 --setuser ceph --setgroup ceph


ceph   93058   1 18 16:19 ?00:43:18 /usr/bin/ceph-osd  
-f --cluster ceph --id 11 --setuser ceph --setgroup ceph


ceph   93078   1 17 16:19 ?00:41:38 /usr/bin/ceph-osd  
-f --cluster ceph --id 14 --setuser ceph --setgroup ceph


ceph   93127   1 15 16:19 ?00:36:29 /usr/bin/ceph-osd  
-f --cluster ceph --id 4 --setuser ceph --setgroup ceph


ceph   93130   1 17 16:19 ?00:40:44 /usr/bin/ceph-osd  
-f --cluster ceph --id 2 --setuser ceph --setgroup ceph


ceph   93173   1 21 16:19 ?00:49:37 /usr/bin/ceph-osd  
-f --cluster ceph --id 3 --setuser ceph --setgroup ceph


[root@cpx1 ~]# ssh cpn3 ps -ef | grep ceph\-osd

ceph   82454   1 18 16:19 ?00:43:58 /usr/bin/ceph-osd  
-f --cluster ceph --id 25 --setuser ceph --setgroup ceph


ceph   82464   1 24 16:19 ?00:55:40 /usr/bin/ceph-osd  
-f --

Re: [ceph-users] ceph osd tree output

2016-01-08 Thread Wade Holler
It is not set in the conf file.  So why do I still have this behavior ?

On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:

> Yeah,this setting can not see in asok config.
> You just set it in ceph.conf and restart mon and osd service(sorry I
> forget if these restart is necessary)
>
> what I use this config is when I changed crushmap manually,and I do not
> want the service init script to rebuild crushmap as default way.
>
> maybe this is not siut for your problem.just have a try.
>
> 在 Fri, 08 Jan 2016 21:51:32 +0800,Wade Holler  写道:
>
> That is not set as far as I can tell.  Actually it is strange that I don't
> see that setting at all.
>
> [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep
> crush
>
> [root@cpn1 ~]# grep update /etc/ceph/ceph.conf
>
> [root@cpn1 ~]#
>
> On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen  wrote:
>
>>
>>
>> Hi,
>>
>> Do you have by any chance disabled automatic crushmap updates in your
>> ceph config?
>>
>> osd crush update on start = false
>>
>> If this is the case, and you move disks around hosts, they won't update
>> their position/host in the crushmap, even if the crushmap does not reflect
>> reality.
>>
>> Regards,
>>
>> Mart
>>
>>
>>
>>
>>
>> On 01/08/2016 02:16 AM, Wade Holler wrote:
>>
>> Sure.  Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs per
>> node,  but I will only include a sample:
>>
>> ceph osd tree | head -35
>>
>> ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>
>>  -1 130.98450 root default
>>
>>  -2   5.82153 host cpn1
>>
>>   4   0.72769 osd.4  up  1.0  1.0
>>
>>  14   0.72769 osd.14 up  1.0  1.0
>>
>>   3   0.72769 osd.3  up  1.0  1.0
>>
>>  24   0.72769 osd.24 up  1.0  1.0
>>
>>   5   0.72769 osd.5  up  1.0  1.0
>>
>>   2   0.72769 osd.2  up  1.0  1.0
>>
>>  17   0.72769 osd.17 up  1.0  1.0
>>
>>  69   0.72769 osd.69 up  1.0  1.0
>>
>>  -3   6.54922 host cpn3
>>
>>   7   0.72769 osd.7  up  1.0  1.0
>>
>>   8   0.72769 osd.8  up  1.0  1.0
>>
>>   9   0.72769 osd.9  up  1.0  1.0
>>
>>   0   0.72769 osd.0  up  1.0  1.0
>>
>>  28   0.72769 osd.28 up  1.0  1.0
>>
>>  10   0.72769 osd.10 up  1.0  1.0
>>
>>   1   0.72769 osd.1  up  1.0  1.0
>>
>>   6   0.72769 osd.6  up  1.0  1.0
>>
>>  29   0.72769 osd.29 up  1.0  1.0
>>
>>  -4   2.91077 host cpn4
>>
>>
>> Compared with the actual processes that are running:
>>
>>
>> [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd
>>
>> ceph   92638   1 26 16:19 ?01:00:55 /usr/bin/ceph-osd -f
>> --cluster ceph --id 6 --setuser ceph --setgroup ceph
>>
>> ceph   92667   1 20 16:19 ?00:48:04 /usr/bin/ceph-osd -f
>> --cluster ceph --id 0 --setuser ceph --setgroup ceph
>>
>> ceph   92673   1 18 16:19 ?00:42:48 /usr/bin/ceph-osd -f
>> --cluster ceph --id 8 --setuser ceph --setgroup ceph
>>
>> ceph   92681   1 19 16:19 ?00:45:52 /usr/bin/ceph-osd -f
>> --cluster ceph --id 7 --setuser ceph --setgroup ceph
>>
>> ceph   92701   1 15 16:19 ?00:36:05 /usr/bin/ceph-osd -f
>> --cluster ceph --id 12 --setuser ceph --setgroup ceph
>>
>> ceph   92748   1 14 16:19 ?00:34:07 /usr/bin/ceph-osd -f
>> --cluster ceph --id 10 --setuser ceph --setgroup ceph
>>
>> ceph   92756   1 16 16:19 ?00:38:40 /usr/bin/ceph-osd -f
>> --cluster ceph --id 9 --setuser ceph --setgroup ceph
>>
>> ceph   92758   1 17 16:19 ?00:39:28 /usr/bin/ceph-osd -f
>> --cluster ceph --id 13 --setuser ceph --setgroup ceph
>>
>> ceph   92777   1 19 16:19 ?00:46:17 /usr/bin/ceph-osd -f
>> --cluster ceph --id 1 --setuser ceph --setgroup ceph
>>
>> ceph   92988   1 18 16:19 ?00:42:47 /usr/bin/ceph-osd -f
>> --cluster ceph --id 5 --setuser ceph --setgroup ceph
>>
>> ceph   93058   1 18 16:19 ?00:43:18 /usr/bin/ceph-osd -f
>> --cluster ceph --id 11 --setuser ceph --setgroup ceph
>>
>> ceph   93078   1 17 16:19 ?00:41:38 /usr/bin/ceph-osd -f
>> --cluster ceph --id 14 --setuser ceph --setgroup ceph
>>
>> ceph   93127   1 15 16:19 ?00:36:29 /usr/bin/ceph-osd -f
>> --cluster ceph --id 4 --setuser ceph --setgroup ceph
>>
>> ceph   93130   1 17 16:19 ?00:40:44 /usr/bin/ceph-osd -f
>> --cluster ceph --id 2 --setuser ceph --setgroup ceph
>>
>> ceph   93173   1 21 16:19 ?00:49:37 /usr/bin/ceph-osd -f
>> --cluster ceph --id 3 --set

Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-08 Thread Wade Holler
My experience is performance degrades dramatically when dirty objects are
flushed.

Best Regards,
Wade


On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:

> Hi,guyes
> Recentlly,I am testing  cache-tier using writeback mode.but I found a
> strange things.
> the performance  using rados bench degrade.Is it correct?
> If so,how to explain.following some info about my test:
>
> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the other
> one used as OSD),four sata as OSD.
>
> before using cache-tier:
> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup
> 
> Total time run: 301.236355
> Total writes made:  6041
> Write size: 4194304
> Bandwidth (MB/sec): 80.216
>
> Stddev Bandwidth:   10.5358
> Max bandwidth (MB/sec): 104
> Min bandwidth (MB/sec): 0
> Average Latency:0.797838
> Stddev Latency: 0.619098
> Max latency:4.89823
> Min latency:0.158543
>
> root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
> Total time run:133.563980
> Total reads made: 6041
> Read size:4194304
> Bandwidth (MB/sec):180.917
>
> Average Latency:   0.353559
> Max latency:   1.83356
> Min latency:   0.027878
>
> after configure cache-tier:
> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage
> hotstorage
> pool 'hotstorage' is now (or already was) a tier of 'coldstorage'
>
> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
> hotstorage writeback
> set cache-mode for pool 'hotstorage' to writeback
>
> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
> coldstorage hotstorage
> overlay for 'coldstorage' is now (or already was) 'hotstorage'
>
> oot@ubuntu:~# ceph osd dump|grep storage
> pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 flags
> hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0
> pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags
> hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes
> 1000 hit_set bloom{false_positive_probability: 0.05, target_size:
> 0, seed: 0} 3600s x6 stripe_width 0
> -
> rados bench -p coldstorage 300 write --no-cleanup
> Total time run: 302.207573
> Total writes made: 4315
> Write size: 4194304
> Bandwidth (MB/sec): 57.113
>
> Stddev Bandwidth: 23.9375
> Max bandwidth (MB/sec): 104
> Min bandwidth (MB/sec): 0
> Average Latency: 1.1204
> Stddev Latency: 0.717092
> Max latency: 6.97288
> Min latency: 0.158371
>
> root@ubuntu:/# rados bench -p coldstorage 300 seq
> Total time run: 153.869741
> Total reads made: 4315
> Read size: 4194304
> Bandwidth (MB/sec): 112.173
>
> Average Latency: 0.570487
> Max latency: 1.75137
> Min latency: 0.039635
>
>
> ceph.conf:
> 
> [global]
> fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831
> mon_initial_members = ceph2, ceph3, ceph4
> mon_host = 10.**.**.241,10.**.**.242,10.**.**.243
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 3
> osd_pool_default_min_size = 1
> auth_supported = cephx
> osd_journal_size = 10240
> osd_mkfs_type = xfs
> osd crush update on start = false
>
> [client]
> rbd_cache = true
> rbd_cache_writethrough_until_flush = false
> rbd_cache_size = 33554432
> rbd_cache_max_dirty = 25165824
> rbd_cache_target_dirty = 16777216
> rbd_cache_max_dirty_age = 1
> rbd_cache_block_writes_upfront = false
> [osd]
> filestore_omap_header_cache_size = 4
> filestore_fd_cache_size = 4
> filestore_fiemap = true
> client_readahead_min = 2097152
> client_readahead_max_bytes = 0
> client_readahead_max_periods = 4
> filestore_journal_writeahead = false
> filestore_max_sync_interval = 10
> filestore_queue_max_ops = 500
> filestore_queue_max_bytes = 1048576000
> filestore_queue_committing_max_ops = 5000
> filestore_queue_committing_max_bytes = 1048576000
> keyvaluestore_queue_max_ops = 500
> keyvaluestore_queue_max_bytes = 1048576000
> journal_queue_max_ops = 3
> journal_queue_max_bytes = 3355443200
> osd_op_threads = 20
> osd_disk_threads = 8
> filestore_op_threads = 4
> osd_mount_options_xfs = rw,noatime,nobarrier,inode64,logbsize=256k,delaylog
>
> [mon]
> mon_osd_allow_primary_affinity=true
>
> --
> 使用Opera的电子邮件客户端:http://www.opera.com/mail/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-08 Thread Nick Fisk
There was/is a bug in Infernalis and older, where objects will always get 
promoted on the 2nd read/write regardless of what you set the 
min_recency_promote settings to. This can have a dramatic effect on 
performance. I wonder if this is what you are experiencing?

This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 . 

You can compile the changes above to see if it helps or I have a .deb for 
Infernalis where this is fixed if it's easier.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Wade Holler
> Sent: 08 January 2016 16:14
> To: hnuzhoulin ; ceph-de...@vger.kernel.org
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench
> result degrade
> 
> My experience is performance degrades dramatically when dirty objects are
> flushed.
> 
> Best Regards,
> Wade
> 
> 
> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
> Hi,guyes
> Recentlly,I am testing  cache-tier using writeback mode.but I found a
> strange things.
> the performance  using rados bench degrade.Is it correct?
> If so,how to explain.following some info about my test:
> 
> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the
> other
> one used as OSD),four sata as OSD.
> 
> before using cache-tier:
> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup
> 
> Total time run: 301.236355
> Total writes made:  6041
> Write size: 4194304
> Bandwidth (MB/sec): 80.216
> 
> Stddev Bandwidth:   10.5358
> Max bandwidth (MB/sec): 104
> Min bandwidth (MB/sec): 0
> Average Latency:0.797838
> Stddev Latency: 0.619098
> Max latency:4.89823
> Min latency:0.158543
> 
> root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
> Total time run:133.563980
> Total reads made: 6041
> Read size:4194304
> Bandwidth (MB/sec):180.917
> 
> Average Latency:   0.353559
> Max latency:   1.83356
> Min latency:   0.027878
> 
> after configure cache-tier:
> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage
> hotstorage
> pool 'hotstorage' is now (or already was) a tier of 'coldstorage'
> 
> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
> hotstorage writeback
> set cache-mode for pool 'hotstorage' to writeback
> 
> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
> coldstorage hotstorage
> overlay for 'coldstorage' is now (or already was) 'hotstorage'
> 
> oot@ubuntu:~# ceph osd dump|grep storage
> pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216
> flags
> hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0
> pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags
> hashpspool,incomplete_clones tier_of 6 cache_mode writeback
> target_bytes
> 1000 hit_set bloom{false_positive_probability: 0.05, target_size:
> 0, seed: 0} 3600s x6 stripe_width 0
> -
> rados bench -p coldstorage 300 write --no-cleanup
> Total time run: 302.207573
> Total writes made: 4315
> Write size: 4194304
> Bandwidth (MB/sec): 57.113
> 
> Stddev Bandwidth: 23.9375
> Max bandwidth (MB/sec): 104
> Min bandwidth (MB/sec): 0
> Average Latency: 1.1204
> Stddev Latency: 0.717092
> Max latency: 6.97288
> Min latency: 0.158371
> 
> root@ubuntu:/# rados bench -p coldstorage 300 seq
> Total time run: 153.869741
> Total reads made: 4315
> Read size: 4194304
> Bandwidth (MB/sec): 112.173
> 
> Average Latency: 0.570487
> Max latency: 1.75137
> Min latency: 0.039635
> 
> 
> ceph.conf:
> 
> [global]
> fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831
> mon_initial_members = ceph2, ceph3, ceph4
> mon_host = 10.**.**.241,10.**.**.242,10.**.**.243
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 3
> osd_pool_default_min_size = 1
> auth_supported = cephx
> osd_journal_size = 10240
> osd_mkfs_type = xfs
> osd crush update on start = false
> 
> [client]
> rbd_cache = true
> rbd_cache_writethrough_until_flush = false
> rbd_cache_size = 33554432
> rbd_cache_max_dirty = 25165824
> rbd_cache_target_dirty = 16777216
> rbd_cache_max_dirty_age = 1
> rbd_cache_block_writes_upfront = false
> [osd]
> filestore_omap_header_cache_size = 4
> filestore_fd_cache_size = 4
> filestore_fiemap = true
> client_readahead_min = 2097152
> client_readahead_max_bytes = 0
> client_readahead_max_periods = 4
> filestore_journal_writeahead = false
> filestore_max_sync_interval = 10
> filestore_queue_max_ops = 500
> filestore_queue_max_bytes = 1048576000
> filestore_queue_committin

[ceph-users] Infernalis

2016-01-08 Thread HEWLETT, Paul (Paul)
Hi Cephers

Just fired up first Infernalis cluster on RHEL7.1.

The following:

[root@citrus ~]# systemctl status ceph-osd@0.service
ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled)
   Active: active (running) since Fri 2016-01-08 15:57:11 GMT; 1h 8min ago
 Main PID: 7578 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
   └─7578 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph 
--setgroup ceph

Jan 08 15:57:10 citrus.arch.velocix.com systemd[1]: Starting Ceph object 
storage daemon...
Jan 08 15:57:10 citrus.arch.velocix.com ceph-osd-prestart.sh[7520]: getopt: 
unrecognized option '--setuser'
Jan 08 15:57:10 citrus.arch.velocix.com ceph-osd-prestart.sh[7520]: getopt: 
unrecognized option '--setgroup'
Jan 08 15:57:11 citrus.arch.velocix.com ceph-osd-prestart.sh[7520]: 
create-or-move updating item name 'osd.0' weight 0.2678 at location 
{host=citrus,root=default} to crush map
Jan 08 15:57:11 citrus.arch.velocix.com systemd[1]: Started Ceph object storage 
daemon.
Jan 08 15:57:11 citrus.arch.velocix.com ceph-osd[7578]: starting osd.0 at :/0 
osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Jan 08 15:57:11 citrus.arch.velocix.com ceph-osd[7578]: 2016-01-08 
15:57:11.743134 7f61ee37e900 -1 osd.0 0 log_to_monitors {default=true}
Jan 08 15:57:11 citrus.arch.velocix.com systemd[1]: Started Ceph object storage 
daemon.
Jan 08 15:57:12 citrus.arch.velocix.com systemd[1]: Started Ceph object storage 
daemon.
Jan 08 15:57:12 citrus.arch.velocix.com systemd[1]: Started Ceph object storage 
daemon.
Jan 08 15:57:12 citrus.arch.velocix.com systemd[1]: Started Ceph object storage 
daemon.
Jan 08 15:57:14 citrus.arch.velocix.com systemd[1]: Started Ceph object storage 
daemon.

Shows some warnings:

   - setuser unrecognised option (and setgroup) - Is this an error?
   - why 5 msgs about starting the Ceph object storage daemon? Is this also 
an error of some kind?

Paul
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-08 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Are you backporting that to hammer? We'd love it.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Jan 8, 2016 at 9:28 AM, Nick Fisk  wrote:
> There was/is a bug in Infernalis and older, where objects will always get 
> promoted on the 2nd read/write regardless of what you set the 
> min_recency_promote settings to. This can have a dramatic effect on 
> performance. I wonder if this is what you are experiencing?
>
> This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 .
>
> You can compile the changes above to see if it helps or I have a .deb for 
> Infernalis where this is fixed if it's easier.
>
> Nick
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Wade Holler
>> Sent: 08 January 2016 16:14
>> To: hnuzhoulin ; ceph-de...@vger.kernel.org
>> Cc: ceph-us...@ceph.com
>> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench
>> result degrade
>>
>> My experience is performance degrades dramatically when dirty objects are
>> flushed.
>>
>> Best Regards,
>> Wade
>>
>>
>> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin  wrote:
>> Hi,guyes
>> Recentlly,I am testing  cache-tier using writeback mode.but I found a
>> strange things.
>> the performance  using rados bench degrade.Is it correct?
>> If so,how to explain.following some info about my test:
>>
>> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the
>> other
>> one used as OSD),four sata as OSD.
>>
>> before using cache-tier:
>> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup
>> 
>> Total time run: 301.236355
>> Total writes made:  6041
>> Write size: 4194304
>> Bandwidth (MB/sec): 80.216
>>
>> Stddev Bandwidth:   10.5358
>> Max bandwidth (MB/sec): 104
>> Min bandwidth (MB/sec): 0
>> Average Latency:0.797838
>> Stddev Latency: 0.619098
>> Max latency:4.89823
>> Min latency:0.158543
>>
>> root@ceph1:/root/cluster# rados bench -p coldstorage  300 seq
>> Total time run:133.563980
>> Total reads made: 6041
>> Read size:4194304
>> Bandwidth (MB/sec):180.917
>>
>> Average Latency:   0.353559
>> Max latency:   1.83356
>> Min latency:   0.027878
>>
>> after configure cache-tier:
>> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage
>> hotstorage
>> pool 'hotstorage' is now (or already was) a tier of 'coldstorage'
>>
>> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode
>> hotstorage writeback
>> set cache-mode for pool 'hotstorage' to writeback
>>
>> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay
>> coldstorage hotstorage
>> overlay for 'coldstorage' is now (or already was) 'hotstorage'
>>
>> oot@ubuntu:~# ceph osd dump|grep storage
>> pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216
>> flags
>> hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0
>> pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags
>> hashpspool,incomplete_clones tier_of 6 cache_mode writeback
>> target_bytes
>> 1000 hit_set bloom{false_positive_probability: 0.05, target_size:
>> 0, seed: 0} 3600s x6 stripe_width 0
>> -
>> rados bench -p coldstorage 300 write --no-cleanup
>> Total time run: 302.207573
>> Total writes made: 4315
>> Write size: 4194304
>> Bandwidth (MB/sec): 57.113
>>
>> Stddev Bandwidth: 23.9375
>> Max bandwidth (MB/sec): 104
>> Min bandwidth (MB/sec): 0
>> Average Latency: 1.1204
>> Stddev Latency: 0.717092
>> Max latency: 6.97288
>> Min latency: 0.158371
>>
>> root@ubuntu:/# rados bench -p coldstorage 300 seq
>> Total time run: 153.869741
>> Total reads made: 4315
>> Read size: 4194304
>> Bandwidth (MB/sec): 112.173
>>
>> Average Latency: 0.570487
>> Max latency: 1.75137
>> Min latency: 0.039635
>>
>>
>> ceph.conf:
>> 
>> [global]
>> fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831
>> mon_initial_members = ceph2, ceph3, ceph4
>> mon_host = 10.**.**.241,10.**.**.242,10.**.**.243
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>> osd_pool_default_size = 3
>> osd_pool_default_min_size = 1
>> auth_supported = cephx
>> osd_journal_size = 10240
>> osd_mkfs_type = xfs
>> osd crush update on start = false
>>
>> [client]
>> rbd_cache = true
>> rbd_cache_writethrough_until_flush = false
>> rbd_cache_size = 33554432
>> rbd_cache_max_dirty = 25165824
>> rbd_cache_target_dirty = 16777216
>> rbd_cache_max_dirty_age = 1
>> rbd_cache_block_writes_upfront = false
>> [os

Re: [ceph-users] Unable to see LTTng tracepoints in Ceph

2016-01-08 Thread Jason Dillaman
Have you started ceph-osd with LD_PRELOAD=/usr/lib64/liblttng-ust-fork.so 
[matched to correct OS path]?  I just tested ceph-osd on the master branch and 
was able to generate OSD trace events.  You should also make sure that AppArmor 
/ SElinux isn't denying access to /dev/shm/lttng-ust-*.

What tracing events do you see being generated from ceph-mon?  I didn't realize 
it had any registered tracepoint events.

-- 

Jason Dillaman 


- Original Message - 

> From: "Aakanksha Pudipeddi-SSI" 
> To: ceph-users@lists.ceph.com
> Sent: Wednesday, January 6, 2016 10:36:13 PM
> Subject: [ceph-users] Unable to see LTTng tracepoints in Ceph

> Hello Cephers,

> A very happy new year to you all!

> I wanted to enable LTTng tracepoints for a few tests with infernalis and
> configured Ceph with the –with-lttng option. Seeing a recent post on conf
> file options for tracing, I added these lines:

> osd_tracing = true
> osd_objectstore_tracing = true
> rados_tracing = true
> rbd_tracing = true

> However, I am unable to see LTTng tracepoints within ceph-osd. I can see
> tracepoints in ceph-mon though. The main difference with respect to tracing
> between ceph-mon and ceph-osd seems to be TracepointProvider and I thought
> the addition in my config file should do the trick but that didn’t change
> anything. I do not know if this is relevant but I also checked with lsof and
> I see ceph-osd is accessing the lttng library as is ceph-mon. Did anyone
> come across this issue and if so, could you give me some direction on this?
> Thanks a lot for your help!

> Aakanksha

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg is stuck stale (osd.21 still removed)

2016-01-08 Thread Daniel Schwager
One more - I tried to recreate the pg but now this pg this "stuck inactive":

root@ceph-admin:~# ceph pg force_create_pg 34.225
pg 34.225 now creating, ok

root@ceph-admin:~# ceph health detail
HEALTH_WARN 49 pgs stale; 1 pgs stuck inactive; 49 pgs stuck stale; 1 
pgs stuck unclean
pg 34.225 is stuck inactive since forever, current state creating, last 
acting []
pg 34.225 is stuck unclean since forever, current state creating, last 
acting []
pg 34.186 is stuck stale for 118481.013632, current state 
stale+active+clean, last acting [21]
...

Maybe somebody has an idea how to fix this situation?

regards
Danny




smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [Ceph-Users] The best practice, ""ceph.conf""

2016-01-08 Thread Shinobu Kinjo
Hello,

Since ""ceph.conf"" is getting more complicated because there has been a bunch 
of parameters.

It's because of bug fixes, performance optimization or whatever making the Ceph 
cluster more strong, stable and something.

I'm pretty sure that I have not been able to catch up -;

 [ceph@ceph01 ~]$ ceph --version
 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 [ceph@ceph01 ~]$ ceph --show-config | wc -l
 840

 [ceph@ceph-stack src]$ ./ceph --show-config | wc -l
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 2016-01-08 18:05:15.195421 7fbce7462700 -1 WARNING: the following dangerous 
and experimental features are enabled: *
 946

I know it depends but I would like to be suggested to about what is the best 
practice of setting of ""ceph.conf"".

And where should I go to get specific explanation of each parameter, and know 
exactly what each parameter means, and how it works.

Rgds,
Shinobu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very high OSD RAM usage values

2016-01-08 Thread Josef Johansson
Hi,

I would say this is normal. 1GB of ram per 1TB is what we designed the
cluster for, I would believe that an EC-pool demands a lot more. Buy more
ram and start everything 32GB ram is quite little, when the cluster is
operating OK you'll see that extra ram getting used as file cache which
makes the cluster faster.

Regards,
Josef
On 6 Jan 2016 12:12, "Kenneth Waegeman"  wrote:

> Hi all,
>
> We experienced some serious trouble with our cluster: A running cluster
> started failing and started a chain reaction until the ceph cluster was
> down, as about half the OSDs are down (in a EC pool)
>
> Each host has 8 OSDS of 8 TB (i.e. RAID 0 of 2 4TB disk) for an EC pool
> (10+3, 14 hosts) and 2 cache OSDS and 32 GB of RAM.
> The reason we have the Raid0 of the disks, is because we tried with 16
> disk before, but 32GB didn't seem enough to keep the cluster stable
>
> We don't know for sure what triggered the chain reaction, but what we
> certainly see, is that while recovering, our OSDS are using a lot of
> memory. We've seen some OSDS using almost 8GB of RAM (resident; virtual
> 11GB)
> So right now we don't have enough memory to recover the cluster, because
> the  OSDS  get killed by OOMkiller before they can recover..
> And I don't know doubling our memory will be enough..
>
> A few questions:
>
> * Does someone has seen this before?
> * 2GB was still normal, but 8GB seems a lot, is this expected behaviour?
> * We didn't see this with an nearly empty cluster. Now it was filled about
> 1/4 (270TB). I guess it would become worse when filled half or more?
> * How high can this memory usage become ? Can we calculate the maximum
> memory of an OSD? Can we limit it ?
> * We can upgrade/reinstall to infernalis, will that solve anything?
>
> This is related to a previous post of me :
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22259
>
>
> Thank you very much !!
>
> Kenneth
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very high OSD RAM usage values

2016-01-08 Thread Josef Johansson
Maybe changing the number of concurrent back fills could limit the memory
usage.
On 9 Jan 2016 05:52, "Josef Johansson"  wrote:

> Hi,
>
> I would say this is normal. 1GB of ram per 1TB is what we designed the
> cluster for, I would believe that an EC-pool demands a lot more. Buy more
> ram and start everything 32GB ram is quite little, when the cluster is
> operating OK you'll see that extra ram getting used as file cache which
> makes the cluster faster.
>
> Regards,
> Josef
> On 6 Jan 2016 12:12, "Kenneth Waegeman"  wrote:
>
>> Hi all,
>>
>> We experienced some serious trouble with our cluster: A running cluster
>> started failing and started a chain reaction until the ceph cluster was
>> down, as about half the OSDs are down (in a EC pool)
>>
>> Each host has 8 OSDS of 8 TB (i.e. RAID 0 of 2 4TB disk) for an EC pool
>> (10+3, 14 hosts) and 2 cache OSDS and 32 GB of RAM.
>> The reason we have the Raid0 of the disks, is because we tried with 16
>> disk before, but 32GB didn't seem enough to keep the cluster stable
>>
>> We don't know for sure what triggered the chain reaction, but what we
>> certainly see, is that while recovering, our OSDS are using a lot of
>> memory. We've seen some OSDS using almost 8GB of RAM (resident; virtual
>> 11GB)
>> So right now we don't have enough memory to recover the cluster, because
>> the  OSDS  get killed by OOMkiller before they can recover..
>> And I don't know doubling our memory will be enough..
>>
>> A few questions:
>>
>> * Does someone has seen this before?
>> * 2GB was still normal, but 8GB seems a lot, is this expected behaviour?
>> * We didn't see this with an nearly empty cluster. Now it was filled
>> about 1/4 (270TB). I guess it would become worse when filled half or more?
>> * How high can this memory usage become ? Can we calculate the maximum
>> memory of an OSD? Can we limit it ?
>> * We can upgrade/reinstall to infernalis, will that solve anything?
>>
>> This is related to a previous post of me :
>> http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22259
>>
>>
>> Thank you very much !!
>>
>> Kenneth
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com