Re: [ceph-users] Ceph.com

2015-04-17 Thread Wido den Hollander


On 16-04-15 19:31, Ferber, Dan wrote:
> 
> Thanks for working on this Patrick. I have looked for a mirror that I can 
> point all the ceph.com references to in 
> /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py. So I 
> can get ceph-deploy to work.
> 
> I tried eu.ceph.com but it does not work for this
> 

Do you know what is missing? I want that to work, so I'll make sure
it gets synced.

Wido

> Dan Ferber
> Software  Defined Storage
> +1 651-344-1846
> dan.fer...@intel.com
> 
> From: Patrick McGarry mailto:pmcga...@redhat.com>>
> Date: Thursday, April 16, 2015 at 10:28 AM
> To: Ceph Devel 
> mailto:ceph-de...@vger.kernel.org>>, Ceph-User 
> mailto:ceph-us...@ceph.com>>
> Subject: [ceph-users] Ceph.com
> 
> Hey cephers,
> 
> As most of you have no doubt noticed, ceph.com has been having
> some...er..."issues" lately. Unfortunately this is some of the
> holdover infrastructure stuff from being a startup without a big-boy
> ops plan.
> 
> The current setup has ceph.com sharing a host with some of the nightly
> build stuff to make it easier for gitbuilder tasks (that also build
> the website doc) to coexist. Was this smart? No, probably not. Was is
> the quick-and-dirty way for us to get stuff rolling when we were tiny?
> Yep.
> 
> So, now that things are continuing to grow (website traffic load,
> ceph-deploy key requests, number of simultaneous builds) we are
> hitting the end of what one hard-working box can handle. I am in the
> process of moving ceph.com to a new host so that build explosions wont
> slag things like Ceph Day pages and the blog, but the doc may lag
> behind a bit.
> 
> Hopefully since I'm starting with the website it wont hose up too many
> of the other tasks, but bear with us while we split routing for a bit.
> If you have any questions please feel free to poke me. Thanks.
> 
> --
> 
> Best Regards,
> 
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph.com

2015-04-17 Thread Kurt Bauer


Ferber, Dan wrote:
> Thanks for working on this Patrick. I have looked for a mirror that I can 
> point all the ceph.com references to in 
> /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py. So I 
> can get ceph-deploy to work.
> 
> I tried eu.ceph.com but it does not work for this

Did you try something like "ceph-deploy install --repo-url
http://eu.ceph.com/debian-hammer/" (with a slightly different URL for
CentOS, of course)?
That worked without problems for me while ceph.com was having its issues.
The key still has to be fetched from ceph.com though, but that worked
eventually, as it's a rather small file.
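
For CentOS 7 the call would look something like the sketch below (untested;
the exact eu.ceph.com paths and the node names are assumptions, check
http://eu.ceph.com/ for the real directory layout):

ceph-deploy install --repo-url http://eu.ceph.com/rpm-hammer/el7 \
    --gpg-url http://eu.ceph.com/keys/release.asc node1 node2 node3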

best regards,
Kurt

> 
> Dan Ferber
> Software  Defined Storage
> +1 651-344-1846
> dan.fer...@intel.com
> 
> From: Patrick McGarry mailto:pmcga...@redhat.com>>
> Date: Thursday, April 16, 2015 at 10:28 AM
> To: Ceph Devel 
> mailto:ceph-de...@vger.kernel.org>>, Ceph-User 
> mailto:ceph-us...@ceph.com>>
> Subject: [ceph-users] Ceph.com
> 
> Hey cephers,
> 
> As most of you have no doubt noticed, ceph.com has been having
> some...er..."issues" lately. Unfortunately this is some of the
> holdover infrastructure stuff from being a startup without a big-boy
> ops plan.
> 
> The current setup has ceph.com sharing a host with some of the nightly
> build stuff to make it easier for gitbuilder tasks (that also build
> the website doc) to coexist. Was this smart? No, probably not. Was is
> the quick-and-dirty way for us to get stuff rolling when we were tiny?
> Yep.
> 
> So, now that things are continuing to grow (website traffic load,
> ceph-deploy key requests, number of simultaneous builds) we are
> hitting the end of what one hard-working box can handle. I am in the
> process of moving ceph.com to a new host so that build explosions wont
> slag things like Ceph Day pages and the blog, but the doc may lag
> behind a bit.
> 
> Hopefully since I'm starting with the website it wont hose up too many
> of the other tasks, but bear with us while we split routing for a bit.
> If you have any questions please feel free to poke me. Thanks.
> 
> --
> 
> Best Regards,
> 
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Kurt Bauer 
Vienna University Computer Center - ACOnet - VIX
Universitaetsstrasse 7, A-1010 Vienna, Austria, Europe
Tel: ++43 1 4277 - 14070 (Fax: - 814070)  KB1970-RIPE
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Hi guys,

I have one SSD that hosted the journals for 6 OSDs. It died, so those 6 OSDs
went down, ceph rebalanced, etc.

I now have a new SSD installed and will partition it, but I would like to
know how to proceed with recreating the journals for those 6 OSDs that are
down now.

Should I flush the journals (flush to where, since the journals no longer
exist...?), or just recreate the journals from scratch (making symbolic
links again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and
start the OSDs?

I expect the following procedure, but would like confirmation please:

rm -f /var/lib/ceph/osd/ceph-$ID/journal   # (sym link)
ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
ceph-osd -i $ID --mkjournal
ls -l /var/lib/ceph/osd/ceph-$ID/journal
service ceph start osd.$ID

Any thoughts greatly appreciated!

Thanks,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPHFS with erasure code

2015-04-17 Thread MEGATEL / Rafał Gawron
Hello

I would like to create CephFS on an erasure-coded pool.
I defined my default ec-profile:

ceph osd erasure-code-profile get default
directory=/usr/lib64/ceph/erasure-code
k=3
m=1
plugin=jerasure
ruleset-failure-domain=host
technique=reed_sol_van

How can I create CephFS with this profile?

I tried creating fsdata and fsmetadata with erasure coding, but if I use
such a pool in "ceph fs new" I get an error:
ceph fs new cephfs fsmetadata fsdata
Error EINVAL: pool 'fsdata' (id '1') is an erasure-code pool

How can I do this?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPHFS with erasure code

2015-04-17 Thread Loic Dachary
Hi,

You should set up a cache tier for CephFS to use, with the erasure-coded pool 
behind it. You will find detailed information at 
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
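
A minimal sketch, assuming the pool names from your mail (fsdata as the
erasure-coded data pool, fsmetadata replicated) and a hypothetical replicated
cache pool called fsdata-cache; the PG count is a placeholder:

ceph osd pool create fsdata-cache 128 128 replicated
ceph osd tier add fsdata fsdata-cache
ceph osd tier cache-mode fsdata-cache writeback
ceph osd tier set-overlay fsdata fsdata-cache
ceph fs new cephfs fsmetadata fsdata

You will also want to size the cache pool (target_max_bytes, hit_set_type,
etc.) as described on the cache-tiering page above.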

Cheers

On 17/04/2015 12:39, MEGATEL / Rafał Gawron wrote:
> Hello
> 
> I would create cephfs with erasure code.
> I define my default ec-profile:
> 
> 
> ceph osd erasure-code-profile get default
> 
> directory=/usr/lib64/ceph/erasure-code
> 
> k=3
> 
> m=1
> 
> plugin=jerasure
> 
> ruleset-failure-domain=host
> 
> technique=reed_sol_van
> 
> 
> How I can create cephfs with this profile ?
> 
> I try create fsdata and fsmetadata with erasure but if I use this pool in 
> create cephfs I have error:
> ceph fs new cephfs fsmetadata fsdata
> 
> Error EINVAL: pool 'fsdata' (id '1') is an erasure-code pool
> 
> How I can do this ?
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] metadata management in case of ceph object storage and ceph block storage

2015-04-17 Thread Steffen W Sørensen

> On 17/04/2015, at 07.33, Josef Johansson  wrote:
> 
> To your question, which I’m not sure I understand completely.
> 
> So yes, you don’t need the MDS if you just keep track of block storage and 
> object storage. (i.e. images for KVM)
> 
> So the Mon keeps track of the metadata for the Pool and PG
Well, there really isn't any metadata in the sense of a traditional file system: 
the monitors keep track of the status of the OSDs, and clients compute which 
OSDs to talk to in order to reach the objects they want. Thus there is no need 
for a central metadata service to tell clients where data is stored. Ceph is a 
distributed object storage system with potentially no SPOF and the ability to 
scale out.
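You can see that client-side placement computation with a plain CLI call,
e.g. (pool and object names here are made up):

# ask the cluster where a given object would land - no metadata server involved
ceph osd map rbd some-object
# -> ... pool 'rbd' (0) object 'some-object' -> pg 0.xxxx -> up [3,12,7] acting [3,12,7]
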
Try studying Ross’ slides f.ex. here:
http://www.slideshare.net/buildacloud/ceph-intro-and-architectural-overview-by-ross-turk
 

or many other good intros on the net, youtube etc.

Clients of a Ceph cluster can access 'objects' (blobs of data) through several 
means: programmatically with librados, as virtual block devices through 
librbd+librados, and finally as an S3 service through the RADOS gateway over 
http[s]. The metadata (users + ACLs, buckets + data…) for S3 objects is stored 
in various pools in Ceph.

CephFS, built on top of the Ceph object store, can best be compared to a 
combination of a POSIX file system and networked file systems such as NFS, 
CIFS or AFP, only with a different protocol and access means (FUSE daemon or 
kernel module). As it implements a regular file namespace, it needs to store 
metadata about which files exist in that namespace; this is the job of the MDS 
server(s), which of course use Ceph object store pools to persistently store 
this file system metadata.


> and the MDS keep track of all the files, hence the MDS should have at least 
> 10x the memory of what the Mon have.
Hmm, 10x the memory isn't a rule of thumb in my book; it all depends on the use 
case at hand.
The MDS tracks metadata for the files stored in CephFS, which is usually far from 
all of the data in a cluster, unless CephFS is the only usage of course :)
Many use Ceph for sharing virtual block devices among multiple hypervisors as 
disk devices for virtual machines (VM images), f.ex. with OpenStack, Proxmox, 
etc.
 
> 
> I’m no Ceph expert, especially not on CephFS, but this is my picture of it :)
> 
> Maybe the architecture docs could help you out? 
> http://docs.ceph.com/docs/master/architecture/#cluster-map 
> 
> 
> Hope that resolves your question.
> 
> Cheers,
> Josef
> 
>> On 06 Apr 2015, at 18:51, pragya jain > > wrote:
>> 
>> Please somebody reply my queries.
>> 
>> Thank yuo
>>  
>> -
>> Regards
>> Pragya Jain
>> Department of Computer Science
>> University of Delhi
>> Delhi, India
>> 
>> 
>> 
>> On Saturday, 4 April 2015 3:24 PM, pragya jain > > wrote:
>> 
>> 
>> hello all!
>> 
>> As the documentation said "One of the unique features of Ceph is that it 
>> decouples data and metadata".
>> for applying the mechanism of decoupling, Ceph uses Metadata Server (MDS) 
>> cluster.
>> MDS cluster manages metadata operations, like open or rename a file
>> 
>> On the other hand, Ceph implementation for object storage as a service and 
>> block storage as a service does not require MDS implementation.
>> 
>> My question is:
>> In case of object storage and block storage, how does Ceph manage the 
>> metadata?
>> 
>> Please help me to understand this concept more clearly.
>> 
>> Thank you
>>  
>> -
>> Regards
>> Pragya Jain
>> Department of Computer Science
>> University of Delhi
>> Delhi, India
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Steffen W Sørensen
> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down, ceph 
> rebalanced etc.
> 
> Now I have new SSD inside, and I will partition it etc - but would like to 
> know, how to proceed now, with the journal recreation for those 6 OSDs that 
> are down now.
Well, assuming the OSDs are down with the journal device lost, and the data has 
been rebalanced/re-replicated elsewhere, I would scratch these 6 downed+out 
OSDs and their journals, rebuild 6 new OSDs and add them back to the cluster 
capacity, after properly maintaining the CRUSH map to remove the crashed OSDs.
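
A rough, untested sketch of that cycle for one dead OSD (where $ID is the OSD
number; the ceph-deploy host/device names at the end are placeholders for
however you normally create OSDs):

ceph osd out $ID
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm $ID
# recreate on the data disk with a journal partition on the new SSD, e.g.:
ceph-deploy osd create host1:/dev/sdX:/dev/ssdY1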

> 
> Should I flush journal (where to, journals doesnt still exist...?), or just 
> recreate journal from scratch (making symboliv links again: ln -s 
> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
> 
> I expect the folowing procedure, but would like confirmation please:
> 
> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
> ceph-osd -i $ID --mkjournal
> ll /var/lib/ceph/osd/ceph-$ID/journal
> service ceph start osd.$ID
> 
> Any thought greatly appreciated !
> 
> Thanks,
> 
> -- 
> 
> Andrija Panić
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-04-17 Thread Georgios Dimitrakakis

Hi!

Do you by any chance have your OSDs placed at a local directory path 
rather than on a dedicated (otherwise unused) physical disk?


If I remember correctly from a similar setup that I had in the past, 
the "ceph df" command accounts for the entire disk and not just 
for the OSD data directory. I am not sure if this still applies, since that 
was on an early Firefly release, but it is something that is easy to 
check.


I don't know if the above makes sense, but what I mean is: if, for 
instance, your OSDs are at something like /var/lib/ceph/osd.X (or 
whatever) and that path does not correspond to a mounted device (e.g. 
/dev/sdc1) but lives on the disk that provides the / or /var 
partition, then you should run "df -h" to see how much data is 
on that partition and compare it with the "ceph df" output. It 
should be (more or less) the same.
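
Something like this quick check (the OSD number and mount point are just
examples):

df -h /var/lib/ceph/osd/ceph-0     # is the OSD directory its own mount?
ceph df                            # cluster-wide raw usage
ceph osd df                        # per-OSD utilisation (I believe this exists from Hammer on)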


Best,

George



2015-03-27 18:27 GMT+01:00 Gregory Farnum :
Ceph has per-pg and per-OSD metadata overhead. You currently have 26000 PGs,
suitable for use on a cluster of the order of 260 OSDs. You have placed
almost 7GB of data into it (21GB replicated) and have about 7GB of
additional overhead.

You might try putting a suitable amount of data into the cluster before
worrying about the ratio of space used to data stored. :)
-Greg


Hello Greg,

I have put in a suitable amount of data now, and it looks like my ratio is
still 1 to 5.
The folder
/var/lib/ceph/osd/ceph-N/current/meta/
did not grow, so it looks like that is not the problem.

Do you have any hints on how to troubleshoot this issue?


ansible@zrh-srv-m-cph02:~$ ceph osd pool get .rgw.buckets size
size: 3
ansible@zrh-srv-m-cph02:~$ ceph osd pool get .rgw.buckets min_size
min_size: 2


ansible@zrh-srv-m-cph02:~$ ceph -w
cluster 4179fcec-b336-41a1-a7fd-4a19a75420ea
 health HEALTH_WARN pool .rgw.buckets has too few pgs
 monmap e4: 4 mons at

{rml-srv-m-cph01=10.120.50.20:6789/0,rml-srv-m-cph02=10.120.50.21:6789/0,rml-srv-m-stk03=10.120.50.32:6789/0,zrh-srv-m-cph02=10.120.50.2:6789/0},
election epoch 668, quorum 0,1,2,3
zrh-srv-m-cph02,rml-srv-m-cph01,rml-srv-m-cph02,rml-srv-m-stk03
 osdmap e2170: 54 osds: 54 up, 54 in
  pgmap v619041: 28684 pgs, 15 pools, 109 GB data, 7358 kobjects
518 GB used, 49756 GB / 50275 GB avail
   28684 active+clean

ansible@zrh-srv-m-cph02:~$ ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
50275G 49756G 518G  1.03
POOLS:
    NAME               ID USED  %USED MAX AVAIL OBJECTS
    rbd                0    155     0    16461G       2
    gianfranco         7    156     0    16461G       2
    images             8   257M     0    16461G      38
    .rgw.root          9    840     0    16461G       3
    .rgw.control       10     0     0    16461G       8
    .rgw               11 21334     0    16461G     108
    .rgw.gc            12     0     0    16461G      32
    .users.uid         13  1575     0    16461G       6
    .users             14    72     0    16461G       6
    .rgw.buckets.index 15     0     0    16461G      30
    .users.swift       17    36     0    16461G       3
    .rgw.buckets       18  108G  0.22    16461G 7534745
    .intent-log        19     0     0    16461G       0
    .rgw.buckets.extra 20     0     0    16461G       0
    volumes            21  512M     0    16461G     161

ansible@zrh-srv-m-cph02:~$
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-04-17 Thread Saverio Proto
> Do you by any chance have your OSDs placed at a local directory path rather
> than on a non utilized physical disk?

No, I have 18 disks per server. Each OSD is mapped to a physical disk.

Here is the output from one server:
ansible@zrh-srv-m-cph02:~$ df -h
Filesystem   Size  Used Avail Use% Mounted on
/dev/mapper/vg01-root 28G  4.5G   22G  18% /
none 4.0K 0  4.0K   0% /sys/fs/cgroup
udev  48G  4.0K   48G   1% /dev
tmpfs9.5G  1.3M  9.5G   1% /run
none 5.0M 0  5.0M   0% /run/lock
none  48G   20K   48G   1% /run/shm
none 100M 0  100M   0% /run/user
/dev/mapper/vg01-tmp 4.5G  9.4M  4.3G   1% /tmp
/dev/mapper/vg01-varlog  9.1G  5.1G  3.6G  59% /var/log
/dev/sdf1932G   15G  917G   2% /var/lib/ceph/osd/ceph-3
/dev/sdg1932G   15G  917G   2% /var/lib/ceph/osd/ceph-4
/dev/sdl1932G   13G  919G   2% /var/lib/ceph/osd/ceph-8
/dev/sdo1932G   15G  917G   2% /var/lib/ceph/osd/ceph-11
/dev/sde1932G   15G  917G   2% /var/lib/ceph/osd/ceph-2
/dev/sdd1932G   15G  917G   2% /var/lib/ceph/osd/ceph-1
/dev/sdt1932G   15G  917G   2% /var/lib/ceph/osd/ceph-15
/dev/sdq1932G   12G  920G   2% /var/lib/ceph/osd/ceph-12
/dev/sdc1932G   14G  918G   2% /var/lib/ceph/osd/ceph-0
/dev/sds1932G   17G  916G   2% /var/lib/ceph/osd/ceph-14
/dev/sdu1932G   14G  918G   2% /var/lib/ceph/osd/ceph-16
/dev/sdm1932G   15G  917G   2% /var/lib/ceph/osd/ceph-9
/dev/sdk1932G   17G  915G   2% /var/lib/ceph/osd/ceph-7
/dev/sdn1932G   14G  918G   2% /var/lib/ceph/osd/ceph-10
/dev/sdr1932G   15G  917G   2% /var/lib/ceph/osd/ceph-13
/dev/sdv1932G   14G  918G   2% /var/lib/ceph/osd/ceph-17
/dev/sdh1932G   17G  916G   2% /var/lib/ceph/osd/ceph-5
/dev/sdj1932G   14G  918G   2% /var/lib/ceph/osd/ceph-30
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy : systemd unit files not deployed to a centos7 nodes

2015-04-17 Thread Alexandre DERUMIER
Hi,

I'm currently trying to deploy a new Ceph test cluster on CentOS 7 (hammer),

using ceph-deploy (from a Debian wheezy admin node).

It seems that the systemd unit files are not deployed.

The Ceph git repository does have systemd unit files:
https://github.com/ceph/ceph/tree/hammer/systemd

I haven't looked inside the rpm package.


(This is my first install on CentOS, so I don't know whether it worked with 
previous releases.)


I have deployed with:

ceph-deploy install --release hammer ceph1-{1,2,3}
ceph-deploy new ceph1-{1,2,3}


Is this normal?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy : systemd unit files not deployed to a centos7 nodes

2015-04-17 Thread Alexandre DERUMIER
Oh,

I didn't see that a sysvinit file was also deployed.

It works fine with /etc/init.d/ceph.


- Mail original -
De: "aderumier" 
À: "ceph-users" 
Envoyé: Vendredi 17 Avril 2015 14:11:45
Objet: [ceph-users] ceph-deploy : systemd unit files not deployed to a  centos7 
nodes

Hi, 

I'm currently try to deploy a new ceph test cluster on centos7, (hammer) 

from ceph-deploy (on a debian wheezy). 

And it seem that systemd unit files are not deployed 

Seem that ceph git have systemd unit file 
https://github.com/ceph/ceph/tree/hammer/systemd 

I don't have look inside the rpm package. 


(This is my first install on centos, so I don't known if it's working with 
previous releases) 


I have deployed with: 

ceph-deploy install --release hammer ceph1-{1,2,3} 
ceph-deploy new ceph1-{1,2,3} 


Is it normal ? 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph repo - RSYNC?

2015-04-17 Thread Matt Taylor

Australian/Oceanic users can also rsync from here:

rsync://ceph.mirror.digitalpacific.com.au/ceph

As Wido mentioned before, you can also obtain packages from here too:

http://ceph.mirror.digitalpacific.com.au/

Mirror is located in Sydney, Australia and syncs directly from eu.ceph.com

Cheers,
Matt.

On 16/04/2015 22:26, Wido den Hollander wrote:



On 16-04-15 15:11, Paul Mansfield wrote:

On 16/04/15 09:55, Wido den Hollander wrote:

It's on my radar to come up with a proper mirror system for Ceph. A
simple Bash script which is in the Git repo which you can use to sync
all Ceph packages and downloads.


I've now set up a mirror of ceph/rpm-hammer/rhel7 for our internal use
and a simple snapshotting script copies the mirror to a date-stamped
directory using hard links so as not to eat up lots of disk space.



Yes, that works, but I also want to make sure all docs are copied.

Anyway, thanks for sharing!

Wido



the key bits of the script look somewhat like this (I'm copying/pasting
and editing without testing the results, and missing out various error
checks and information messages, so please don't just copy this into a
script blindly ;-)


#!/bin/bash

DDD=`date +%Y%m%d`

MIRRDIR=/fileserver/rhel/ceph
SNAPDIR=/fileserver/rhel/ceph-snapshots/ceph-$DDD
RSYNCSRC=rsync://eu.ceph.com/ceph


mkdir -p $SNAPDIR

# copy flags: a = archive, l = hard links, r = recursive,
# u = updated/newer files, v = verbose

# trailing slash style otherwise we end up with ceph-yymmdd/ceph/
nice cp -alruv $MIRRDIR/* $SNAPDIR/

if [ $? != 0 ] ; then
    echo "error"
    exit 1
fi


# add other versions here:
for SRC in rpm-hammer/rhel7 ; do
    rsync --bwlimit=1024 -aiH --no-perms --numeric-ids \
        --delete --delete-after --delay-updates \
        --exclude="*.i686.rpm" \
        $RSYNCSRC/$SRC/ $MIRRDIR/$SRC/
done



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy : systemd unit files not deployed to a centos7 nodes

2015-04-17 Thread Ken Dreyer
As you've seen, a set of systemd unit files has been committed to git,
but the packages do not yet use them.

There is an open ticket for this task,
http://tracker.ceph.com/issues/11344 . Feel free to add yourself as a
watcher on that if you are interested in the progress.

- Ken

On 04/17/2015 06:22 AM, Alexandre DERUMIER wrote:
> Oh,
> 
> I didn't see that a sysvinit file was also deployed.
> 
> works fine with /etc/init.d/ceph  
> 
> 
> - Mail original -
> De: "aderumier" 
> À: "ceph-users" 
> Envoyé: Vendredi 17 Avril 2015 14:11:45
> Objet: [ceph-users] ceph-deploy : systemd unit files not deployed to a
> centos7 nodes
> 
> Hi, 
> 
> I'm currently try to deploy a new ceph test cluster on centos7, (hammer) 
> 
> from ceph-deploy (on a debian wheezy). 
> 
> And it seem that systemd unit files are not deployed 
> 
> Seem that ceph git have systemd unit file 
> https://github.com/ceph/ceph/tree/hammer/systemd 
> 
> I don't have look inside the rpm package. 
> 
> 
> (This is my first install on centos, so I don't known if it's working with 
> previous releases) 
> 
> 
> I have deployed with: 
> 
> ceph-deploy install --release hammer ceph1-{1,2,3} 
> ceph-deploy new ceph1-{1,2,3} 
> 
> 
> Is it normal ? 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] advantages of multiple pools?

2015-04-17 Thread Chad William Seys
Hi All,
   What are the advantages of having multiple ceph pools (if they use the 
whole cluster)?
   Thanks!

C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advantages of multiple pools?

2015-04-17 Thread Saverio Proto
For example you can assign different read/write permissions and
different keyrings to different pools.
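
A quick sketch of what that looks like with cephx (the pool name, client name
and caps are just examples):

ceph osd pool create app-pool 128
ceph auth get-or-create client.app mon 'allow r' \
    osd 'allow rw pool=app-pool' -o /etc/ceph/ceph.client.app.keyring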

2015-04-17 16:00 GMT+02:00 Chad William Seys :
> Hi All,
>What are the advantages of having multiple ceph pools (if they use the
> whole cluster)?
>Thanks!
>
> C.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advantages of multiple pools?

2015-04-17 Thread Lionel Bouton
On 04/17/15 16:01, Saverio Proto wrote:
> For example you can assign different read/write permissions and
> different keyrings to different pools.

From memory, you can also set different replication settings per pool, use a
cache pool or not, and apply specific CRUSH map rules.
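
For instance, something like (pool name and values are arbitrary):

ceph osd pool set mypool size 2          # different replication factor
ceph osd pool set mypool min_size 1
ceph osd pool set mypool crush_ruleset 1 # pin the pool to a specific CRUSH rule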

Lionel Bouton
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph on Debian Jessie stopped working

2015-04-17 Thread Chad William Seys
Hi Greg,
   Thanks for the reply.  After looking more closely at /etc/ceph/rbdmap I 
discovered it was corrupted.  That was the only problem.

I think the dmesg line
'rbd: no image name provided'
is also a clue to this!

Hope that helps any other newbies!  :)

Thanks again,
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy : systemd unit files not deployed to a centos7 nodes

2015-04-17 Thread HEWLETT, Paul (Paul)** CTR **
I would be very keen for this to be implemented in Hammer and am willing to 
help test it...


Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1223 435893 m: +44 7985327353




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Ken Dreyer 
[kdre...@redhat.com]
Sent: 17 April 2015 14:45
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy : systemd unit files not deployed to a
centos7 nodes

As you've seen, a set of systemd unit files has been committed to git,
but the packages do not yet use them.

There is an open ticket for this task,
http://tracker.ceph.com/issues/11344 . Feel free to add yourself as a
watcher on that if you are interested in the progress.

- Ken

On 04/17/2015 06:22 AM, Alexandre DERUMIER wrote:
> Oh,
>
> I didn't see that a sysvinit file was also deployed.
>
> works fine with /etc/init.d/ceph
>
>
> - Mail original -
> De: "aderumier" 
> À: "ceph-users" 
> Envoyé: Vendredi 17 Avril 2015 14:11:45
> Objet: [ceph-users] ceph-deploy : systemd unit files not deployed to a
> centos7 nodes
>
> Hi,
>
> I'm currently try to deploy a new ceph test cluster on centos7, (hammer)
>
> from ceph-deploy (on a debian wheezy).
>
> And it seem that systemd unit files are not deployed
>
> Seem that ceph git have systemd unit file
> https://github.com/ceph/ceph/tree/hammer/systemd
>
> I don't have look inside the rpm package.
>
>
> (This is my first install on centos, so I don't known if it's working with 
> previous releases)
>
>
> I have deployed with:
>
> ceph-deploy install --release hammer ceph1-{1,2,3}
> ceph-deploy new ceph1-{1,2,3}
>
>
> Is it normal ?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] full ssd setup preliminary hammer bench

2015-04-17 Thread Alexandre DERUMIER
Hi Mark,

I finally got my hardware for my production full-SSD cluster.

Here is a first preliminary bench (1 osd).

I got around 45K iops with 4K randread on a small 10GB rbd volume.


I'm pretty happy because I no longer see a huge cpu difference between krbd
&& librbd.
In my previous bench I was using debian wheezy as the client;
now it's centos 7.1, so maybe something is different (glibc, ...).

I'm planning to do a big benchmark, centos vs ubuntu vs debian, client && server,
to compare.
I have 18 SSD OSDs for the benchmarks.







results : rand 4K : 1 osd
-

fio + librbd: 

iops: 45.1K

clat percentiles (usec):
 |  1.00th=[  358],  5.00th=[  406], 10.00th=[  446], 20.00th=[  556],
 | 30.00th=[  676], 40.00th=[ 1048], 50.00th=[ 1192], 60.00th=[ 1304],
 | 70.00th=[ 1400], 80.00th=[ 1496], 90.00th=[ 1624], 95.00th=[ 1720],
 | 99.00th=[ 1880], 99.50th=[ 1928], 99.90th=[ 2064], 99.95th=[ 2128],
 | 99.99th=[ 2512]

cpu server :  89.1 idle
cpu client :  92.5 idle

fio + krbd
--
iops:47.5K

clat percentiles (usec):
 |  1.00th=[  620],  5.00th=[  636], 10.00th=[  644], 20.00th=[  652],
 | 30.00th=[  668], 40.00th=[  676], 50.00th=[  684], 60.00th=[  692],
 | 70.00th=[  708], 80.00th=[  724], 90.00th=[  756], 95.00th=[  820],
 | 99.00th=[ 1004], 99.50th=[ 1032], 99.90th=[ 1144], 99.95th=[ 1448],
 | 99.99th=[ 2224]

cpu server :  92.4 idle
cpu client :  96.8 idle




hardware (ceph node && client node):
---
ceph : hammer
os : centos 7.1
2 x 10cores Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz
64GB ram
2 x intel s3700 100GB : raid1: os + monitor
6 x intel s3500 160GB : osds
2x10gb mellanox connect-x3 (lacp)

network
---
mellanox sx1012 with breakout cables (10GB)


centos tunning:
---
-noop scheduler
-tune-adm profile latency-performance

ceph.conf
-
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true


osd pool default min size = 1

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

osd_op_threads = 5
filestore_op_threads = 4


osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_nocrc = true
ms_dispatch_throttle_bytes = 0

cephx sign messages = false
cephx require signatures = false

[client]
rbd_cache = false





rand 4K : rbd volume size: 10GB  (data in osd node buffer - no access to disk)
--
fio + librbd

[global]
ioengine=rbd
clientname=admin
pool=pooltest
rbdname=rbdtest
invalidate=0  
rw=randread
direct=1
bs=4k
numjobs=2
group_reporting=1
iodepth=32



fio + krbd
---
[global]
ioengine=aio
invalidate=1# mandatory
rw=randread
bs=4K
direct=1
numjobs=2
group_reporting=1
size=10G

iodepth=32
filename=/dev/rbd0   (noop scheduler)






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph.com

2015-04-17 Thread Paul Mansfield
On 16/04/15 17:34, Chris Armstrong wrote:
> Thanks for the update, Patrick. Our Docker builds were failing due to
> the mirror being down. I appreciate being able to check the mailing list
> and quickly see what's going on!

if you're accessing the ceph repo all the time, it's probably worth the
effort of setting up your own mirror, you can cargo-cult the snippet of
script I posted here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000467.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Solaris / Illumos

2015-04-17 Thread Michal Kozanecki
Performance on ZFS on Linux (ZoL) seems to be fine, as long as you use the CEPH 
generic filesystem implementation (writeahead) and not the specific CEPH ZFS 
implementation; the CoW snapshotting that CEPH does with ZFS support compiled in 
absolutely kills performance. I suspect the same would apply to CEPH on Illumos 
on ZFS. Otherwise it is comparable to XFS in my own testing, once tweaked.

There are a few oddities/quirks with ZFS performance that need to be tweaked 
when using it with CEPH, and yes, enabling SA xattrs is one of them.

1. ZFS recordsize - The ZFS "sector size", known within ZFS as the recordsize, 
is technically dynamic. It only enforces a maximum size; however, the way CEPH 
writes to and reads from objects (when working with smaller blocks, let's say 
4k or 8k via rbd) with default settings seems to be affected by the recordsize. 
With the default 128K I've found lower IOPS and higher latency. Setting the 
recordsize too low will inflate various ZFS metadata, so it needs to be 
balanced against how your CEPH pool will be used.

For rbd pools(where small block performance may be important) a recordsize of 
32K seems to be a good balance. For pure large object based use (rados, etc) 
the 128K default is fine, throughput is high(small block performance isn't 
important here). See following links for more info about recordsize: 
https://blogs.oracle.com/roch/entry/tuning_zfs_recordsize and 
https://www.joyent.com/blog/bruning-questions-zfs-record-size

2. XATTR - I didn't do much testing here, I've read that if you do not set 
xattr = sa on ZFS you will get poor performance. There were also stability 
issues in the past with xattr = sa on ZFS though it seems all resolved now and 
I have not encountered any issues myself. I'm unsure what the default setting 
is here, I always enable it.

Make sure you enable and set xattr = sa on ZFS.
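
A minimal sketch of those dataset settings (the dataset name tank/ceph-osd0 is
made up):

zfs set recordsize=32K tank/ceph-osd0   # small-block rbd workload
zfs set xattr=sa tank/ceph-osd0         # store xattrs in the dnode
zfs get recordsize,xattr tank/ceph-osd0 # verify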

3. ZIL(ZFS Intent Log, also known as the slog) is a MUST (even with a separate 
ceph journal) - It appears that while the ceph journal offloads/absorbs writes 
nicely and boosts performance, it does not consolidate writes enough for ZFS. 
Without a ZIL/SLOG your performance will be very sawtooth-like (jumpy, stuttering, 
i.e. fast then slow, fast then slow, over a period of 10-15 seconds).

In theory tweaking the various ZFS TXG sync settings might work, but it is 
overly complicated to maintain and likely would only apply to the specific 
underlying disk model. Disabling sync also resolves this, though you'll lose 
the last TXG on a power failure - this might be okay with CEPH, but since I'm 
unsure I'll just assume it is not. IMHO avoid too much evil tuning, just add a 
ZIL/SLOG.   
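
Adding the SLOG itself is a one-liner (pool and device names are made up):

zpool add tank log /dev/disk/by-id/ata-SOME_SSD-part1
zpool status tank   # the device should now show up under "logs"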

4. ZIL/SLOG + on-device ceph journal vs ZIL/SLOG + separate ceph journal - 
Performance is very similar, if you have a ZIL/SLOG you could easily get away 
without a separate ceph journal and leave it on the device/ZFS dataset. HOWEVER 
this causes HUGE amounts of fragmentation due to the CoW nature. After only a 
few days usage, performance tanked with the ceph journal on the same device. 

I did find that if you partition and share device/SSD between both ZIL/SLOG and 
a separate ceph journal, the resulting performance is about the same in pure 
throughput/iops, though latency is slightly higher. This is what I do in my 
test cluster.

5. Fragmentation - once you hit around 80-90% disk usage your performance will 
start to slow down due to fragmentation. This isn't due to CEPH, it’s a known 
ZFS quirk due to its CoW nature. Unfortunately there is no defrag in ZFS, and 
likely never will be (the mythical block point rewrite unicorn you'll find 
people talking about). 

There is one way to delay it and possibly avoid it however, enable 
metaslab_debug, this will put the ZFS spacemaps in memory, allowing ZFS to make 
better placements during CoW operations, but it does use more memory. See the 
following links for more detail about spacemaps and fragmentation: 
http://blog.delphix.com/uday/2013/02/19/78/ and http://serverfault.com/a/556892 
and http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45408.html 

There's a lot more to ZFS and its "things-to-know" than that (L2ARC uses ARC 
metadata space, dedupe uses ARC metadata space, etc.), but as far as CEPH is 
concerned the above is a good place to start. ZFS IMHO is a great solution, but 
it requires some time and effort to do it right.

Cheers,

Michal Kozanecki | Linux Administrator | E: mkozane...@evertz.com


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Nelson
Sent: April-15-15 12:22 PM
To: Jake Young
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph on Solaris / Illumos

On 04/15/2015 10:36 AM, Jake Young wrote:
>
>
> On Wednesday, April 15, 2015, Mark Nelson  > wrote:
>
>
>
> On 04/15/2015 08:16 AM, Jake Young wrote:
>
> Has anyone compiled ceph (either osd or client) on a Solaris
> based OS?
>
> The t

Re: [ceph-users] full ssd setup preliminary hammer bench

2015-04-17 Thread Michal Kozanecki
Any quick write performance data?

Michal Kozanecki | Linux Administrator | E: mkozane...@evertz.com


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER
Sent: April-17-15 11:38 AM
To: Mark Nelson; ceph-users
Subject: [ceph-users] full ssd setup preliminary hammer bench

Hi Mark,

I finally got my hardware for my production full ssd cluster.

Here a first preliminary bench. (1osd).

I got around 45K iops with randread 4K with a small 10GB rbd volume


I'm pretty happy because I don't see anymore huge cpu difference between krbd 
&& lirbd.
In my previous bench I was using debian wheezy as client, now it's a centos 
7.1, so maybe something is different (glibc,...).

I'm planning to do big benchmark centos vs ubuntu vs debian, client && server, 
to compare.
I have 18 osd ssd for the benchmarks.







results : rand 4K : 1 osd
-

fio + librbd: 

iops: 45.1K

clat percentiles (usec):
 |  1.00th=[  358],  5.00th=[  406], 10.00th=[  446], 20.00th=[  556],
 | 30.00th=[  676], 40.00th=[ 1048], 50.00th=[ 1192], 60.00th=[ 1304],
 | 70.00th=[ 1400], 80.00th=[ 1496], 90.00th=[ 1624], 95.00th=[ 1720],
 | 99.00th=[ 1880], 99.50th=[ 1928], 99.90th=[ 2064], 99.95th=[ 2128],
 | 99.99th=[ 2512]

cpu server :  89.1 iddle
cpu client :  92,5 idle

fio + krbd
--
iops:47.5K

clat percentiles (usec):
 |  1.00th=[  620],  5.00th=[  636], 10.00th=[  644], 20.00th=[  652],
 | 30.00th=[  668], 40.00th=[  676], 50.00th=[  684], 60.00th=[  692],
 | 70.00th=[  708], 80.00th=[  724], 90.00th=[  756], 95.00th=[  820],
 | 99.00th=[ 1004], 99.50th=[ 1032], 99.90th=[ 1144], 99.95th=[ 1448],
 | 99.99th=[ 2224]

cpu server :  92.4 idle
cpu client :  96,8 idle




hardware (ceph node && client node):
---
ceph : hammer
os : centos 7.1
2 x 10cores Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 64GB ram
2 x intel s3700 100GB : raid1: os + monitor
6 x intel s3500 160GB : osds
2x10gb mellanox connect-x3 (lacp)

network
---
mellanox sx1012 with breakout cables (10GB)


centos tunning:
---
-noop scheduler
-tune-adm profile latency-performance

ceph.conf
-
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true


osd pool default min size = 1

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

osd_op_threads = 5
filestore_op_threads = 4


osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_nocrc = true
ms_dispatch_throttle_bytes = 0

cephx sign messages = false
cephx require signatures = false

[client]
rbd_cache = false





rand 4K : rbd volume size: 10GB  (data in osd node buffer - no access to disk)
--
fio + librbd

[global]
ioengine=rbd
clientname=admin
pool=pooltest
rbdname=rbdtest
invalidate=0
rw=randread
direct=1
bs=4k
numjobs=2
group_reporting=1
iodepth=32



fio + krbd
---
[global]
ioengine=aio
invalidate=1# mandatory
rw=randread
bs=4K
direct=1
numjobs=2
group_reporting=1
size=10G

iodepth=32
filename=/dev/rbd0   (noop scheduler)






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Robert LeBlanc
Delete and re-add all six OSDs.

On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic 
wrote:

> Hi guys,
>
> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down,
> ceph rebalanced etc.
>
> Now I have new SSD inside, and I will partition it etc - but would like to
> know, how to proceed now, with the journal recreation for those 6 OSDs that
> are down now.
>
> Should I flush journal (where to, journals doesnt still exist...?), or
> just recreate journal from scratch (making symboliv links again: ln -s
> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
>
> I expect the folowing procedure, but would like confirmation please:
>
> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
> ceph-osd -i $ID --mkjournal
> ll /var/lib/ceph/osd/ceph-$ID/journal
> service ceph start osd.$ID
>
> Any thought greatly appreciated !
>
> Thanks,
>
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Solaris / Illumos

2015-04-17 Thread Jake Young
On Friday, April 17, 2015, Michal Kozanecki  wrote:

> Performance on ZFS on Linux (ZoL) seems to be fine, as long as you use the
> CEPH generic filesystem implementation (writeahead) and not the specific
> CEPH ZFS implementation, CoW snapshoting that CEPH does with ZFS support
> compiled in absolutely kills performance. I suspect the same would go with
> CEPH on Illumos on ZFS. Otherwise it is comparable to XFS in my own testing
> once tweaked.
>
> There are a few oddities/quirks with ZFS performance that need to be
> tweaked when using it with CEPH, and yea enabling SA on xattr is one of
> them.
>
> 1. ZFS recordsize - The ZFS "sector size", known as within ZFS as the
> recordsize is technically dynamic. It only enforces the maximum size,
> however the way CEPH writes and reads from objects (when working with
> smaller blocks, let's say 4k or 8k via rbd) with default settings seems to
> be affected by the recordsize. With the default 128K I've found lower IOPS
> and higher latency. Setting the recordsize too low will inflate various ZFS
> metadata, so it needs to be balanced against how your CEPH pool will be
> used.
>
> For rbd pools(where small block performance may be important) a recordsize
> of 32K seems to be a good balance. For pure large object based use (rados,
> etc) the 128K default is fine, throughput is high(small block performance
> isn't important here). See following links for more info about recordsize:
> https://blogs.oracle.com/roch/entry/tuning_zfs_recordsize and
> https://www.joyent.com/blog/bruning-questions-zfs-record-size
>
> 2. XATTR - I didn't do much testing here, I've read that if you do not set
> xattr = sa on ZFS you will get poor performance. There were also stability
> issues in the past with xattr = sa on ZFS though it seems all resolved now
> and I have not encountered any issues myself. I'm unsure what the default
> setting is here, I always enable it.
>
> Make sure you enable and set xattr = sa on ZFS.
>
> 3. ZIL(ZFS Intent Log, also known as the slog) is a MUST (even with a
> separate ceph journal) - It appears that while the ceph journal
> offloads/absorbs writes nicely and boosts performance, it does not
> consolidate writes enough for ZFS. Without a ZIL/SLOG your performance will
> be very sawtooth like (jumpy, stutter, aka fast then slow, fast than slow
> over a period of 10-15 seconds).
>
> In theory tweaking the various ZFS TXG sync settings might work, but it is
> overly complicated to maintain and likely would only apply to the specific
> underlying disk model. Disabling sync also resolves this, though you'll
> lose the last TXG on a power failure - this might be okay with CEPH, but
> since I'm unsure I'll just assume it is not. IMHO avoid too much evil
> tuning, just add a ZIL/SLOG.
>
> 4. ZIL/SLOG + on-device ceph journal vs ZIL/SLOG + separate ceph journal -
> Performance is very similar, if you have a ZIL/SLOG you could easily get
> away without a separate ceph journal and leave it on the device/ZFS
> dataset. HOWEVER this causes HUGE amounts of fragmentation due to the CoW
> nature. After only a few days usage, performance tanked with the ceph
> journal on the same device.
>
> I did find that if you partition and share device/SSD between both
> ZIL/SLOG and a separate ceph journal, the resulting performance is about
> the same in pure throughput/iops, though latency is slightly higher. This
> is what I do in my test cluster.
>
> 5. Fragmentation - once you hit around 80-90% disk usage your performance
> will start to slow down due to fragmentation. This isn't due to CEPH, it’s
> a known ZFS quirk due to its CoW nature. Unfortunately there is no defrag
> in ZFS, and likely never will be (the mythical block point rewrite unicorn
> you'll find people talking about).
>
> There is one way to delay it and possibly avoid it however, enable
> metaslab_debug, this will put the ZFS spacemaps in memory, allowing ZFS to
> make better placements during CoW operations, but it does use more memory.
> See the following links for more detail about spacemaps and fragmentation:
> http://blog.delphix.com/uday/2013/02/19/78/ and
> http://serverfault.com/a/556892 and
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45408.html
>
> There's alot more to ZFS and "things-to-know" than that (L2ARC uses ARC
> metadata space, dedupe uses ARC metadata space, etc), but as far as CEPH is
> cocearned the above is a good place to start. ZFS IMHO is a great solution,
> but it requires some time and effort to do it right.
>
> Cheers,
>
> Michal Kozanecki | Linux Administrator | E: mkozane...@evertz.com
> 
>
>
Thank you for taking the time to share that, Michal!

Jake


>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com ]
> On Behalf Of Mark Nelson
> Sent: April-15-15 12:22 PM
> To: Jake Young
> Cc: ceph-users@lists.ceph.com 
> Subject: Re: [ceph-users] Ceph on Solaris / Illumos
>
> On 04/15/2015 10:36 AM, Jake Young wrote:
> >
> >
> > On 

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Thx guys, thats what I will be doing at the end.

Cheers
On Apr 17, 2015 6:24 PM, "Robert LeBlanc"  wrote:

> Delete and re-add all six OSDs.
>
> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic 
> wrote:
>
>> Hi guys,
>>
>> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down,
>> ceph rebalanced etc.
>>
>> Now I have new SSD inside, and I will partition it etc - but would like
>> to know, how to proceed now, with the journal recreation for those 6 OSDs
>> that are down now.
>>
>> Should I flush journal (where to, journals doesnt still exist...?), or
>> just recreate journal from scratch (making symboliv links again: ln -s
>> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
>>
>> I expect the folowing procedure, but would like confirmation please:
>>
>> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
>> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
>> ceph-osd -i $ID --mkjournal
>> ll /var/lib/ceph/osd/ceph-$ID/journal
>> service ceph start osd.$ID
>>
>> Any thought greatly appreciated !
>>
>> Thanks,
>>
>> --
>>
>> Andrija Panić
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
existing OSD UUID, copy the keyring and let it populate itself?
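
A rough, untested sketch of that for one OSD ($ID, $UUID and the device name
are placeholders; double-check the flags against your release first):

mkfs.xfs -f /dev/sdX1
mount /dev/sdX1 /var/lib/ceph/osd/ceph-$ID
ceph-osd -i $ID --mkfs --osd-uuid $UUID
# copy back the existing key so it matches the auth entry still in the cluster
ceph auth get osd.$ID -o /var/lib/ceph/osd/ceph-$ID/keyring
service ceph start osd.$ID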

pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic 
napisał:

> Thx guys, thats what I will be doing at the end.
>
> Cheers
> On Apr 17, 2015 6:24 PM, "Robert LeBlanc"  wrote:
>
>> Delete and re-add all six OSDs.
>>
>> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down,
>>> ceph rebalanced etc.
>>>
>>> Now I have new SSD inside, and I will partition it etc - but would like
>>> to know, how to proceed now, with the journal recreation for those 6 OSDs
>>> that are down now.
>>>
>>> Should I flush journal (where to, journals doesnt still exist...?), or
>>> just recreate journal from scratch (making symboliv links again: ln -s
>>> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
>>>
>>> I expect the folowing procedure, but would like confirmation please:
>>>
>>> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
>>> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
>>> ceph-osd -i $ID --mkjournal
>>> ll /var/lib/ceph/osd/ceph-$ID/journal
>>> service ceph start osd.$ID
>>>
>>> Any thought greatly appreciated !
>>>
>>> Thanks,
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
12 OSDs are down - I was hoping for less work than removing and re-adding each OSD?
On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" 
wrote:

> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
> existing OSD UUID, copy the keyring and let it populate itself?
>
> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic 
> napisał:
>
>> Thx guys, thats what I will be doing at the end.
>>
>> Cheers
>> On Apr 17, 2015 6:24 PM, "Robert LeBlanc"  wrote:
>>
>>> Delete and re-add all six OSDs.
>>>
>>> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic 
>>> wrote:
>>>
 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down,
 ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would like
 to know, how to proceed now, with the journal recreation for those 6 OSDs
 that are down now.

 Should I flush journal (where to, journals doesnt still exist...?), or
 just recreate journal from scratch (making symboliv links again: ln -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>  ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
Hi,

Did 6 other OSDs go down when re-adding?

/Josef

> On 17 Apr 2015, at 18:49, Andrija Panic  wrote:
> 
> 12 osds down - I expect less work with removing and adding osd?
> 
> On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki"  > wrote:
> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the 
> existing OSD UUID, copy the keyring and let it populate itself?
> 
> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic  > napisał:
> Thx guys, thats what I will be doing at the end.
> 
> Cheers
> 
> On Apr 17, 2015 6:24 PM, "Robert LeBlanc"  > wrote:
> Delete and re-add all six OSDs.
> 
> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic  > wrote:
> Hi guys,
> 
> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down, ceph 
> rebalanced etc.
> 
> Now I have new SSD inside, and I will partition it etc - but would like to 
> know, how to proceed now, with the journal recreation for those 6 OSDs that 
> are down now.
> 
> Should I flush journal (where to, journals doesnt still exist...?), or just 
> recreate journal from scratch (making symboliv links again: ln -s 
> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
> 
> I expect the folowing procedure, but would like confirmation please:
> 
> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
> ceph-osd -i $ID --mkjournal
> ll /var/lib/ceph/osd/ceph-$ID/journal
> service ceph start osd.$ID
> 
> Any thought greatly appreciated !
> 
> Thanks,
> 
> -- 
> 
> Andrija Panić
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] many slow requests on different osds (scrubbing disabled)

2015-04-17 Thread Craig Lewis
I've seen something like this a few times.

Once, I lost the battery in my battery backed RAID card.  That caused all
the OSDs on that host to be slow, which triggered slow request notices
pretty much cluster wide.  It was only when I histogrammed the slow request
notices that I saw most of them were on a single node.  I compared the disk
latency graphs between nodes, and saw that one node had a much higher write
latency. This took me a while to track down.

Another time, I had a consumer HDD that was slowly failing.  It would hit a
group of bad sectors, remap, repeat.  SMART warned me about it, so I
replaced the disk after the second slow request alerts.  This was pretty
straight forward to diagnose, only because smartd notified me.


In both cases, I saw "slow request" notices on the affected disks.  Your
osd.284 says osd.186 and osd.177 are being slow, but osd.186 and osd.177
don't claim to be slow.

It's possible that there is another disk that is slow, causing osd.186 and
osd.177 replication to slow down.  With the PG distribution over OSDs, one
disk being a little slow can affect a large number of OSDs.


If SMART doesn't show you a disk is failing, I'd start looking for disks
(the disk itself, not the OSD daemon) with a high latency around your
problem times.  If you focus on the problem times, give it a +/- 10 minutes
window.  Sometimes it takes a little while for the disk slowness to spread
out enough for Ceph to complain.
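
A couple of starting points for that kind of digging (device names are
placeholders):

iostat -x 5                 # watch await/%util per disk for outliers
smartctl -a /dev/sdX | egrep -i 'realloc|pending|uncorrect'   # sector counters on a suspect disk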


On Wed, Apr 15, 2015 at 3:20 PM, Dominik Mostowiec <
dominikmostow...@gmail.com> wrote:

> Hi,
> From few days we notice on our cluster many slow request.
> Cluster:
> ceph version 0.67.11
> 3 x mon
> 36 hosts -> 10 osd ( 4T ) + 2 SSD (journals)
> Scrubbing and deep scrubbing is disabled but count of slow requests is
> still increasing.
> Disk utilisation is very small after we have disabled scrubbings.
> Log from one write with slow with debug osd = 20/20
> osd.284 - master: http://pastebin.com/xPtpNU6n
> osd.186 - replica: http://pastebin.com/NS1gmhB0
> osd.177 - replica: http://pastebin.com/Ln9L2Z5Z
>
> Can you help me find what is reason of it?
>
> --
> Regards
> Dominik
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
An SSD died that hosted journals for 6 OSDs - 2 x SSD died in total, so 12 OSDs are
down, and rebalancing is about to finish... after which I need to fix the OSDs.

On 17 April 2015 at 19:01, Josef Johansson  wrote:

> Hi,
>
> Did 6 other OSDs go down when re-adding?
>
> /Josef
>
> On 17 Apr 2015, at 18:49, Andrija Panic  wrote:
>
> 12 osds down - I expect less work with removing and adding osd?
> On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
> krzysztof.a.nowi...@gmail.com> wrote:
>
>> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
>> existing OSD UUID, copy the keyring and let it populate itself?
>>
>> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
>> andrija.pa...@gmail.com> napisał:
>>
>>> Thx guys, thats what I will be doing at the end.
>>>
>>> Cheers
>>> On Apr 17, 2015 6:24 PM, "Robert LeBlanc"  wrote:
>>>
 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic >>> > wrote:

> Hi guys,
>
> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
> down, ceph rebalanced etc.
>
> Now I have new SSD inside, and I will partition it etc - but would
> like to know, how to proceed now, with the journal recreation for those 6
> OSDs that are down now.
>
> Should I flush journal (where to, journals doesnt still exist...?), or
> just recreate journal from scratch (making symboliv links again: ln -s
> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
>
> I expect the folowing procedure, but would like confirmation please:
>
> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
> ceph-osd -i $ID --mkjournal
> ll /var/lib/ceph/osd/ceph-$ID/journal
> service ceph start osd.$ID
>
> Any thought greatly appreciated !
>
> Thanks,
>
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
  ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Query regarding integrating Ceph with Vcenter/Clustered Esxi hosts.

2015-04-17 Thread Vivek Varghese Cherian
Hi all,

I have a setup where I can launch VMs from a standalone VMware ESXi host
which acts as an iSCSI initiator, with a Ceph RBD block device exported as
an iSCSI target.

When launching VMs from the standalone ESXi host integrated with Ceph, it
prompts me to choose the datastore I want to launch the VMs on; all I need
to do is choose the iSCSI datastore backed by the Ceph cluster and the VMs
get launched there.

My client's requirement is not just to launch VMs on a standalone VMware
ESXi host integrated with Ceph: he should be able to have a shared
VirtualSAN-like storage environment, similar to what NetApp offers, and be
able to launch VMs on Ceph storage from clustered ESXi hosts or from
all hosts across a vCenter.

Is this feasible ?

Any pointers/references would be most welcome.

Thanks in advance.

Regards,
-- 
Vivek
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Managing larger ceph clusters

2015-04-17 Thread Craig Lewis
I'm running a small cluster, but I'll chime in since nobody else has.

Cern had a presentation a while ago (dumpling time-frame) about their
deployment.  They go over some of your questions:
http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern

My philosophy on Config Management is that it should save me time.  If it's
going to take me longer to write a recipe to do something, I'll just do it
by hand. Since my cluster is small, there are many things I can do faster
by hand.  This may or may not work for you, depending on your documentation
/ repeatability requirements.  For things that need to be documented, I'll
usually write the recipe anyway (I accept Chef recipes as documentation).


For my clusters, I'm using Chef to set up all nodes and manage ceph.conf.
I manually manage my pools, CRUSH map, RadosGW users, and disk
replacement.  I was using Chef to add new disks, but I ran into load
problems due to my small cluster size.  I'm currently adding disks
manually, to manage cluster load better.  As my cluster gets larger,
that'll be less important.

I'm also doing upgrades manually, because it's less work than writing the
Chef recipe to do a cluster upgrade.  Since Chef isn't cluster aware, it
would be a pain to make the recipe cluster aware enough to handle the
upgrade.  And I figure if I stall long enough, somebody else will write it
:-)  Ansible, with its cluster-wide coordination, looks like it would
handle that a bit better.



On Wed, Apr 15, 2015 at 2:05 PM, Stillwell, Bryan <
bryan.stillw...@twcable.com> wrote:

> I'm curious what people managing larger ceph clusters are doing with
> configuration management and orchestration to simplify their lives?
>
> We've been using ceph-deploy to manage our ceph clusters so far, but
> feel that moving the management of our clusters to standard tools would
> provide a little more consistency and help prevent some mistakes that
> have happened while using ceph-deploy.
>
> We're looking at using the same tools we use in our OpenStack
> environment (puppet/ansible), but I'm interested in hearing from people
> using chef/salt/juju as well.
>
> Some of the cluster operation tasks that I can think of along with
> ideas/concerns I have are:
>
> Keyring management
>   Seems like hiera-eyaml is a natural fit for storing the keyrings.
>
> ceph.conf
>   I believe the puppet ceph module can be used to manage this file, but
>   I'm wondering if using a template (erb?) might be better method to
>   keeping it organized and properly documented.
>
> Pool configuration
>   The puppet module seems to be able to handle managing replicas and the
>   number of placement groups, but I don't see support for erasure coded
>   pools yet.  This is probably something we would want the initial
>   configuration to be set up by puppet, but not something we would want
>   puppet changing on a production cluster.
>
> CRUSH maps
>   Describing the infrastructure in yaml makes sense.  Things like which
>   servers are in which rows/racks/chassis.  Also describing the type of
>   server (model, number of HDDs, number of SSDs) makes sense.
>
> CRUSH rules
>   I could see puppet managing the various rules based on the backend
>   storage (HDD, SSD, primary affinity, erasure coding, etc).
>
> Replacing a failed HDD disk
>   Do you automatically identify the new drive and start using it right
>   away?  I've seen people talk about using a combination of udev and
>   special GPT partition IDs to automate this.  If you have a cluster
>   with thousands of drives I think automating the replacement makes
>   sense.  How do you handle the journal partition on the SSD?  Does
>   removing the old journal partition and creating a new one create a
>   hole in the partition map (because the old partition is removed and
>   the new one is created at the end of the drive)?
>
> Replacing a failed SSD journal
>   Has anyone automated recreating the journal drive using Sebastien
>   Han's instructions, or do you have to rebuild all the OSDs as well?
>
>
> http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-jou
> rnal-failure/
>
> Adding new OSD servers
>   How are you adding multiple new OSD servers to the cluster?  I could
>   see an ansible playbook which disables nobackfill, noscrub, and
>   nodeep-scrub followed by adding all the OSDs to the cluster being
>   useful.
>
> Upgrading releases
>   I've found an ansible playbook for doing a rolling upgrade which looks
>   like it would work well, but are there other methods people are using?
>
>
> http://www.sebastien-han.fr/blog/2015/03/30/ceph-rolling-upgrades-with-ansi
> ble/
>
> Decommissioning hardware
>   Seems like another ansible playbook for reducing the OSDs weights to
>   zero, marking the OSDs out, stopping the service, removing the OSD ID,
>   removing the CRUSH entry, unmounting the drives, and finally removing
>   the server would be the best method here.  Any other ideas on how to
>   approach this?
>
>
> That's all 

Re: [ceph-users] full ssd setup preliminary hammer bench

2015-04-17 Thread Stefan Priebe


Am 17.04.2015 um 17:37 schrieb Alexandre DERUMIER:

Hi Mark,

I finally got my hardware for my production full ssd cluster.

Here a first preliminary bench. (1osd).

I got around 45K iops with randread 4K with a small 10GB rbd volume


I'm pretty happy because I don't see anymore huge cpu difference between krbd 
&& lirbd.
In my previous bench I was using debian wheezy as client,
now it's a centos 7.1, so maybe something is different (glibc,...).


any idea whether this might be the tcmalloc bug?



I'm planning to do big benchmark centos vs ubuntu vs debian, client && server, 
to compare.
I have 18 osd ssd for the benchmarks.







results : rand 4K : 1 osd
-

fio + librbd:

iops: 45.1K

 clat percentiles (usec):
  |  1.00th=[  358],  5.00th=[  406], 10.00th=[  446], 20.00th=[  556],
  | 30.00th=[  676], 40.00th=[ 1048], 50.00th=[ 1192], 60.00th=[ 1304],
  | 70.00th=[ 1400], 80.00th=[ 1496], 90.00th=[ 1624], 95.00th=[ 1720],
  | 99.00th=[ 1880], 99.50th=[ 1928], 99.90th=[ 2064], 99.95th=[ 2128],
  | 99.99th=[ 2512]

cpu server :  89.1 iddle
cpu client :  92,5 idle

fio + krbd
--
iops:47.5K

 clat percentiles (usec):
  |  1.00th=[  620],  5.00th=[  636], 10.00th=[  644], 20.00th=[  652],
  | 30.00th=[  668], 40.00th=[  676], 50.00th=[  684], 60.00th=[  692],
  | 70.00th=[  708], 80.00th=[  724], 90.00th=[  756], 95.00th=[  820],
  | 99.00th=[ 1004], 99.50th=[ 1032], 99.90th=[ 1144], 99.95th=[ 1448],
  | 99.99th=[ 2224]

cpu server :  92.4 idle
cpu client :  96,8 idle




hardware (ceph node && client node):
---
ceph : hammer
os : centos 7.1
2 x 10cores Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz
64GB ram
2 x intel s3700 100GB : raid1: os + monitor
6 x intel s3500 160GB : osds
2x10gb mellanox connect-x3 (lacp)

network
---
mellanox sx1012 with breakout cables (10GB)


centos tunning:
---
-noop scheduler
-tune-adm profile latency-performance

ceph.conf
-
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true


 osd pool default min size = 1

 debug lockdep = 0/0
 debug context = 0/0
 debug crush = 0/0
 debug buffer = 0/0
 debug timer = 0/0
 debug journaler = 0/0
 debug osd = 0/0
 debug optracker = 0/0
 debug objclass = 0/0
 debug filestore = 0/0
 debug journal = 0/0
 debug ms = 0/0
 debug monc = 0/0
 debug tp = 0/0
 debug auth = 0/0
 debug finisher = 0/0
 debug heartbeatmap = 0/0
 debug perfcounter = 0/0
 debug asok = 0/0
 debug throttle = 0/0

 osd_op_threads = 5
 filestore_op_threads = 4


 osd_op_num_threads_per_shard = 1
 osd_op_num_shards = 10
 filestore_fd_cache_size = 64
 filestore_fd_cache_shards = 32
 ms_nocrc = true
 ms_dispatch_throttle_bytes = 0

 cephx sign messages = false
 cephx require signatures = false

[client]
rbd_cache = false





rand 4K : rbd volume size: 10GB  (data in osd node buffer - no access to disk)
--
fio + librbd

[global]
ioengine=rbd
clientname=admin
pool=pooltest
rbdname=rbdtest
invalidate=0
rw=randread
direct=1
bs=4k
numjobs=2
group_reporting=1
iodepth=32



fio + krbd
---
[global]
ioengine=aio
invalidate=1# mandatory
rw=randread
bs=4K
direct=1
numjobs=2
group_reporting=1
size=10G

iodepth=32
filename=/dev/rbd0   (noop scheduler)






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
tough luck, hope everything comes up ok afterwards. What models on the SSD?

/Josef
On 17 Apr 2015 20:05, "Andrija Panic"  wrote:

> SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
> down, and rebalancing is about finish... after which I need to fix the OSDs.
>
> On 17 April 2015 at 19:01, Josef Johansson  wrote:
>
>> Hi,
>>
>> Did 6 other OSDs go down when re-adding?
>>
>> /Josef
>>
>> On 17 Apr 2015, at 18:49, Andrija Panic  wrote:
>>
>> 12 osds down - I expect less work with removing and adding osd?
>> On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
>> krzysztof.a.nowi...@gmail.com> wrote:
>>
>>> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
>>> existing OSD UUID, copy the keyring and let it populate itself?
>>>
>>> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
>>> andrija.pa...@gmail.com> napisał:
>>>
 Thx guys, thats what I will be doing at the end.

 Cheers
 On Apr 17, 2015 6:24 PM, "Robert LeBlanc"  wrote:

> Delete and re-add all six OSDs.
>
> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic <
> andrija.pa...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
>> down, ceph rebalanced etc.
>>
>> Now I have new SSD inside, and I will partition it etc - but would
>> like to know, how to proceed now, with the journal recreation for those 6
>> OSDs that are down now.
>>
>> Should I flush journal (where to, journals doesnt still exist...?),
>> or just recreate journal from scratch (making symboliv links again: ln -s
>> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
>>
>> I expect the folowing procedure, but would like confirmation please:
>>
>> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
>> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
>> ceph-osd -i $ID --mkjournal
>> ll /var/lib/ceph/osd/ceph-$ID/journal
>> service ceph start osd.$ID
>>
>> Any thought greatly appreciated !
>>
>> Thanks,
>>
>> --
>>
>> Andrija Panić
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>  ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
>
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these died...
the wear leveling value is 96%, so only 4% used up... (yes, I know these are not
enterprise, etc...)

On 17 April 2015 at 21:01, Josef Johansson  wrote:

> tough luck, hope everything comes up ok afterwards. What models on the SSD?
>
> /Josef
> On 17 Apr 2015 20:05, "Andrija Panic"  wrote:
>
>> SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
>> down, and rebalancing is about finish... after which I need to fix the OSDs.
>>
>> On 17 April 2015 at 19:01, Josef Johansson  wrote:
>>
>>> Hi,
>>>
>>> Did 6 other OSDs go down when re-adding?
>>>
>>> /Josef
>>>
>>> On 17 Apr 2015, at 18:49, Andrija Panic  wrote:
>>>
>>> 12 osds down - I expect less work with removing and adding osd?
>>> On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
>>> krzysztof.a.nowi...@gmail.com> wrote:
>>>
 Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
 existing OSD UUID, copy the keyring and let it populate itself?

 pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
 andrija.pa...@gmail.com> napisał:

> Thx guys, thats what I will be doing at the end.
>
> Cheers
> On Apr 17, 2015 6:24 PM, "Robert LeBlanc" 
> wrote:
>
>> Delete and re-add all six OSDs.
>>
>> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic <
>> andrija.pa...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
>>> down, ceph rebalanced etc.
>>>
>>> Now I have new SSD inside, and I will partition it etc - but would
>>> like to know, how to proceed now, with the journal recreation for those 
>>> 6
>>> OSDs that are down now.
>>>
>>> Should I flush journal (where to, journals doesnt still exist...?),
>>> or just recreate journal from scratch (making symboliv links again: ln 
>>> -s
>>> /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.
>>>
>>> I expect the folowing procedure, but would like confirmation please:
>>>
>>> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
>>> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
>>> ceph-osd -i $ID --mkjournal
>>> ll /var/lib/ceph/osd/ceph-$ID/journal
>>> service ceph start osd.$ID
>>>
>>> Any thought greatly appreciated !
>>>
>>> Thanks,
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
  ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Managing larger ceph clusters

2015-04-17 Thread Steve Anthony
For reference, I'm currently running 26 nodes (338 OSDs); will be 35
nodes (455 OSDs) in the near future.

Node/OSD provisioning and replacements:

Mostly I'm using ceph-deploy, at least to do node/osd adds and
replacements. Right now the process is:

Use FAI (http://fai-project.org) to setup software RAID1/LVM for the OS
disks, and do a minimal installation, including the salt-minion.

Accept the new minion on the salt-master node and deploy the
configuration. LDAP auth, nrpe, diamond collector, udev configuration,
custom python disk add script, and everything on the Ceph preflight page
(http://ceph.com/docs/firefly/start/quick-start-preflight/)

Insert the journals into the case. Udev triggers my python code, which
partitions the SSDs and fires a Prowl alert (http://www.prowlapp.com/)
to my phone when it's finished.

Insert the OSDs into the case. Same thing, udev triggers the python
code, which selects the next available partition on the journals so OSDs
go on journal1partA, journal2partA, journal3partA, journal1partB,... for
the three journals in each node. The code then fires a salt event at the
master node with the OSD dev path, journal /dev/by-id/ path and node
hostname. The salt reactor on the master node takes this event and runs
a script on the admin node which passes those parameters to ceph-deploy,
which does the OSD deployment. Send Prowl alert on success or fail with
details.

Similarly, when an OSD fails, I remove it and insert the new OSD. The
same process as above occurs. Logical removal I do manually, since I'm
not at a scale where it's common yet. Eventually, I imagine I'll write
code to trigger OSD removal on certain events using the same
event/reactor Salt framework.
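
For anyone wanting to build something similar, the udev side can be as small as
this (rule file name and helper path below are made up, not the actual hook
described above):

# call a helper for every new whole disk that appears
cat > /etc/udev/rules.d/90-ceph-disk-added.rules <<'EOF'
ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", KERNEL=="sd?", RUN+="/usr/local/bin/ceph-disk-added.sh %k"
EOF
udevadm control --reload-rules
# note: RUN hooks must return quickly, so the helper should only fire an event
# (e.g. salt-call event.send) and leave the heavy lifting to the reactor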

Pool/CRUSH management:

Pool configuration and CRUSH management are mostly one-time operations.
That is, I'll make a change rarely and when I do it will persist in that
new state for a long time. Given that and the fact that I can make the
changes from one node and inject them into the cluster, I haven't needed
to automate that portion of Ceph as I've added more nodes, at least not yet.

Replacing journals:

I haven't had to do this yet; I'd probably remove/readd all the OSDs if
it happened today, but will be reading the post you linked.

Upgrading releases:

Change /etc/apt/sources.list.d/ceph.list to point at the
new release and push it to all the nodes with Salt. Then salt -N 'ceph'
pkg.upgrade to upgrade the packages on all the nodes in the ceph
nodegroup. Then, use Salt to restart the monitors, then the OSDs on each
node, one by one. Finally run the following command on all nodes with
Salt to verify all monitors/OSDs are using the new version:

for i in $(ls /var/run/ceph/ceph-*.asok);do echo $i;ceph --admin-daemon
$i version;done
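
Put together, the whole dance is roughly this (nodegroup name, txt outputter
format and a sysvinit-style init script with daemons listed in ceph.conf are
all assumptions here, not my exact commands):

salt -N 'ceph' pkg.upgrade
for host in $(salt -N 'ceph' --out=txt test.ping | cut -d: -f1); do
    salt "$host" cmd.run 'service ceph restart mon'   # does nothing on hosts without a mon
    salt "$host" cmd.run 'service ceph restart osd'
    sleep 60   # let PGs re-peer before touching the next node
done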

Node decommissioning:

I have a script which enumerates all the OSDs on a given host and stores
that list in a file. Another script (run by cron every 10 minutes)
checks if the cluster health is OK, and if so pops the next OSD from
that file and executes the steps to remove it from the host, trickling
the node out of service.
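
In shell terms the cron job amounts to something like this (file location is
made up; the final purge steps are left until the OSD has actually drained):

#!/bin/bash
# trickle one OSD out per run, but only while the cluster is healthy
LIST=/var/lib/decommission/osds.txt
ceph health | grep -q HEALTH_OK || exit 0
ID=$(head -n1 "$LIST")
[ -z "$ID" ] && exit 0
ceph osd out "$ID"        # start backfilling data off this OSD
sed -i '1d' "$LIST"       # pop it from the list
# once drained: stop the daemon, then
#   ceph osd crush remove osd.$ID; ceph auth del osd.$ID; ceph osd rm $ID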




On 04/17/2015 02:18 PM, Craig Lewis wrote:
> I'm running a small cluster, but I'll chime in since nobody else has.
>
> Cern had a presentation a while ago (dumpling time-frame) about their
> deployment.  They go over some of your
> questions: http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
>
> My philosophy on Config Management is that it should save me time.  If
> it's going to take me longer to write a recipe to do something, I'll
> just do it by hand. Since my cluster is small, there are many things I
> can do faster by hand.  This may or may not work for you, depending on
> your documentation / repeatability requirements.  For things that need
> to be documented, I'll usually write the recipe anyway (I accept Chef
> recipes as documentation).
>
>
> For my clusters, I'm using Chef to setups all nodes and manage
> ceph.conf.  I manually manage my pools, CRUSH map, RadosGW users, and
> disk replacement.  I was using Chef to add new disks, but I ran into
> load problems due to my small cluster size.  I'm currently adding
> disks manually, to manage cluster load better.  As my cluster gets
> larger, that'll be less important.
>
> I'm also doing upgrades manually, because it's less work than writing
> the Chef recipe to do a cluster upgrade.  Since Chef isn't cluster
> aware, it would be a a pain to make the recipe cluster aware enough to
> handle the upgrade.  And I figure if I stall long enough, somebody
> else will write it :-)  Ansible, with it's cluster wide coordination,
> looks like it would handle that a bit better.
>
>
>
> On Wed, Apr 15, 2015 at 2:05 PM, Stillwell, Bryan
> mailto:bryan.stillw...@twcable.com>> wrote:
>
> I'm curious what people managing larger ceph clusters are doing with
> configuration management and orchestration to simplify their lives?
>
> We've been using ceph-deploy to manage our ceph clusters s

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
I have two of them in my cluster (plus one 256GB version) for about half a
year now. So far so good. I'll be keeping a closer look at them.

pt., 17 kwi 2015, 21:07 Andrija Panic użytkownik 
napisał:

> nahSamsun 850 PRO 128GB - dead after 3months - 2 of these died...
> wearing level is 96%, so only 4% wasted... (yes I know these are not
> enterprise,etc... )
>
> On 17 April 2015 at 21:01, Josef Johansson  wrote:
>
>> tough luck, hope everything comes up ok afterwards. What models on the
>> SSD?
>>
>> /Josef
>> On 17 Apr 2015 20:05, "Andrija Panic"  wrote:
>>
>>> SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
>>> down, and rebalancing is about finish... after which I need to fix the OSDs.
>>>
>>> On 17 April 2015 at 19:01, Josef Johansson  wrote:
>>>
 Hi,

 Did 6 other OSDs go down when re-adding?

 /Josef

 On 17 Apr 2015, at 18:49, Andrija Panic 
 wrote:

 12 osds down - I expect less work with removing and adding osd?
 On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
 krzysztof.a.nowi...@gmail.com> wrote:

> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
> existing OSD UUID, copy the keyring and let it populate itself?
>
> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
> andrija.pa...@gmail.com> napisał:
>
>> Thx guys, thats what I will be doing at the end.
>>
>> Cheers
>> On Apr 17, 2015 6:24 PM, "Robert LeBlanc" 
>> wrote:
>>
>>> Delete and re-add all six OSDs.
>>>
>>> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic <
>>> andrija.pa...@gmail.com> wrote:
>>>
 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
 down, ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would
 like to know, how to proceed now, with the journal recreation for 
 those 6
 OSDs that are down now.

 Should I flush journal (where to, journals doesnt still exist...?),
 or just recreate journal from scratch (making symboliv links again: ln 
 -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>  ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



>>>
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>
>
> --
>
> Andrija Panić
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
damn, good news for me, possibly bad news for you :)
what is the wear leveling value (smartctl -a /dev/sdX) - the attribute near the end of
the attribute list...
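
e.g. something along these lines (the attribute name differs per vendor):

# Samsung reports it as Wear_Leveling_Count, Intel as Media_Wearout_Indicator
smartctl -A /dev/sdX | egrep -i 'wear|media_wearout'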

thx

On 17 April 2015 at 21:12, Krzysztof Nowicki 
wrote:

> I have two of them in my cluster (plus one 256GB version) for about half a
> year now. So far so good. I'll be keeping a closer look at them.
>
> pt., 17 kwi 2015, 21:07 Andrija Panic użytkownik 
> napisał:
>
>> nahSamsun 850 PRO 128GB - dead after 3months - 2 of these died...
>> wearing level is 96%, so only 4% wasted... (yes I know these are not
>> enterprise,etc... )
>>
>> On 17 April 2015 at 21:01, Josef Johansson  wrote:
>>
>>> tough luck, hope everything comes up ok afterwards. What models on the
>>> SSD?
>>>
>>> /Josef
>>> On 17 Apr 2015 20:05, "Andrija Panic"  wrote:
>>>
 SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
 down, and rebalancing is about finish... after which I need to fix the 
 OSDs.

 On 17 April 2015 at 19:01, Josef Johansson  wrote:

> Hi,
>
> Did 6 other OSDs go down when re-adding?
>
> /Josef
>
> On 17 Apr 2015, at 18:49, Andrija Panic 
> wrote:
>
> 12 osds down - I expect less work with removing and adding osd?
> On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
> krzysztof.a.nowi...@gmail.com> wrote:
>
>> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with
>> the existing OSD UUID, copy the keyring and let it populate itself?
>>
>> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
>> andrija.pa...@gmail.com> napisał:
>>
>>> Thx guys, thats what I will be doing at the end.
>>>
>>> Cheers
>>> On Apr 17, 2015 6:24 PM, "Robert LeBlanc" 
>>> wrote:
>>>
 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic <
 andrija.pa...@gmail.com> wrote:

> Hi guys,
>
> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
> down, ceph rebalanced etc.
>
> Now I have new SSD inside, and I will partition it etc - but would
> like to know, how to proceed now, with the journal recreation for 
> those 6
> OSDs that are down now.
>
> Should I flush journal (where to, journals doesnt still
> exist...?), or just recreate journal from scratch (making symboliv 
> links
> again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and
> starting OSDs.
>
> I expect the folowing procedure, but would like confirmation
> please:
>
> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
> ceph-osd -i $ID --mkjournal
> ll /var/lib/ceph/osd/ceph-$ID/journal
> service ceph start osd.$ID
>
> Any thought greatly appreciated !
>
> Thanks,
>
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
  ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>
>>
>> --
>>
>> Andrija Panić
>>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
the massive rebalancing does not affect the SSDs in a good way either. But
from what I've gathered the Pro should be fine. Massive amount of write
errors in the logs?

/Josef
On 17 Apr 2015 21:07, "Andrija Panic"  wrote:

> nahSamsun 850 PRO 128GB - dead after 3months - 2 of these died...
> wearing level is 96%, so only 4% wasted... (yes I know these are not
> enterprise,etc... )
>
> On 17 April 2015 at 21:01, Josef Johansson  wrote:
>
>> tough luck, hope everything comes up ok afterwards. What models on the
>> SSD?
>>
>> /Josef
>> On 17 Apr 2015 20:05, "Andrija Panic"  wrote:
>>
>>> SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
>>> down, and rebalancing is about finish... after which I need to fix the OSDs.
>>>
>>> On 17 April 2015 at 19:01, Josef Johansson  wrote:
>>>
 Hi,

 Did 6 other OSDs go down when re-adding?

 /Josef

 On 17 Apr 2015, at 18:49, Andrija Panic 
 wrote:

 12 osds down - I expect less work with removing and adding osd?
 On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
 krzysztof.a.nowi...@gmail.com> wrote:

> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
> existing OSD UUID, copy the keyring and let it populate itself?
>
> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
> andrija.pa...@gmail.com> napisał:
>
>> Thx guys, thats what I will be doing at the end.
>>
>> Cheers
>> On Apr 17, 2015 6:24 PM, "Robert LeBlanc" 
>> wrote:
>>
>>> Delete and re-add all six OSDs.
>>>
>>> On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic <
>>> andrija.pa...@gmail.com> wrote:
>>>
 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
 down, ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would
 like to know, how to proceed now, with the journal recreation for 
 those 6
 OSDs that are down now.

 Should I flush journal (where to, journals doesnt still exist...?),
 or just recreate journal from scratch (making symboliv links again: ln 
 -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>  ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



>>>
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>
>
> --
>
> Andrija Panić
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1

2015-04-17 Thread Chad William Seys
Now I also know I have too many PGs!  

It is fairly confusing that the docs talk about PGs on the pool page, but only
vaguely about the total number of PGs for the whole cluster.

Here are some examples of confusing statements with suggested alternatives 
from the online docs:

http://ceph.com/docs/master/rados/operations/pools/

"A typical configuration uses approximately 100 placement groups per OSD to 
provide optimal balancing without using up too many computing resources."
->
"A typical configuration uses approximately 100 placement groups per OSD for 
all pools to provide optimal balancing without using up too many computing 
resources."


http://ceph.com/docs/master/rados/operations/placement-groups/

"It is mandatory to choose the value of pg_num because it cannot be calculated 
automatically. Here are a few values commonly used:"
->
"It is mandatory to choose the value of pg_num.  Because pg_num depends on the 
planned number of pools in the cluster, it cannot be determined automatically 
on pool creation. Please use this calculator: http://ceph.com/pgcalc/"
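
As a rough worked example of that rule of thumb (numbers invented for the
example, and each pool's pg_num still gets rounded to a power of two):

# ~100 PGs per OSD counts *all* pools and all replicas
echo $(( 100 * 100 / 3 ))   # 100 OSDs, size 3 -> ~3333 PGs total across every pool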

Thanks!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Managing larger ceph clusters

2015-04-17 Thread Quentin Hartman
I also have a fairly small deployment of 14 nodes, 42 OSDs, but even I use
some automation. I do my OS installs and partitioning with PXE / kickstart,
then use chef for my baseline install of the "normal" server stuff in our
env and admin accounts. Then the ceph-specific stuff I handle by hand and
with ceph-deploy and some light wrapper scripts. Monitoring / alerting is
sensu and graphite. I tried Calamari, and it was nice. But it produced a
lot of load on the admin machine (especially considering the work it should
have been performing) and once I figured out how to get metrics into
"normal" graphite, the appeal of a ceph-specific tool was reduced
substantially.

QH

On Fri, Apr 17, 2015 at 1:07 PM, Steve Anthony  wrote:

>  For reference, I'm currently running 26 nodes (338 OSDs); will be 35
> nodes (455 OSDs) in the near future.
>
> Node/OSD provisioning and replacements:
>
> Mostly I'm using ceph-deploy, at least to do node/osd adds and
> replacements. Right now the process is:
>
> Use FAI (http://fai-project.org) to setup software RAID1/LVM for the OS
> disks, and do a minimal installation, including the salt-minion.
>
> Accept the new minion on the salt-master node and deploy the
> configuration. LDAP auth, nrpe, diamond collector, udev configuration,
> custom python disk add script, and everything on the Ceph preflight page (
> http://ceph.com/docs/firefly/start/quick-start-preflight/)
>
> Insert the journals into the case. Udev triggers my python code, which
> partitions the SSDs and fires a Prowl alert (http://www.prowlapp.com/) to
> my phone when it's finished.
>
> Insert the OSDs into the case. Same thing, udev triggers the python code,
> which selects the next available partition on the journals so OSDs go on
> journal1partA, journal2partA, journal3partA, journal1partB,... for the
> three journals in each node. The code then fires a salt event at the master
> node with the OSD dev path, journal /dev/by-id/ path and node hostname. The
> salt reactor on the master node takes this event and runs a script on the
> admin node which passes those parameters to ceph-deploy, which does the OSD
> deployment. Send Prowl alert on success or fail with details.
>
> Similarity, when an OSD fails, I remove it, and insert the new OSD. The
> same process as above occurs. Logical removal I do manually, since I'm not
> at a scale where it's common yet. Eventually, I imagine I'll write code to
> trigger OSD removal on certain events using the same event/reactor Salt
> framework.
>
> Pool/CRUSH management:
>
> Pool configuration and CRUSH management are mostly one-time operations.
> That is, I'll make a change rarely and when I do it will persist in that
> new state for a long time. Given that and the fact that I can make the
> changes from one node and inject them into the cluster, I haven't needed to
> automate that portion of Ceph as I've added more nodes, at least not yet.
>
> Replacing journals:
>
> I haven't had to do this yet; I'd probably remove/readd all the OSDs if it
> happened today, but will be reading the post you linked.
>
> Upgrading releases:
>
> Change the configuration of /etc/apt/source.list.d/ceph.list to point at
> new release and push to all the nodes with Salt. Then salt -N 'ceph'
> pkg.upgrade to upgrade the packages on all the nodes in the ceph nodegroup.
> Then, use Salt to restart the monitors, then the OSDs on each node, one by
> one. Finally run the following command on all nodes with Salt to verify all
> monitors/OSDs are using the new version:
>
> for i in $(ls /var/run/ceph/ceph-*.asok);do echo $i;ceph --admin-daemon $i
> version;done
>
> Node decommissioning:
>
> I have a script which enumerates all the OSDs on a given host and stores
> that list in a file. Another script (run by cron every 10 minutes) checks
> if the cluster health is OK, and if so pops the next OSD from that file and
> executes the steps to remove it from the host, trickling the node out of
> service.
>
>
>
>
>
> On 04/17/2015 02:18 PM, Craig Lewis wrote:
>
> I'm running a small cluster, but I'll chime in since nobody else has.
>
>  Cern had a presentation a while ago (dumpling time-frame) about their
> deployment.  They go over some of your questions:
> http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
>
>  My philosophy on Config Management is that it should save me time.  If
> it's going to take me longer to write a recipe to do something, I'll just
> do it by hand. Since my cluster is small, there are many things I can do
> faster by hand.  This may or may not work for you, depending on your
> documentation / repeatability requirements.  For things that need to be
> documented, I'll usually write the recipe anyway (I accept Chef recipes as
> documentation).
>
>
>  For my clusters, I'm using Chef to setups all nodes and manage
> ceph.conf.  I manually manage my pools, CRUSH map, RadosGW users, and disk
> replacement.  I was using Chef to add new disks, but I ran into load
> problems due to my small cl

[ceph-users] ceph-deploy journal on separate partition - quck info needed

2015-04-17 Thread Andrija Panic
Hi all,

when I run:

ceph-deploy osd create SERVER:sdi:/dev/sdb5

(sdi = previously ZAP-ed 4TB drive)
(sdb5 = previously manually created empty partition with fdisk)

Is ceph-deploy going to create journal properly on sdb5 (something similar
to: ceph-osd -i $ID --mkjournal ), or do I need to do something before this
?

I have actually already run this command but haven't seen any "mkjournal"
commands in the output.

The OSD shows as up and in, but I have doubts whether the journal is fine (the
symlink does point to /dev/sdb5), but again...

Any confirmation is welcome.
Thanks,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
Checked the SMART status. All of the Samsungs have Wear Leveling Count
equal to 99 (raw values 29, 36 and 15). I'm going to have to monitor them -
I could afford losing one of them, but losing two would mean loss of data.

pt., 17 kwi 2015 o 21:22 użytkownik Josef Johansson 
napisał:

> the massive rebalancing does not affect the ssds in a good way either. But
> from what I've gatherd the pro should be fine. Massive amount of write
> errors in the logs?
>
> /Josef
> On 17 Apr 2015 21:07, "Andrija Panic"  wrote:
>
>> nahSamsun 850 PRO 128GB - dead after 3months - 2 of these died...
>> wearing level is 96%, so only 4% wasted... (yes I know these are not
>> enterprise,etc... )
>>
>> On 17 April 2015 at 21:01, Josef Johansson  wrote:
>>
>>> tough luck, hope everything comes up ok afterwards. What models on the
>>> SSD?
>>>
>>> /Josef
>>> On 17 Apr 2015 20:05, "Andrija Panic"  wrote:
>>>
 SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
 down, and rebalancing is about finish... after which I need to fix the 
 OSDs.

 On 17 April 2015 at 19:01, Josef Johansson  wrote:

> Hi,
>
> Did 6 other OSDs go down when re-adding?
>
> /Josef
>
> On 17 Apr 2015, at 18:49, Andrija Panic 
> wrote:
>
> 12 osds down - I expect less work with removing and adding osd?
> On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" <
> krzysztof.a.nowi...@gmail.com> wrote:
>
>> Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with
>> the existing OSD UUID, copy the keyring and let it populate itself?
>>
>> pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic <
>> andrija.pa...@gmail.com> napisał:
>>
>>> Thx guys, thats what I will be doing at the end.
>>>
>>> Cheers
>>> On Apr 17, 2015 6:24 PM, "Robert LeBlanc" 
>>> wrote:
>>>
 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic <
 andrija.pa...@gmail.com> wrote:

> Hi guys,
>
> I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
> down, ceph rebalanced etc.
>
> Now I have new SSD inside, and I will partition it etc - but would
> like to know, how to proceed now, with the journal recreation for 
> those 6
> OSDs that are down now.
>
> Should I flush journal (where to, journals doesnt still
> exist...?), or just recreate journal from scratch (making symboliv 
> links
> again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and
> starting OSDs.
>
> I expect the folowing procedure, but would like confirmation
> please:
>
> rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
> ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
> ceph-osd -i $ID --mkjournal
> ll /var/lib/ceph/osd/ceph-$ID/journal
> service ceph start osd.$ID
>
> Any thought greatly appreciated !
>
> Thanks,
>
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
  ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>
>>
>> --
>>
>> Andrija Panić
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy journal on separate partition - quck info needed

2015-04-17 Thread Robert LeBlanc
If the journal file on the osd is a symlink to the partition and the OSD
process is running, then the journal was created properly. The OSD would
not start if the journal was not created.
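
If you want to double-check anyway, something like this should be enough (OSD id
and paths are just examples):

ls -l /var/lib/ceph/osd/ceph-0/journal          # the symlink itself
readlink -f /var/lib/ceph/osd/ceph-0/journal    # should print /dev/sdb5
grep -i journal /var/log/ceph/ceph-osd.0.log | tail -5   # journal opened at start-up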

On Fri, Apr 17, 2015 at 2:43 PM, Andrija Panic 
wrote:

> Hi all,
>
> when I run:
>
> ceph-deploy osd create SERVER:sdi:/dev/sdb5
>
> (sdi = previously ZAP-ed 4TB drive)
> (sdb5 = previously manually created empty partition with fdisk)
>
> Is ceph-deploy going to create journal properly on sdb5 (something similar
> to: ceph-osd -i $ID --mkjournal ), or do I need to do something before this
> ?
>
> I have actually already run this command but havent seen any "mkjournal"
> commands in the output
>
> OSD shows as up and in, but I have doubts if journal is fine (symlink does
> link to /dev/sdb5) but again...
>
> Any confimration is welcomed
> Thanks,
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy journal on separate partition - quck info needed

2015-04-17 Thread Andrija Panic
ok, thx Robert - I expected that so this is fine then - just done it on 12
OSDs and all fine...

thx again

On 17 April 2015 at 23:38, Robert LeBlanc  wrote:

> If the journal file on the osd is a symlink to the partition and the OSD
> process is running, then the journal was created properly. The OSD would
> not start if the journal was not created.
>
> On Fri, Apr 17, 2015 at 2:43 PM, Andrija Panic 
> wrote:
>
>> Hi all,
>>
>> when I run:
>>
>> ceph-deploy osd create SERVER:sdi:/dev/sdb5
>>
>> (sdi = previously ZAP-ed 4TB drive)
>> (sdb5 = previously manually created empty partition with fdisk)
>>
>> Is ceph-deploy going to create journal properly on sdb5 (something
>> similar to: ceph-osd -i $ID --mkjournal ), or do I need to do something
>> before this ?
>>
>> I have actually already run this command but havent seen any "mkjournal"
>> commands in the output
>>
>> OSD shows as up and in, but I have doubts if journal is fine (symlink
>> does link to /dev/sdb5) but again...
>>
>> Any confimration is welcomed
>> Thanks,
>> --
>>
>> Andrija Panić
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS and Erasure Codes

2015-04-17 Thread Ben Randall
Hello all,

I am considering using Ceph for a new deployment and have a few
questions about the current implementation of erasure codes.

I understand that erasure codes have been enabled for pools, but that
erasure coded pools cannot be used as the basis of a Ceph FS.  Is it
fair to infer that erasure coded storage pools are only accessible
through a library (rados) or API?

Are there any plans to support erasure coded pools in Ceph FS?  It
would be excellent to be able to mount erasure coded storage in a
POSIX-like fashion, similar to how Ceph FS with replicated pools works
now.

Thank you for your help!

Best,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and Erasure Codes

2015-04-17 Thread Loic Dachary
Hi,

Although erasure coded pools cannot be used with CephFS, they can be used 
behind a replicated cache pool as explained at 
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/. 
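
The rough shape of such a setup, following that page (pool names, PG counts and
thresholds below are only placeholders):

ceph osd pool create ecdata 256 256 erasure
ceph osd pool create cachepool 128 128
ceph osd tier add ecdata cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecdata cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool target_max_bytes 100000000000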

Cheers

On 18/04/2015 00:26, Ben Randall wrote:
> Hello all,
> 
> I am considering using Ceph for a new deployment and have a few
> questions about the current implementation of erasure codes.
> 
> I understand that erasure codes have been enabled for pools, but that
> erasure coded pools cannot be used as the basis of a Ceph FS.  Is it
> fair to infer that erasure coded storage pools are only accessible
> through a library (rados) or API?
> 
> Are there any plans to support erasure coded pools in Ceph FS?  It
> would be excellent to be able to mount erasure coded storage in a
> POSIX-like fashion, similar to how Ceph FS with replicated pools works
> now.
> 
> Thank you for your help!
> 
> Best,
> Ben
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] metadata management in case of ceph object storage and ceph block storage

2015-04-17 Thread pragya jain
Thanks to all for your reply.

-Regards
Pragya Jain
Department of Computer Science
University of Delhi
Delhi, India


 On Friday, 17 April 2015 4:36 PM, Steffen W Sørensen  wrote:


On 17/04/2015, at 07.33, Josef Johansson  wrote:
> To your question, which I'm not sure I understand completely.
> So yes, you don't need the MDS if you just keep track of block storage and
> object storage (i.e. images for KVM).
> So the Mon keeps track of the metadata for the Pool and PG

Well, there really isn't any metadata at all in the sense of a traditional file
system; the monitors keep track of the status of OSDs. Clients compute which OSDs
to talk to to get to the wanted objects, thus there is no need for a central
metadata service to tell clients where data are stored. Ceph is a distributed
object storage system with potentially no SPOF and the ability to scale out.
Try studying Ross' slides, f.ex. here:
http://www.slideshare.net/buildacloud/ceph-intro-and-architectural-overview-by-ross-turk
or many other good intros on the net, youtube etc.
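
You can even watch a client do that calculation, f.ex. (pool and object names
here are just examples):

ceph osd map rbd someobject
# -> osdmap eNNN pool 'rbd' (0) object 'someobject' -> pg 0.xxxx -> up [a,b,c] acting [a,b,c]
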
Clients of a Ceph cluster can access 'objects' (blobs with data) through
several means: programmatically with librados, as virtual block devices through
librbd+librados, and finally as an S3 service through the RADOS GW over http[s].
The metadata (users + ACLs, buckets + data…) for S3 objects are stored in various
pools in Ceph.

CephFS, built on top of a Ceph object store, can best be compared with a
combination of a POSIX file system and other networked file systems, f.ex.
NFS, CIFS, AFP, only with a different protocol + access means (FUSE daemon or
kernel module). As it implements a regular file name space, it needs to store
metadata about which files exist in such a name space; this is the job of the MDS
server(s), which of course use Ceph object store pools to persistently store this
file system metadata.

> and the MDS keep track of all the files, hence the MDS should have at least 10x
> the memory of what the Mon have.

Hmm, 10x memory isn't a rule of thumb in my book, it all depends on the use case
at hand. The MDS tracks metadata of files stored in a CephFS, which usually is
far from all the data of a cluster, unless CephFS is the only usage of course :)
Many use Ceph for sharing virtual block devices among multiple hypervisors as
disk devices for virtual machines (VM images), f.ex. with OpenStack, Proxmox etc.

> I'm no Ceph expert, especially not on CephFS, but this is my picture of it :)
> Maybe the architecture docs could help you out?
> http://docs.ceph.com/docs/master/architecture/#cluster-map
> Hope that resolves your question.
> Cheers,
> Josef

On 06 Apr 2015, at 18:51, pragya jain  wrote:

Please somebody reply my queries.
Thank you

-Regards
Pragya Jain
Department of Computer Science
University of Delhi
Delhi, India


 On Saturday, 4 April 2015 3:24 PM, pragya jain  wrote:

hello all!

As the documentation says, "One of the unique features of Ceph is that it
decouples data and metadata". For applying the mechanism of decoupling, Ceph
uses a Metadata Server (MDS) cluster. The MDS cluster manages metadata
operations, like opening or renaming a file.

On the other hand, the Ceph implementations of object storage as a service and
block storage as a service do not require an MDS.

My question is: in the case of object storage and block storage, how does Ceph
manage the metadata?

Please help me to understand this concept more clearly.

Thank you

-Regards
Pragya Jain
Department of Computer Science
University of Delhi
Delhi, India

 
   ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full ssd setup preliminary hammer bench

2015-04-17 Thread Alexandre DERUMIER
>>Any quick write performance data? 

4k randwrite
iops : 12K
host cpu : 85.5 idle
client cpu : 98,5 idle
disk util : 100%  (this is the bottleneck).

These s3500 drives can do around 25K rand 4K writes with O_DSYNC.

So, with ceph's double write (journal + data), that explains the 12K.
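
(For reference, the kind of O_DSYNC test meant here looks roughly like the
following - device path is an example and the run overwrites data, so use a
scratch disk; fio's --sync=1 gives O_SYNC, which is close enough for this
comparison:)

fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting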





- Mail original -
De: "Michal Kozanecki" 
À: "aderumier" , "Mark Nelson" , 
"ceph-users" 
Envoyé: Vendredi 17 Avril 2015 18:13:59
Objet: RE: full ssd setup preliminary hammer bench

Any quick write performance data? 

Michal Kozanecki | Linux Administrator | E: mkozane...@evertz.com 


-Original Message- 
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER 
Sent: April-17-15 11:38 AM 
To: Mark Nelson; ceph-users 
Subject: [ceph-users] full ssd setup preliminary hammer bench 

Hi Mark, 

I finally got my hardware for my production full ssd cluster. 

Here a first preliminary bench. (1osd). 

I got around 45K iops with randread 4K with a small 10GB rbd volume 


I'm pretty happy because I don't see anymore huge cpu difference between krbd 
&& lirbd. 
In my previous bench I was using debian wheezy as client, now it's a centos 
7.1, so maybe something is different (glibc,...). 

I'm planning to do big benchmark centos vs ubuntu vs debian, client && server, 
to compare. 
I have 18 osd ssd for the benchmarks. 







results : rand 4K : 1 osd 
- 

fio + librbd: 
 
iops: 45.1K 

clat percentiles (usec): 
| 1.00th=[ 358], 5.00th=[ 406], 10.00th=[ 446], 20.00th=[ 556], 
| 30.00th=[ 676], 40.00th=[ 1048], 50.00th=[ 1192], 60.00th=[ 1304], 
| 70.00th=[ 1400], 80.00th=[ 1496], 90.00th=[ 1624], 95.00th=[ 1720], 
| 99.00th=[ 1880], 99.50th=[ 1928], 99.90th=[ 2064], 99.95th=[ 2128], 
| 99.99th=[ 2512] 

cpu server : 89.1 iddle 
cpu client : 92,5 idle 

fio + krbd 
-- 
iops:47.5K 

clat percentiles (usec): 
| 1.00th=[ 620], 5.00th=[ 636], 10.00th=[ 644], 20.00th=[ 652], 
| 30.00th=[ 668], 40.00th=[ 676], 50.00th=[ 684], 60.00th=[ 692], 
| 70.00th=[ 708], 80.00th=[ 724], 90.00th=[ 756], 95.00th=[ 820], 
| 99.00th=[ 1004], 99.50th=[ 1032], 99.90th=[ 1144], 99.95th=[ 1448], 
| 99.99th=[ 2224] 

cpu server : 92.4 idle 
cpu client : 96,8 idle 




hardware (ceph node && client node): 
--- 
ceph : hammer 
os : centos 7.1 
2 x 10cores Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 64GB ram 
2 x intel s3700 100GB : raid1: os + monitor 
6 x intel s3500 160GB : osds 
2x10gb mellanox connect-x3 (lacp) 

network 
--- 
mellanox sx1012 with breakout cables (10GB) 


centos tunning: 
--- 
-noop scheduler 
-tune-adm profile latency-performance 

ceph.conf 
- 
auth_cluster_required = cephx 
auth_service_required = cephx 
auth_client_required = cephx 
filestore_xattr_use_omap = true 


osd pool default min size = 1 

debug lockdep = 0/0 
debug context = 0/0 
debug crush = 0/0 
debug buffer = 0/0 
debug timer = 0/0 
debug journaler = 0/0 
debug osd = 0/0 
debug optracker = 0/0 
debug objclass = 0/0 
debug filestore = 0/0 
debug journal = 0/0 
debug ms = 0/0 
debug monc = 0/0 
debug tp = 0/0 
debug auth = 0/0 
debug finisher = 0/0 
debug heartbeatmap = 0/0 
debug perfcounter = 0/0 
debug asok = 0/0 
debug throttle = 0/0 

osd_op_threads = 5 
filestore_op_threads = 4 


osd_op_num_threads_per_shard = 1 
osd_op_num_shards = 10 
filestore_fd_cache_size = 64 
filestore_fd_cache_shards = 32 
ms_nocrc = true 
ms_dispatch_throttle_bytes = 0 

cephx sign messages = false 
cephx require signatures = false 

[client] 
rbd_cache = false 





rand 4K : rbd volume size: 10GB (data in osd node buffer - no access to disk) 
-- 
fio + librbd 
 
[global] 
ioengine=rbd 
clientname=admin 
pool=pooltest 
rbdname=rbdtest 
invalidate=0 
rw=randread 
direct=1 
bs=4k 
numjobs=2 
group_reporting=1 
iodepth=32 



fio + krbd 
--- 
[global] 
ioengine=aio 
invalidate=1 # mandatory 
rw=randread 
bs=4K 
direct=1 
numjobs=2 
group_reporting=1 
size=10G 

iodepth=32 
filename=/dev/rbd0 (noop scheduler) 






___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full ssd setup preliminary hammer bench

2015-04-17 Thread Alexandre DERUMIER
>>any idea whether this might be the tcmalloc bug? 

I still don't know whether the CentOS/Red Hat packages also have the bug or not.
gperftools.x86_64   2.1-1.el7
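
One quick way to at least see what the daemons are linked against (library path
may differ on your install):

ldd /usr/bin/ceph-osd | grep -i tcmalloc
rpm -qf /usr/lib64/libtcmalloc.so.4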




- Mail original -
De: "Stefan Priebe" 
À: "aderumier" , "Mark Nelson" , 
"ceph-users" 
Envoyé: Vendredi 17 Avril 2015 20:57:42
Objet: Re: [ceph-users] full ssd setup preliminary hammer bench

Am 17.04.2015 um 17:37 schrieb Alexandre DERUMIER: 
> Hi Mark, 
> 
> I finally got my hardware for my production full ssd cluster. 
> 
> Here a first preliminary bench. (1osd). 
> 
> I got around 45K iops with randread 4K with a small 10GB rbd volume 
> 
> 
> I'm pretty happy because I don't see a huge cpu difference anymore between krbd 
> && librbd. 
> In my previous bench I was using debian wheezy as client, 
> now it's a centos 7.1, so maybe something is different (glibc,...). 

any idea whether this might be the tcmalloc bug? 

> 
> I'm planning to do big benchmark centos vs ubuntu vs debian, client && 
> server, to compare. 
> I have 18 osd ssd for the benchmarks. 
> 
> 
> 
> 
> 
> 
> 
> results : rand 4K : 1 osd 
> - 
> 
> fio + librbd: 
>  
> iops: 45.1K 
> 
> clat percentiles (usec): 
> | 1.00th=[ 358], 5.00th=[ 406], 10.00th=[ 446], 20.00th=[ 556], 
> | 30.00th=[ 676], 40.00th=[ 1048], 50.00th=[ 1192], 60.00th=[ 1304], 
> | 70.00th=[ 1400], 80.00th=[ 1496], 90.00th=[ 1624], 95.00th=[ 1720], 
> | 99.00th=[ 1880], 99.50th=[ 1928], 99.90th=[ 2064], 99.95th=[ 2128], 
> | 99.99th=[ 2512] 
> 
> cpu server : 89.1% idle 
> cpu client : 92.5% idle 
> 
> fio + krbd 
> -- 
> iops:47.5K 
> 
> clat percentiles (usec): 
> | 1.00th=[ 620], 5.00th=[ 636], 10.00th=[ 644], 20.00th=[ 652], 
> | 30.00th=[ 668], 40.00th=[ 676], 50.00th=[ 684], 60.00th=[ 692], 
> | 70.00th=[ 708], 80.00th=[ 724], 90.00th=[ 756], 95.00th=[ 820], 
> | 99.00th=[ 1004], 99.50th=[ 1032], 99.90th=[ 1144], 99.95th=[ 1448], 
> | 99.99th=[ 2224] 
> 
> cpu server : 92.4% idle 
> cpu client : 96.8% idle 
> 
> 
> 
> 
> hardware (ceph node && client node): 
> --- 
> ceph : hammer 
> os : centos 7.1 
> 2 x 10cores Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
> 64GB ram 
> 2 x intel s3700 100GB : raid1: os + monitor 
> 6 x intel s3500 160GB : osds 
> 2x10gb mellanox connect-x3 (lacp) 
> 
> network 
> --- 
> mellanox sx1012 with breakout cables (10GB) 
> 
> 
> centos tuning: 
> --- 
> -noop scheduler 
> -tuned-adm profile latency-performance 
> 
> ceph.conf 
> - 
> auth_cluster_required = cephx 
> auth_service_required = cephx 
> auth_client_required = cephx 
> filestore_xattr_use_omap = true 
> 
> 
> osd pool default min size = 1 
> 
> debug lockdep = 0/0 
> debug context = 0/0 
> debug crush = 0/0 
> debug buffer = 0/0 
> debug timer = 0/0 
> debug journaler = 0/0 
> debug osd = 0/0 
> debug optracker = 0/0 
> debug objclass = 0/0 
> debug filestore = 0/0 
> debug journal = 0/0 
> debug ms = 0/0 
> debug monc = 0/0 
> debug tp = 0/0 
> debug auth = 0/0 
> debug finisher = 0/0 
> debug heartbeatmap = 0/0 
> debug perfcounter = 0/0 
> debug asok = 0/0 
> debug throttle = 0/0 
> 
> osd_op_threads = 5 
> filestore_op_threads = 4 
> 
> 
> osd_op_num_threads_per_shard = 1 
> osd_op_num_shards = 10 
> filestore_fd_cache_size = 64 
> filestore_fd_cache_shards = 32 
> ms_nocrc = true 
> ms_dispatch_throttle_bytes = 0 
> 
> cephx sign messages = false 
> cephx require signatures = false 
> 
> [client] 
> rbd_cache = false 
> 
> 
> 
> 
> 
> rand 4K : rbd volume size: 10GB (data in osd node buffer - no access to disk) 
> --
>  
> fio + librbd 
>  
> [global] 
> ioengine=rbd 
> clientname=admin 
> pool=pooltest 
> rbdname=rbdtest 
> invalidate=0 
> rw=randread 
> direct=1 
> bs=4k 
> numjobs=2 
> group_reporting=1 
> iodepth=32 
> 
> 
> 
> fio + krbd 
> --- 
> [global] 
> ioengine=aio 
> invalidate=1 # mandatory 
> rw=randread 
> bs=4K 
> direct=1 
> numjobs=2 
> group_reporting=1 
> size=10G 
> 
> iodepth=32 
> filename=/dev/rbd0 (noop scheduler) 
> 
> 
> 
> 
> 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full ssd setup preliminary hammer bench

2015-04-17 Thread Stefan Priebe - Profihost AG

On 18.04.2015 at 07:24, Alexandre DERUMIER wrote:

>>> any idea whether this might be the tcmalloc bug?
> 
> I still don't know if the CentOS/Red Hat packages also have the bug or not.
> gperftools.x86_64   2.1-1.el7

From the version number it looks buggy. I'm really interested in what fixed the 
issue for you.
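
For reference, the workaround that keeps coming up for this is enlarging the tcmalloc 
thread cache in the OSD start-up environment - a sketch only, the value is just an 
example and the variable comes from gperftools: 

export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128MB, then restart the OSDs 

The catch is that gperftools 2.1 reportedly ignores this variable, which seems to be 
exactly the bug in question, so a newer gperftools (or rebuilding Ceph against 
jemalloc) would be needed to actually benefit. 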

> 
> 
> 
> 
> - Original Message -
> From: "Stefan Priebe" 
> To: "aderumier" , "Mark Nelson" , 
> "ceph-users" 
> Sent: Friday, 17 April 2015 20:57:42
> Subject: Re: [ceph-users] full ssd setup preliminary hammer bench
> 
>> On 17.04.2015 at 17:37, Alexandre DERUMIER wrote: 
>> Hi Mark, 
>> 
>> I finally got my hardware for my production full ssd cluster. 
>> 
>> Here a first preliminary bench. (1osd). 
>> 
>> I got around 45K iops with randread 4K with a small 10GB rbd volume 
>> 
>> 
>> I'm pretty happy because I don't see a huge cpu difference anymore between 
>> krbd && librbd. 
>> In my previous bench I was using debian wheezy as client, 
>> now it's a centos 7.1, so maybe something is different (glibc,...).
> 
> any idea whether this might be the tcmalloc bug? 
> 
>> 
>> I'm planning to do big benchmark centos vs ubuntu vs debian, client && 
>> server, to compare. 
>> I have 18 osd ssd for the benchmarks. 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> results : rand 4K : 1 osd 
>> - 
>> 
>> fio + librbd: 
>>  
>> iops: 45.1K 
>> 
>> clat percentiles (usec): 
>> | 1.00th=[ 358], 5.00th=[ 406], 10.00th=[ 446], 20.00th=[ 556], 
>> | 30.00th=[ 676], 40.00th=[ 1048], 50.00th=[ 1192], 60.00th=[ 1304], 
>> | 70.00th=[ 1400], 80.00th=[ 1496], 90.00th=[ 1624], 95.00th=[ 1720], 
>> | 99.00th=[ 1880], 99.50th=[ 1928], 99.90th=[ 2064], 99.95th=[ 2128], 
>> | 99.99th=[ 2512] 
>> 
>> cpu server : 89.1% idle 
>> cpu client : 92.5% idle 
>> 
>> fio + krbd 
>> -- 
>> iops:47.5K 
>> 
>> clat percentiles (usec): 
>> | 1.00th=[ 620], 5.00th=[ 636], 10.00th=[ 644], 20.00th=[ 652], 
>> | 30.00th=[ 668], 40.00th=[ 676], 50.00th=[ 684], 60.00th=[ 692], 
>> | 70.00th=[ 708], 80.00th=[ 724], 90.00th=[ 756], 95.00th=[ 820], 
>> | 99.00th=[ 1004], 99.50th=[ 1032], 99.90th=[ 1144], 99.95th=[ 1448], 
>> | 99.99th=[ 2224] 
>> 
>> cpu server : 92.4% idle 
>> cpu client : 96.8% idle 
>> 
>> 
>> 
>> 
>> hardware (ceph node && client node): 
>> --- 
>> ceph : hammer 
>> os : centos 7.1 
>> 2 x 10cores Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
>> 64GB ram 
>> 2 x intel s3700 100GB : raid1: os + monitor 
>> 6 x intel s3500 160GB : osds 
>> 2x10gb mellanox connect-x3 (lacp) 
>> 
>> network 
>> --- 
>> mellanox sx1012 with breakout cables (10GB) 
>> 
>> 
>> centos tuning: 
>> --- 
>> -noop scheduler 
>> -tuned-adm profile latency-performance 
>> 
>> ceph.conf 
>> - 
>> auth_cluster_required = cephx 
>> auth_service_required = cephx 
>> auth_client_required = cephx 
>> filestore_xattr_use_omap = true 
>> 
>> 
>> osd pool default min size = 1 
>> 
>> debug lockdep = 0/0 
>> debug context = 0/0 
>> debug crush = 0/0 
>> debug buffer = 0/0 
>> debug timer = 0/0 
>> debug journaler = 0/0 
>> debug osd = 0/0 
>> debug optracker = 0/0 
>> debug objclass = 0/0 
>> debug filestore = 0/0 
>> debug journal = 0/0 
>> debug ms = 0/0 
>> debug monc = 0/0 
>> debug tp = 0/0 
>> debug auth = 0/0 
>> debug finisher = 0/0 
>> debug heartbeatmap = 0/0 
>> debug perfcounter = 0/0 
>> debug asok = 0/0 
>> debug throttle = 0/0 
>> 
>> osd_op_threads = 5 
>> filestore_op_threads = 4 
>> 
>> 
>> osd_op_num_threads_per_shard = 1 
>> osd_op_num_shards = 10 
>> filestore_fd_cache_size = 64 
>> filestore_fd_cache_shards = 32 
>> ms_nocrc = true 
>> ms_dispatch_throttle_bytes = 0 
>> 
>> cephx sign messages = false 
>> cephx require signatures = false 
>> 
>> [client] 
>> rbd_cache = false 
>> 
>> 
>> 
>> 
>> 
>> rand 4K : rbd volume size: 10GB (data in osd node buffer - no access to 
>> disk) 
>> --
>>  
>> fio + librbd 
>>  
>> [global] 
>> ioengine=rbd 
>> clientname=admin 
>> pool=pooltest 
>> rbdname=rbdtest 
>> invalidate=0 
>> rw=randread 
>> direct=1 
>> bs=4k 
>> numjobs=2 
>> group_reporting=1 
>> iodepth=32 
>> 
>> 
>> 
>> fio + krbd 
>> --- 
>> [global] 
>> ioengine=aio 
>> invalidate=1 # mandatory 
>> rw=randread 
>> bs=4K 
>> direct=1 
>> numjobs=2 
>> group_reporting=1 
>> size=10G 
>> 
>> iodepth=32 
>> filename=/dev/rbd0 (noop scheduler) 
>> 
>> 
>> 
>> 
>> 
>> 
>> ___ 
>> ceph-users mailing list 
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com