Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Sudarshan Pathak
Hello Dennis, you can create a CRUSH rule to select one class of OSD as primary, e.g.: rule ssd-primary { ruleset 5 type replicated min_size 5 max_size 10 step take ssd step chooseleaf firstn 1 type host step
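
For reference, the full ssd-primary example from the CRUSH documentation reads roughly as below; it assumes the CRUSH map already contains an 'ssd' root (for the primary) and a 'platter' root (for the remaining replicas):

    rule ssd-primary {
            ruleset 5
            type replicated
            min_size 5
            max_size 10
            step take ssd
            step chooseleaf firstn 1 type host
            step emit
            step take platter
            step chooseleaf firstn -1 type host
            step emit
    }

Pointing a pool's crush_ruleset at this rule makes the first (primary) OSD come from the ssd tree while the remaining replicas come from platter.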

[ceph-users] Fwd: error opening rbd image

2015-02-01 Thread Aleksey Leonov
Hello, I can't open an rbd image after restarting the cluster. I use the rbd image for a KVM virtual machine. ceph version 0.87 uname -a Linux ceph4 3.14.31-gentoo #1 SMP Fri Jan 30 22:24:11 YEKT 2015 x86_64 Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz GenuineIntel GNU/Linux rbd info raid0n/homes rbd: error opening

Re: [ceph-users] cephfs: from a file name determine the objects name

2015-02-01 Thread Dennis Chen
Hi Sage, are there any methods to prevent the file from being striped across multiple objects? E.g., the file would be mapped to only one object on the client side via the CephFS interface... On Mon, Feb 2, 2015 at 1:05 AM, Sage Weil wrote: > It's inode number (in hex), then ".", then block number (in
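
One hedged way to get close to a one-file-one-object mapping is the CephFS layout virtual xattrs: with stripe_count=1 and an object_size at least as large as the file, the data ends up in a single object. The path and sizes below are illustrative; the layout must be set while the file is still empty, and files larger than object_size will still span multiple objects:

    touch /mnt/cephfs/bigfile                                                   # layout can only be changed on an empty file
    setfattr -n ceph.file.layout.stripe_count -v 1 /mnt/cephfs/bigfile
    setfattr -n ceph.file.layout.object_size -v 67108864 /mnt/cephfs/bigfile    # 64 MB objects
    getfattr -n ceph.file.layout /mnt/cephfs/bigfile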

Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Dennis Chen
Hi Sudarshan, any hints on how to do that? On Mon, Feb 2, 2015 at 1:03 PM, Sudarshan Pathak wrote: > BTW, you can make CRUSH always choose the same OSD as primary. > > Regards, > Sudarshan > > On Mon, Feb 2, 2015 at 9:26 AM, Dennis Chen > wrote: >> >> Thanks, I've got the answer with the 'ce

Re: [ceph-users] ceph Performance random write is more than sequential

2015-02-01 Thread Somnath Roy
Sumit, I think random read/write will always outperform sequential read/write in Ceph if we don’t have any kind of cache in front or proper striping enabled in the image. The reason is the following. 1. If you are trying with the default image option, the object size is 4 MB and the st
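
As a sketch of the striping point, an RBD image can be created with a small stripe unit and a higher stripe count so that a sequential stream fans out across several objects at once (pool and image names are made up; fancy striping needs image format 2):

    rbd create rbdbench/seqtest --size 10240 --image-format 2 \
        --order 22 --stripe-unit 65536 --stripe-count 16       # 4 MB objects, 64 KB stripe unit
    rbd info rbdbench/seqtest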

Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Sudarshan Pathak
BTW, you can make CRUSH always choose the same OSD as primary. Regards, Sudarshan On Mon, Feb 2, 2015 at 9:26 AM, Dennis Chen wrote: > Thanks, I've got the answer with the 'ceph osd map ...' command > > On Mon, Feb 2, 2015 at 12:50 AM, Jean-Charles Lopez > wrote: > > Hi > > > > You can

Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Dennis Chen
Thanks, I've got the answer with the 'ceph osd map ...' command On Mon, Feb 2, 2015 at 12:50 AM, Jean-Charles Lopez wrote: > Hi > > You can verify the exact mapping using the following command: ceph osd map > {poolname} {objectname} > > Check page http://docs.ceph.com/docs/master/man/8/ceph for

Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Dennis Chen
On Mon, Feb 2, 2015 at 12:04 AM, Loic Dachary wrote: > > > On 01/02/2015 14:47, Dennis Chen wrote: >> Hello, >> >> If I write 2 different objects, eg, "john" and "paul" respectively to >> a same pool like "testpool" in the cluster, is the primary OSD >> calculated by CRUSH for the 2 objects the sa

Re: [ceph-users] ceph Performance random write is more than sequential

2015-02-01 Thread Sumit Gaur
Hi All, what I saw after enabling the RBD cache is that it works as expected, meaning sequential writes get better MBps than random writes. Can somebody explain this behaviour? Is the RBD cache setting a must for a ceph cluster to behave normally? Thanks, Sumit. On Mon, Feb 2, 2015 at 9:59 AM, Sumit Gaur wrote: >
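
For reference, the client-side cache is normally enabled with ceph.conf settings on the hypervisor along these lines (values shown are the usual defaults; with QEMU/libvirt the drive should also use cache=writeback):

    [client]
        rbd cache = true
        rbd cache writethrough until flush = true
        rbd cache size = 33554432        # 32 MB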

[ceph-users] OSD can't start After server restart

2015-02-01 Thread wsnote
Ceph Version: 0.80.1 Servers: 4 OSDs: 6 disks per server. All of the OSDs of one server can't start after that server restarts, but those on the other 3 servers can. --- ceph -s: [root@dn1 ~]# ceph -s cluster 73ceed62-9a53-414b-95

[ceph-users] error opening rbd image

2015-02-01 Thread Aleksey Leonov
Hello, I can't open an rbd image after restarting the cluster. I use the rbd image for a KVM virtual machine. ceph version 0.87 uname -a Linux ceph4 3.14.31-gentoo #1 SMP Fri Jan 30 22:24:11 YEKT 2015 x86_64 Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz GenuineIntel GNU/Linux rbd info raid0n/homes rbd: error opening
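
A few first diagnostic steps one might try from a node with an admin keyring (exact output depends on the image format and cluster state):

    ceph -s                              # any down OSDs or inactive/incomplete PGs?
    rbd ls raid0n                        # is the image still listed?
    rbd info raid0n/homes                # repeat to capture the full error
    rados -p raid0n ls | grep homes      # for format 1 images the header object is <name>.rbd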

Re: [ceph-users] ceph Performance random write is more than sequential

2015-02-01 Thread Sumit Gaur
Hi Florent, cache tiering: no. Our architecture: vdbench/FIO inside VM <--> RBD without cache <--> Ceph Cluster (6 OSDs + 3 Mons). Thanks, Sumit [root@ceph-mon01 ~]# ceph -s cluster 47b3b559-f93c-4259-a6fb-97b00d87c55a health HEALTH_WARN clock skew detected on mon.ceph-mon02, mon.c

[ceph-users] RBD snap unprotect need ACLs on all pools ?

2015-02-01 Thread Florent MONTHEL
Hi, I’ve an ACL with a key/user on 1 pool (client.condor rwx on pool rbdpartigsanmdev01). I would like to unprotect a snapshot but I get the error below: rbd -n client.condor snap unprotect rbdpartigsanmdev01/flaprdsvc01_lun003@sync#1.cloneref.2015-02-01.19:07:21 2015-02-01 22:53:00.903790 7f4d0036e760
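
Unprotecting has to verify that no clones of the snapshot exist, including clones in other pools, so the key likely needs at least read access on those pools as well. A hedged sketch (the second pool name is hypothetical):

    rbd -n client.condor children rbdpartigsanmdev01/flaprdsvc01_lun003@sync#1.cloneref.2015-02-01.19:07:21
    ceph auth caps client.condor mon 'allow r' \
        osd 'allow rwx pool=rbdpartigsanmdev01, allow class-read pool=rbdpartigsanmdev02'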

Re: [ceph-users] RBD caching on 4K reads???

2015-02-01 Thread Mykola Golub
On Fri, Jan 30, 2015 at 10:09:32PM +0100, Udo Lembke wrote: > Hi Bruce, > you can also look on the mon, like > ceph --admin-daemon /var/run/ceph/ceph-mon.b.asok config show | grep cache rbd cache is a client setting, so you have to check this by connecting to the client admin socket. Its location is
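
A hedged example of checking this on the client host; the socket only exists if 'admin socket' is set in the [client] section, and the name below (including the pid) is illustrative:

    ls /var/run/ceph/*.asok
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config show | grep rbd_cache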

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, On 1 Feb 2015 22:04, "Xu (Simon) Chen" wrote: > > Dan, > > I always have noout set, so that single OSD failures won't trigger any recovery immediately. When the OSD (or sometimes multiple OSDs on the same server) comes back, I do see slow requests during backfilling, but probably not thousands.

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
Dan, I always have noout set, so that single OSD failures won't trigger any recovery immediately. When the OSD (or sometimes multiple OSDs on the same server) comes back, I do see slow requests during backfilling, but probably not thousands. When I added a brand new OSD into the cluster, for some r

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, When do you see thousands of slow requests during recovery... Does that happen even with single OSD failures? You should be able to recover disks without slow requests. I always run with recovery op priority at the minimum 1. Tweaking the number of max backfills did not change much during that
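
For reference, those knobs can be injected at runtime roughly like this; the values revert on OSD restart unless they are also placed in ceph.conf:

    ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-max-backfills 1 --osd-recovery-max-active 1'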

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Udo Lembke
Hi Xu, On 01.02.2015 21:39, Xu (Simon) Chen wrote: > RBD doesn't work extremely well when ceph is recovering - it is common > to see hundreds or a few thousands of blocked requests (>30s to > finish). This translates to high IO wait inside VMs, and many > applications don't deal with this well. th

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
In my case, each object is 8MB (glance default for storing images on the rbd backend.) RBD doesn't work extremely well when ceph is recovering - it is common to see hundreds or a few thousands of blocked requests (>30s to finish). This translates to high IO wait inside VMs, and many applications don't

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, I don't know the general calculation, but last week we split a pool with 20 million tiny objects from 512 to 1024 pgs, on a cluster with 80 OSDs. IIRC around 7 million objects needed to move, and it took around 13 hours to finish. The bottleneck in our case was objects per second (limited to ar

Re: [ceph-users] erasure code : number of chunks for a small cluster ?

2015-02-01 Thread Udo Lembke
Hi Alexandre, nice to meet you here ;-) With only 3 hosts you can't survive a full node failure, because for that you need hosts >= k + m. And k:1 m:2 doesn't make any sense. I start with 5 hosts and use k:3, m:2. In this case two hdds can fail or one host can be down for maintenance. Udo PS: yo
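
On a Firefly-era cluster a k=3/m=2 profile and pool might be created roughly like this (profile name, pool name and PG count are illustrative; later releases renamed ruleset-failure-domain to crush-failure-domain):

    ceph osd erasure-code-profile set ec32 k=3 m=2 ruleset-failure-domain=host
    ceph osd erasure-code-profile get ec32
    ceph osd pool create ecpool 256 256 erasure ec32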

Re: [ceph-users] erasure code : number of chunks for a small cluster ?

2015-02-01 Thread Loic Dachary
Hi Alexandre, On 01/02/2015 18:15, Alexandre DERUMIER wrote: > Hi, > > I'm currently trying to understand how to setup correctly a pool with erasure > code > > > https://ceph.com/docs/v0.80/dev/osd_internals/erasure_coding/developer_notes/ > > > My cluster is 3 nodes with 6 osd for each node

[ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
Hi folks, I was running a ceph cluster with 33 OSDs. More recently, 33x6 new OSDs hosted on 33 new servers were added, and I have finished balancing the data and then marked the 33 old OSDs out. As I have 6x as many OSDs, I am thinking of increasing pg_num of my largest pool from 1k to at least 8
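
For reference, the split itself is just the two pool settings below (pool name illustrative); the usual advice is to raise pg_num in several smaller steps and bump pgp_num to match after each step, since pgp_num is what actually triggers the data movement:

    ceph osd pool set volumes pg_num 2048
    ceph osd pool set volumes pgp_num 2048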

[ceph-users] erasure code : number of chunks for a small cluster ?

2015-02-01 Thread Alexandre DERUMIER
Hi, I'm currently trying to understand how to correctly set up a pool with erasure code https://ceph.com/docs/v0.80/dev/osd_internals/erasure_coding/developer_notes/ My cluster is 3 nodes with 6 osds per node (18 osds total). I want to be able to survive 2 disk failures, but also a full

Re: [ceph-users] cephfs: from a file name determine the objects name

2015-02-01 Thread Sage Weil
It's inode number (in hex), then ".", then block number (in hex). You can get the ino of a file with stat. sage On February 1, 2015 5:08:18 PM GMT+01:00, Mudit Verma wrote: >Hi All, > >CEPHFS - Given a file name, how can one determine the exact location >and >the name of the objects on OSDs.
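
A minimal sketch of that lookup from the shell (the file path is an assumption, and 'data' was the default CephFS data pool name at the time):

    ino=$(stat -c %i /mnt/cephfs/somefile)
    printf '%x\n' "$ino"                          # objects are named <ino-hex>.<block-hex>
    rados -p data ls | grep "^$(printf '%x' "$ino")\."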

Re: [ceph-users] cephfs: from a file name determine the objects name

2015-02-01 Thread Jean-Charles Lopez
Hi Verma All layout questions are detailed here for CephFS. http://docs.ceph.com/docs/master/cephfs/file-layouts/ Hope this is what you are looking for Cheers JC While moving. Excuse unintended typos. > On Feb 1, 2015, at 08:08, Mudit Verma wrote: > > Hi All, > > CEPHFS - Given a file na

Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Jean-Charles Lopez
Hi You can verify the exact mapping using the following command: ceph osd map {poolname} {objectname} Check page http://docs.ceph.com/docs/master/man/8/ceph for the ceph command. Cheers JC While moving. Excuse unintended typos. > On Feb 1, 2015, at 08:04, Loic Dachary wrote: > > > >> On
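
For the two objects from the original question that looks like this; the output below is only illustrative, actual PG ids and OSD sets will differ:

    ceph osd map testpool john
    # osdmap e42 pool 'testpool' (3) object 'john' -> pg 3.b0d102a5 (3.a5) -> up [2,5,1] acting [2,5,1]
    ceph osd map testpool paul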

Re: [ceph-users] OSD capacity variance ?

2015-02-01 Thread Udo Lembke
Hi Howard, I assume it's a typo with 160 + 250 MB. Ceph OSDs must be min. 10GB to get a weight of 0.01 Udo On 31.01.2015 23:39, Howard Thomson wrote: > Hi All, > > I am developing a custom disk storage backend for the Bacula backup > system, and am in the process of setting up a trial Ceph syst

[ceph-users] Arbitrary OSD Number Assignment

2015-02-01 Thread Ron Allred
Hello, In the past we've been able to manually create specific and arbitrary OSD numbers. Using the procedure: 1. Add OSD.# to ceph.conf (replicate) 2. Make necessary dir in /var/lib/ceph/osd/ceph-# 3. Create OSD+Journal partitions and filesystems, then mount it 4. Init data dirs with: ceph-os
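
The data-dir initialisation referenced in step 4 typically looks like the sketch below for a hypothetical osd.12 (device, weight and host name are made up); note that 'ceph osd create' normally hands out the lowest free id, which is what makes truly arbitrary numbering awkward:

    mkdir -p /var/lib/ceph/osd/ceph-12
    mount /dev/sdX1 /var/lib/ceph/osd/ceph-12
    ceph-osd -i 12 --mkfs --mkkey
    ceph auth add osd.12 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-12/keyring
    ceph osd crush add osd.12 1.0 host=node01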

[ceph-users] cephfs: from a file name determine the objects name

2015-02-01 Thread Mudit Verma
Hi All, CEPHFS - Given a file name, how can one determine the exact location and the names of the objects on the OSDs? So far I understand that the object data is stored in the .../current dir on the OSDs, but what naming convention do they use? Many thanks in advance. Thanks, Mudit

Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Loic Dachary
On 01/02/2015 14:47, Dennis Chen wrote: > Hello, > > If I write 2 different objects, eg, "john" and "paul" respectively to > a same pool like "testpool" in the cluster, is the primary OSD > calculated by CRUSH for the 2 objects the same? Hi, CRUSH is likely to place john on an OSD and paul on

Re: [ceph-users] ceph Performance random write is more than sequential

2015-02-01 Thread Florent MONTHEL
Hi Sumit, do you have cache pool tiering activated? Any feedback regarding your architecture? Thanks Sent from my iPad > On 1 Feb 2015, at 15:50, Sumit Gaur wrote: > > Hi > I have installed a 6 node ceph cluster and to my surprise when I ran rados > bench I saw that random write has more

[ceph-users] ceph Performance random write is more than sequential

2015-02-01 Thread Sumit Gaur
Hi, I have installed a 6 node ceph cluster and to my surprise, when I ran rados bench, I saw that random writes have better performance numbers than sequential writes. This is the opposite of normal disk behaviour. Can somebody let me know if I am missing any ceph architecture point here?

[ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Dennis Chen
Hello, If I write 2 different objects, e.g. "john" and "paul", respectively to the same pool like "testpool" in the cluster, is the primary OSD calculated by CRUSH for the 2 objects the same? -- Den

Re: [ceph-users] Moving a Ceph cluster (to a new network)

2015-02-01 Thread François Petit
Hi Don, I reconfigured the Monitors network recently. My environment is ceph 0.80.7; Openstack Icehouse ; nova, glance, cinder using ceph RBD ; RHEL7.0 nodes. The first thing to do is to check that your new network config will allow communications between your MONs (I assume you have 3 mons),
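
One hedged sketch of the monmap-editing route for re-addressing a monitor (monitor name and address are hypothetical; the mon must be stopped before the new map is injected):

    ceph mon getmap -o /tmp/monmap            # or: ceph-mon -i mon01 --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap
    monmaptool --rm mon01 /tmp/monmap
    monmaptool --add mon01 10.0.1.11:6789 /tmp/monmap
    ceph-mon -i mon01 --inject-monmap /tmp/monmap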