[ceph-users] Ceph Block Device install

2013-10-21 Thread Fuchs, Andreas (SwissTXT)
Hi

I'm trying to install a client with the Ceph block device, following the 
instructions here:
http://ceph.com/docs/master/start/quick-rbd/

The client has a user 'ceph', and SSH is set up passwordless; sudo is 
passwordless as well. When I run ceph-deploy I see:

On the ceph management host:

ceph-deploy install 10.100.21.10
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
ceph hosts 10.100.21.10
[ceph_deploy.install][DEBUG ] Detecting platform for host 10.100.21.10 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy][ERROR ] ClientInitException:

On the target client in secure log:

Oct 21 11:18:52 archiveadmin sshd[22320]: Accepted publickey for ceph from 
10.100.220.110 port 47197 ssh2
Oct 21 11:18:52 archiveadmin sshd[22320]: pam_unix(sshd:session): session 
opened for user ceph by (uid=0)
Oct 21 11:18:52 archiveadmin sudo: ceph : TTY=unknown ; PWD=/home/ceph ; 
USER=root ; COMMAND=/usr/bin/python -u -c exec reduce(lambda a,b: a+b, map(chr,
Oct 21 11:18:52 archiveadmin sudo: ceph : (command continued) 
(105,109,112,111,114,116,32,95,95,98,117,105,108,116,105,110,95,95,44, [... 
long decimal-encoded script, continued across several more "command 
continued" log lines, trimmed ...])))
Oct 21 11:18:52 archiveadmin sshd[22322]: Received disconnect from 
10.100.220.110: 11: disconnected by user
Oct 21 11:18:52 archiveadmin sshd[22320]: pam_unix(sshd:session): session 
closed for user ceph
Oct 21 11:22:24 archiveadmin sshd[2
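The decimal list in the sudo COMMAND above is ceph-deploy's pushy transport 
shipping its bootstrap script as chr() codes, so the log is less opaque than 
it looks. A minimal sketch for decoding such a payload, assuming a Python 2 
interpreter (the tuple here is just the first few codes from the log):

   # Paste the decimal codes from the log into the tuple to recover the
   # embedded Python source (Python 2 print statement assumed).
   python -c 'print "".join(map(chr, (105,109,112,111,114,116,32,95,95,98,117,105,108,116,105,110,95,95)))'
   # prints: import __builtin__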

Re: [ceph-users] Cuttlefish: pool recreation results in cluster crash

2013-10-21 Thread Joao Eduardo Luis

On 10/19/2013 08:53 PM, Andrey Korolyov wrote:

Hello,

I was able to reproduce the following on top of current cuttlefish:

- create a pool,
- delete it after all PGs have initialized,
- create a new pool with the same name after, say, ten seconds.

All OSDs die immediately with the attached trace. The problem exists in
bobtail as well.


Can we have the resulting backtrace and context log?

  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] Ceph Block Device install

2013-10-21 Thread Alfredo Deza
On Mon, Oct 21, 2013 at 5:25 AM, Fuchs, Andreas (SwissTXT)
 wrote:
> Hi
>
> I'm trying to install a client with the Ceph block device, following the 
> instructions here:
> http://ceph.com/docs/master/start/quick-rbd/
>
> The client has a user 'ceph', and SSH is set up passwordless; sudo is 
> passwordless as well. When I run ceph-deploy I see:
>
> On the ceph management host:
>
> ceph-deploy install 10.100.21.10
> [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
> ceph hosts 10.100.21.10
> [ceph_deploy.install][DEBUG ] Detecting platform for host 10.100.21.10 ...
> [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
> [ceph_deploy][ERROR ] ClientInitException:
>

Mmmn, this doesn't look like the full log... it looks like it's missing
the rest of the error? Unless that is where it stopped, which would be
terrible.

What version of ceph-deploy are you using?
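(For reference, a quick way to check on the management host; assumes 
ceph-deploy is on the PATH:)

   ceph-deploy --version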

> On the target client in secure log:
>
> Oct 21 11:18:52 archiveadmin sshd[22320]: Accepted publickey for ceph from 
> 10.100.220.110 port 47197 ssh2
> Oct 21 11:18:52 archiveadmin sshd[22320]: pam_unix(sshd:session): session 
> opened for user ceph by (uid=0)
> Oct 21 11:18:52 archiveadmin sudo: ceph : TTY=unknown ; PWD=/home/ceph ; 
> USER=root ; COMMAND=/usr/bin/python -u -c exec reduce(lambda a,b: a+b, 
> map(chr,
> Oct 21 11:18:52 archiveadmin sudo: ceph : (command continued) 
> [... decimal-encoded payload trimmed; identical to the log quoted above ...]

Re: [ceph-users] changing from default journals to external journals

2013-10-21 Thread Snider, Tim
Sage - 
Does the journal device need a file system created, and does that device 
need to be mounted?
Tim

-Original Message-
From: Sage Weil [mailto:s...@inktank.com] 
Sent: Thursday, October 17, 2013 11:02 AM
To: Snider, Tim
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] changing from default journals to external journals

On Thu, 17 Oct 2013, Snider, Tim wrote:
> /var/lib/ceph/osd/ceph-NNN/journal is a "real" file on my system:
>  ls -l  /var/lib/ceph/osd/ceph-0/journal 
>  -rw-r--r-- 1 root root 1073741824 Oct 17 06:47 
> /var/lib/ceph/osd/ceph-0/journal
> 
> Any problems with my proposed added steps (3 - 5)?
> 
> 1. stop a ceph-osd daemon
> 2. ceph-osd --flush-journal -i NNN
> 3. (+) cp /var/lib/ceph/osd/ceph-NNN/journal 
> /var/lib/ceph/osd/ceph-NNN/journal.saved
> 4. (+) create a new empty journal file: 
>    touch /mount/sdzNNN/journal
> 5. (+) Create the symbolic link for Ceph:
>    ln -s /mount/sdzNNN/journal /var/lib/ceph/osd/ceph-NNN/journal

3. mv journal journal.old
4. ln -s /dev/whatever journal

(you're better off using a raw partition or other block device than a
file.)  or,

4. touch /new/path
5. ln -s /other/path journal

> 6. ceph-osd --mkjournal -i NNN
> 7. start ceph-osd
> 
> Tim
> -Original Message-
> From: Sage Weil [mailto:s...@inktank.com]
> Sent: Wednesday, October 16, 2013 5:02 PM
> To: Snider, Tim
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] changing from default journals to external 
> journals
> 
> On Wed, 16 Oct 2013, Snider, Tim wrote:
> > I configured my cluster using the default journal location for my 
> > osds. Can I migrate the default journals to explicit separate 
> > devices without a complete cluster teardown and reinstallation? How?
> 
> - stop a ceph-osd daemon, then
> - ceph-osd --flush-journal -i NNN
> - set/adjust the journal symlink at /var/lib/ceph/osd/ceph-NNN/journal to
>   point wherever you want
> - ceph-osd --mkjournal -i NNN
> - start ceph-osd
> 
> This won't set up the udev magic on the journal device, but that doesn't 
> really matter if you're not hotplugging devices.
> 
> sage
> 
> 
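Putting Sage's steps together, a minimal end-to-end sketch of the migration, 
assuming sysvinit, OSD id 0, and a spare raw partition /dev/sdb1 (all three 
are placeholders):

   service ceph stop osd.0                           # stop the daemon
   ceph-osd --flush-journal -i 0                     # flush pending journal entries
   mv /var/lib/ceph/osd/ceph-0/journal /var/lib/ceph/osd/ceph-0/journal.old
   ln -s /dev/sdb1 /var/lib/ceph/osd/ceph-0/journal  # point at the raw partition
   ceph-osd --mkjournal -i 0                         # initialize the new journal
   service ceph start osd.0                          # restart the daemon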


[ceph-users] Rados bench result when increasing OSDs

2013-10-21 Thread Guang Yang
Dear ceph-users,
Recently I deployed a ceph cluster with RadosGW, from a small one (24 OSDs) to 
a much bigger one (330 OSDs).

When using rados bench to test the small cluster (24 OSDs), the average 
latency was around 3ms (object size 5K), while for the larger one (330 
OSDs) the average latency was around 7ms (object size 5K), roughly twice 
that of the small cluster.

The OSDs in the two clusters have the same configuration: SAS disks, with 
two partitions per disk, one for the journal and the other for metadata.

For PG numbers, the small cluster was tested with a pool having 100 PGs; 
for the large cluster the pool has 4 PGs (as I plan to scale the cluster 
further, I chose a much larger PG count).

Does my test result make sense? Is it expected that latency increases as 
the PG and OSD counts grow?

Thanks,
Guang
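For reference, a sketch of a typical rados bench invocation matching the 
test described above, assuming a pool named 'testpool' and 16 concurrent 
operations (both placeholders); average latency is reported in the summary:

   rados -p testpool bench 60 write -b 5120 -t 16   # 60s, 5K objects, 16 in flight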


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Christoph Hellwig
On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> It looks like without LVM we're getting 128KB requests (which IIRC is 
> typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> fuzzy here, but I seem to recall a property on the request_queue or device 
> that affected this.  RBD is currently doing

Unfortunately most device mapper modules still split all I/O into 4k
chunks before handling them.  They rely on the elevator to merge them
back together down the line, which isn't overly efficient but should at
least provide larger segments for the common cases.



Re: [ceph-users] Rados bench result when increasing OSDs

2013-10-21 Thread Mark Nelson

On 10/21/2013 09:13 AM, Guang Yang wrote:

Dear ceph-users,


Hi!


Recently I deployed a ceph cluster with RadosGW, from a small one (24 OSDs) to 
a much bigger one (330 OSDs).

When using rados bench to test the small cluster (24 OSDs), the average 
latency was around 3ms (object size 5K), while for the larger one (330 
OSDs) the average latency was around 7ms (object size 5K), roughly twice 
that of the small cluster.


Did you have the same number of concurrent requests going?



The OSDs in the two clusters have the same configuration: SAS disks, with 
two partitions per disk, one for the journal and the other for metadata.

For PG numbers, the small cluster was tested with a pool having 100 PGs; 
for the large cluster the pool has 4 PGs (as I plan to scale the cluster 
further, I chose a much larger PG count).


Forgive me if this is a silly question, but were the pools using the 
same level of replication?
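(A quick way to compare: the replication factor appears as the "rep size" 
field in osd dump output, as in the dump quoted elsewhere in this digest, so 
something like the following works on both clusters:)

   ceph osd dump | grep 'rep size'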




Does my test result make sense? Is it expected that latency increases as 
the PG and OSD counts grow?


You wouldn't necessarily expect a larger cluster to show higher latency 
if the nodes, pools, etc were all configured exactly the same, 
especially if you were using the same amount of concurrency.  It's 
possible that you have some slow drives on the larger cluster that could 
be causing the average latency to increase.  If there are more disks per 
node, that could do it too.


Are there any other differences you can think of?



Thanks,
Guang





Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Mike Snitzer
On Mon, Oct 21 2013 at 11:01am -0400,
Mike Snitzer  wrote:

> On Mon, Oct 21 2013 at 10:11am -0400,
> Christoph Hellwig  wrote:
> 
> > On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > > typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> > > fuzzy here, but I seem to recall a property on the request_queue or 
> > > device 
> > > that affected this.  RBD is currently doing
> > 
> > Unfortunately most device mapper modules still split all I/O into 4k
> > chunks before handling them.  They rely on the elevator to merge them
> > back together down the line, which isn't overly efficient but should at
> > least provide larger segments for the common cases.
> 
> It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> no?  Unless care is taken to assemble larger bios (higher up the IO
> stack, e.g. in XFS), all buffered IO will come to bio-based DM targets
> in $PAGE_SIZE granularity.
> 
> I would expect direct IO to before better here because it will make use
> of bio_add_page to build up larger IOs.

s/before/perform/ ;)
 
> Taking a step back, the rbd driver is exposing both the minimum_io_size
> and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
> the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
> to respect the limits when it assembles its bios (via bio_add_page).
> 
> Sage, any reason why you don't use traditional raid geometry based IO
> limits, e.g.:
> 
> minimum_io_size = raid chunk size
> optimal_io_size = raid chunk size * N stripes (aka full stripe)
> 


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Mike Snitzer
On Mon, Oct 21 2013 at 10:11am -0400,
Christoph Hellwig  wrote:

> On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> > fuzzy here, but I seem to recall a property on the request_queue or device 
> > that affected this.  RBD is currently doing
> 
> Unfortunately most device mapper modules still split all I/O into 4k
> chunks before handling them.  They rely on the elevator to merge them
> back together down the line, which isn't overly efficient but should at
> least provide larger segments for the common cases.

It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
no?  Unless care is taken to assemble larger bios (higher up the IO
stack, e.g. in XFS), all buffered IO will come to bio-based DM targets
in $PAGE_SIZE granularity.

I would expect direct IO to before better here because it will make use
of bio_add_page to build up larger IOs.

Taking a step back, the rbd driver is exposing both the minimum_io_size
and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
to respect the limits when it assembles its bios (via bio_add_page).

Sage, any reason why you don't use traditional raid geometry based IO
limits, e.g.:

minimum_io_size = raid chunk size
optimal_io_size = raid chunk size * N stripes (aka full stripe)


Re: [ceph-users] changing from default journals to external journals

2013-10-21 Thread Kurt Bauer
Hi,

neither do you need a filesystem on the partition, nor does it have to
be mounted. You can link the journal against the raw partition.

Best regards,
Kurt

Snider, Tim schrieb:
> Sage - 
> Does the journal device need a file system created, and does that device 
> need to be mounted?
> Tim
> [earlier quoted thread snipped]




[ceph-users] Intermittent poor performance on 3 node cluster

2013-10-21 Thread Pieter Steyn

Hi all,

I'm using Ceph as a filestore for my nginx web server, in order to have 
shared storage and redundancy with automatic failover.

The cluster is not high-spec, but given my use case (lots of images) I am 
very disappointed with the throughput I'm getting, and was hoping for 
some advice.


I'm using CephFS and the latest Dumpling version on Ubuntu Server 12.04

Server specs:

CephFS1, CephFS2:

Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
12GB Ram
1x 2TB SATA XFS
1x 2TB SATA (For the journal)

Each server runs 1x OSD, 1x MON and 1x MDS.
A third server runs 1x MON for Paxos to work correctly.
All machines are connected via a gigabit switch.

The ceph config as follows:

[global]
fsid = 58b87152-5ce8-491e-ae9c-07caeea3fefb
mon_initial_members = lb1, cephfs1, cephfs2
mon_host = 192.168.1.58,192.168.1.70,192.168.1.72
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true

Osd dump:

epoch 750
fsid 58b87152-5ce8-491e-ae9c-07caeea3fefb
created 2013-09-12 13:13:02.695411
modified 2013-10-21 14:28:31.780838
flags

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0


max_osd 4
osd.0 up   in  weight 1 up_from 741 up_thru 748 down_at 739 
last_clean_interval [614,738) 192.168.1.70:6802/12325 
192.168.1.70:6803/12325 192.168.1.70:6804/12325 192.168.1.70:6805/12325 
exists,up d59119d5-bccb-43ea-be64-9d2272605617
osd.1 up   in  weight 1 up_from 748 up_thru 748 down_at 745 
last_clean_interval [20,744) 192.168.1.72:6800/4271 
192.168.1.72:6801/4271 192.168.1.72:6802/4271 192.168.1.72:6803/4271 
exists,up 930c097a-f68b-4f9c-a6a1-6787a1382a41


pg_temp 0.12 [1,0,3]
pg_temp 0.16 [1,0,3]
pg_temp 0.18 [1,0,3]
pg_temp 1.11 [1,0,3]
pg_temp 1.15 [1,0,3]
pg_temp 1.17 [1,0,3]

Slowdowns increase the load of my nginx servers to around 40, and access 
to the CephFS mount is incredibly slow.  These slowdowns happen about 
once a week.  I typically solve them by restarting the MDS.


When the cluster gets slow I see the following in my logs:

2013-10-21 14:33:54.079200 7f6301e10700  0 log [WRN] : slow request 
30.281651 seconds old, received at 2013-10-21 14:33:23.797488: 
osd_op(mds.0.8:16266 14094c4. [tmapup 0~0] 1.91102783 e750) 
v4 currently commit sent
2013-10-21 14:33:54.079191 7f6301e10700  0 log [WRN] : 6 slow requests, 6 
included below; oldest blocked for > 30.281651 secs


Any advice? Would increasing the PG num for data and metadata help? 
Would moving the MDS to a host which does not also run an OSD be greatly 
beneficial?


Please let me know if you need more info.

Thank you,
Pieter
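For reference, if raising the PG count does turn out to help, a minimal 
sketch of how it's done (the target value 256 is only an illustration; 
pgp_num should be raised to match pg_num):

   ceph osd pool set data pg_num 256
   ceph osd pool set data pgp_num 256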


Re: [ceph-users] changing from default journals to external journals

2013-10-21 Thread Sage Weil
On Mon, 21 Oct 2013, Snider, Tim wrote:
> Sage - 
> Does the journal device need a file system created, and does that device 
> need to be mounted?

Yes; the mkjournal step needs to write to the journal (whether it's a 
file or block device).

sage


> Tim
> 
> [earlier quoted thread snipped]


Re: [ceph-users] changing from default journals to external journals

2013-10-21 Thread Snider, Tim
Your reply seems to contradict the reply from Sage:

> Sage -

> Does the journal device need a file system created, and does that device 
> need to be mounted?

Yes; the mkjournal step needs to write to the journal (whether it's a 
file or block device).
Tim

From: Kurt Bauer [mailto:kurt.ba...@univie.ac.at]
Sent: Monday, October 21, 2013 10:16 AM
To: Snider, Tim
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] changing from default journals to external journals

Hi,

neither do you need a filesystem on the partition, nor does it have to be 
mounted. You can link the journal against the raw partition.

Best regards,
Kurt

Snider, Tim schrieb:

[earlier quoted thread snipped]


[ceph-users] Boot from volume with Dumpling on RDO/CentOS 6 (using backported QEMU 0.12)

2013-10-21 Thread Andrew Richards
Hi Everybody,

I'm attempting to get Ceph working for CentOS 6.4 running RDO Havana for 
Cinder volume storage and boot-from-volume, and I keep bumping into very 
unhelpful errors on my nova-compute test node and my cinder controller node.

Here is what I see on my cinder-volume controller (Node #1) when I try to 
attach an RBD-backed Cinder volume to a Nova VM using either the GUI or nova 
volume-attach (/var/log/cinder/volume.log):

2013-10-20 18:21:05.880 13668 ERROR cinder.openstack.common.rpc.amqp 
[req-bd62cb07-42e7-414a-86dc-f26f7a569de6 9bfee22cd15b4dc0a2e203d7c151edbc 
8431635821f84285afdd0f5faf1ce1aa] Exception during message handling
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp Traceback 
(most recent call last):
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 
441, in _process_data
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp **args)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", 
line 148, in dispatch
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp return 
getattr(proxyobj, method)(ctxt, **kwargs)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/utils.py", line 808, in wrapper
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp return 
func(self, *args, **kwargs)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 624, in 
initialize_connection
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp 
conn_info = self.driver.initialize_connection(volume, connector)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/volume/drivers/rbd.py", line 665, in 
initialize_connection
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp hosts, 
ports = self._get_mon_addrs()
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/volume/drivers/rbd.py", line 312, in 
_get_mon_addrs
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp out, _ 
= self._execute(*args)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/utils.py", line 142, in execute
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp return 
processutils.execute(*cmd, **kwargs)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py", 
line 158, in execute
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp 
shell=shell)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib/python2.6/site-packages/eventlet/green/subprocess.py", line 25, in 
__init__
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp 
subprocess_orig.Popen.__init__(self, args, 0, *argss, **kwds)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib64/python2.6/subprocess.py", line 642, in __init__
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp 
errread, errwrite)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp   File 
"/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp raise 
child_exception
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp OSError: 
[Errno 2] No such file or directory
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp 
2013-10-20 18:21:05.883 13668 ERROR cinder.openstack.common.rpc.common 
[req-bd62cb07-42e7-414a-86dc-f26f7a569de6 9bfee22cd15b4dc0a2e203d7c151edbc 
8431635821f84285afdd0f5faf1ce1aa] Returning exception [Errno 2] No such file or 
directory to caller


Here is what I see on my nova-compute node (Node #2) when I try to boot from 
volume (/var/log/nova/compute.log):

ERROR nova.compute.manager [req-ced59268-4766-4f57-9cdb-4ba451b0faaa 
9bfee22cd15b4dc0a2e203d7c151edbc 8431635821f84285afdd0f5faf1ce1aa] [instance: 
c80a053f-b84c-401c-8e29-022d4c6f56a0] Error: The server has either erred or is 
incapable of performing the requested operation. (HTTP 500) (Request-ID: 
req-44557bfa-6777-41a6-8183-e08dedf0611b)
2013-10-17 15:01:45.060 18546 TRACE nova.compute.manager [instance: 
c80a053f-b84c-401c-8e29-022d4c6f56a0] Traceback (most recent call last):
2013-10-17 15:01:45.060 18546 TRACE nova.compute.manager [instance: 
c80a053f-b84c-401c-8e29-022d4c6f56a0]   File 
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1028, in 
_build_instance
2
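For context, an OSError [Errno 2] raised from subprocess means the executable 
being spawned was not found; in the cinder traceback above, the rbd driver's 
_get_mon_addrs() is shelling out to the ceph CLI. A quick check; the package 
name is an assumption for CentOS 6 / RDO:

   # Confirm the ceph CLI is visible to the cinder service user;
   # install it if missing.
   sudo -u cinder which ceph || sudo yum install -y ceph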

Re: [ceph-users] changing from default journals to external journals

2013-10-21 Thread Sage Weil
On Mon, 21 Oct 2013, Snider, Tim wrote:
> 
> Your reply seems to contradict the reply from Sage:
> 
>     > Sage -
> 
>     > Does the journal device need a file system created, and does
>     > that device need to be mounted?
> 
>     Yes; the mkjournal step needs to write to the journal (whether
>     it's a file or block device).

We just need to read/write whatever the journal symlink points to. Ideally 
that is a raw partition/block device.  If it's a file, the containing file 
system needs to be mounted so that the file is accessible.

As Kurt suggests, though, a raw partition is preferable!

sage


> [earlier quoted thread snipped]


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Sage Weil
On Mon, 21 Oct 2013, Mike Snitzer wrote:
> On Mon, Oct 21 2013 at 10:11am -0400,
> Christoph Hellwig  wrote:
> 
> > On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > > typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> > > fuzzy here, but I seem to recall a property on the request_queue or 
> > > device 
> > > that affected this.  RBD is currently doing
> > 
> > Unfortunately most device mapper modules still split all I/O into 4k
> > chunks before handling them.  They rely on the elevator to merge them
> > back together down the line, which isn't overly efficient but should at
> > least provide larger segments for the common cases.
> 
> It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> no?  Unless care is taken to assemble larger bios (higher up the IO
> stack, e.g. in XFS), all buffered IO will come to bio-based DM targets
> in $PAGE_SIZE granularity.
> 
> I would expect direct IO to before better here because it will make use
> of bio_add_page to build up larger IOs.

I do know that we regularly see 128 KB requests when we put XFS (or 
whatever else) directly on top of /dev/rbd*.

> Taking a step back, the rbd driver is exposing both the minimum_io_size
> and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
> the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
> to respect the limits when it assembles its bios (via bio_add_page).
> 
> Sage, any reason why you don't use traditional raid geometry based IO
> limits, e.g.:
> 
> minimum_io_size = raid chunk size
> optimal_io_size = raid chunk size * N stripes (aka full stripe)

We are... by default we stripe 4M chunks across 4M objects.  You're 
suggesting it would actually help to advertise a smaller minimum_io_size 
(say, 1MB)?  This could easily be made tunable.

sage
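For anyone wanting to see what their rbd device currently advertises, the 
limits are visible in sysfs (device name is a placeholder):

   cat /sys/block/rbd0/queue/minimum_io_size
   cat /sys/block/rbd0/queue/optimal_io_size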


Re: [ceph-users] Boot from volume with Dumpling on RDO/CentOS 6 (using backported QEMU 0.12)

2013-10-21 Thread Josh Durgin

On 10/21/2013 09:03 AM, Andrew Richards wrote:

Hi Everybody,

I'm attempting to get Ceph working for CentOS 6.4 running RDO Havana for
Cinder volume storage and boot-from-volume, and I keep bumping into
very unhelpful errors on my nova-compute test node and my cinder
controller node.

Here is what I see on my cinder-volume controller (Node #1) when I try
to attach an RBD-backed Cinder volume to a Nova VM using either the GUI
or nova volume-attach (/var/log/cinder/volume.log):

[traceback snipped; identical to the one in Andrew's message above]


Here is what I see on my nova-compute node (Node #2) when I try to boot
from volume (/var/log/nova/compute.log):

[error and traceback snipped; identical to the nova-compute log above]



Re: [ceph-users] Rados bench result when increasing OSDs

2013-10-21 Thread Gregory Farnum
On Mon, Oct 21, 2013 at 7:13 AM, Guang Yang  wrote:
> Dear ceph-users,
> Recently I deployed a ceph cluster with RadosGW, from a small one (24 OSDs) 
> to a much bigger one (330 OSDs).
>
> When using rados bench to test the small cluster (24 OSDs), the average 
> latency was around 3ms (object size 5K), while for the larger one (330 
> OSDs) the average latency was around 7ms (object size 5K), roughly twice 
> that of the small cluster.
>
> The OSDs in the two clusters have the same configuration: SAS disks, with 
> two partitions per disk, one for the journal and the other for metadata.
>
> For PG numbers, the small cluster was tested with a pool having 100 PGs; 
> for the large cluster the pool has 4 PGs (as I plan to scale the cluster 
> further, I chose a much larger PG count).
>
> Does my test result make sense? Is it expected that latency increases as 
> the PG and OSD counts grow?

Besides what Mark said, can you describe your test in a little more
detail? Writing/reading, length of time, number of objects, etc.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


[ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
Hello all,

Similar to this post from last month, I am experiencing 2 nodes that are 
constantly crashing upon startup: 
http://www.spinics.net/lists/ceph-users/msg04589.html

Here are the logs from the two nodes, without the debug commands: 
http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h

I have run the OSDs with the debug statements per that email, but I'm unsure 
where to post the logs; they are 108 MB each without compression. Should I 
create a bug on the tracker?

Thanks,
Jeff
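(In the meantime, debug logs of that size usually compress very well; a 
sketch, with the path a placeholder:)

   gzip -9 /var/log/ceph/ceph-osd.0.log   # text-heavy debug logs shrink dramatically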


[ceph-users] OSD journal size

2013-10-21 Thread Shain Miley
Hi,

We have been testing a ceph cluster with the following specs:

3 Mon's
72 OSD's spread across 6 Dell R-720xd servers
4 TB SAS drives
4 bonded 10 GigE NIC ports per server
64 GB of RAM

Up until this point we have been running tests using the default journal size 
of '1024'.
Before we start to place production data on the cluster, I want to clear up 
the following questions:

1) Is there a more appropriate journal size for my setup given the specs 
listed above?

2) According to this link:

http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11

CERN is using '/dev/disk/by-path' for their OSDs.

Does ceph-deploy currently support setting up OSDs using this method?

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649
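For question 1, the rule of thumb in the Ceph docs is journal size = 2 * 
(expected throughput * filestore max sync interval). A sketch of the 
ceph.conf setting, with the value itself only an illustration:

   [osd]
   osd journal size = 10240    ; in MB, i.e. a 10 GB journal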


Re: [ceph-users] Boot from volume with Dumpling on RDO/CentOS 6 (using backported QEMU 0.12)

2013-10-21 Thread Andrew Richards
Thanks for the response Josh!

If the Ceph CLI tool still needs to be there for Cinder in Havana, then am I 
correct in assuming that I still also need to export "CEPH_ARGS='--id volumes'" 
in my cinder init script for the sake of cephx like I had to do in Grizzly?

Thanks,
Andy

On Oct 21, 2013, at 12:26 PM, Josh Durgin  wrote:

> [quoted message snipped]

Re: [ceph-users] Boot from volume with Dumpling on RDO/CentOS 6 (using backported QEMU 0.12)

2013-10-21 Thread Josh Durgin

On 10/21/2013 10:35 AM, Andrew Richards wrote:

Thanks for the response Josh!

If the Ceph CLI tool still needs to be there for Cinder in Havana, then
am I correct in assuming that I still also need to export
"CEPH_ARGS='--id volumes'" in my cinder init script for the sake of
cephx like I had to do in Grizzly?


No, that's no longer necessary.

Josh


Thanks,
Andy

On Oct 21, 2013, at 12:26 PM, Josh Durgin mailto:josh.dur...@inktank.com>> wrote:


On 10/21/2013 09:03 AM, Andrew Richards wrote:

Hi Everybody,

I'm attempting to get Ceph working for CentOS 6.4 running RDO Havana for
Cinder volume storage and boot-from-volume, and I keep bumping into
very unhelpful errors on my nova-compute test node and my cinder
controller node.

Here is what I see on my cinder-volume controller (Node #1) when I try
to attach an RBD-backed Cinder volume to a Nova VM using either the GUI
or nova volume-attach (/var/log/cinder/volume.log):

2013-10-20 18:21:05.880 13668 ERROR cinder.openstack.common.rpc.amqp
[req-bd62cb07-42e7-414a-86dc-f26f7a569de6
9bfee22cd15b4dc0a2e203d7c151edbc 8431635821f84285afdd0f5faf1ce1aa]
Exception during message handling
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
Traceback (most recent call last):
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File
"/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py",
line 441, in _process_data
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
**args)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File
"/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py",
line 148, in dispatch
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
return getattr(proxyobj, method)(ctxt, **kwargs)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 808, in
wrapper
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
return func(self, *args, **kwargs)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line
624, in initialize_connection
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
conn_info = self.driver.initialize_connection(volume, connector)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/rbd.py",
line 665, in initialize_connection
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
hosts, ports = self._get_mon_addrs()
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/rbd.py",
line 312, in _get_mon_addrs
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
out, _ = self._execute(*args)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 142, in
execute
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
return processutils.execute(*cmd, **kwargs)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File
"/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py",
line 158, in execute
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
shell=shell)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib/python2.6/site-packages/eventlet/green/subprocess.py",
line 25, in __init__
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
subprocess_orig.Popen.__init__(self, args, 0, *argss, **kwds)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
errread, errwrite)
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
raise child_exception
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
OSError: [Errno 2] No such file or directory
2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
2013-10-20 18:21:05.883 13668 ERROR cinder.openstack.common.rpc.common
[req-bd62cb07-42e7-414a-86dc-f26f7a569de6
9bfee22cd15b4dc0a2e203d7c151edbc 8431635821f84285afdd0f5faf1ce1aa]
Returning exception [Errno 2] No such file or directory to caller
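
The ENOENT here is the ceph CLI itself failing to spawn: _get_mon_addrs
shells out to the ceph binary (roughly `ceph mon dump`), so a quick sanity
check on the cinder node would be something along these lines (package name
is an assumption for CentOS/RDO):

    which ceph || sudo yum install ceph    # verify the Ceph CLI is present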


Here is what I see on my nova-compute node (Node #2) when I try to boot
from volume (/var/log/nova/compute.log):

ERROR nova.compute.manager [req-ced59268-4766-4f57-9cdb-4ba451b0faaa
9bfee22cd15b4dc0a2e203d7c151edbc 8431635821f84285afdd0f5faf1ce1aa]
[instance: c80a053f-b84c-401c-8e29-022d4c6f56a0] Error: The server has
either erred or is incapable of performing the requested operation.
(HTTP 500) (Request-ID: r

Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Mike Snitzer
On Mon, Oct 21 2013 at 12:02pm -0400,
Sage Weil  wrote:

> On Mon, 21 Oct 2013, Mike Snitzer wrote:
> > On Mon, Oct 21 2013 at 10:11am -0400,
> > Christoph Hellwig  wrote:
> > 
> > > On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > > > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > > > typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> > > > fuzzy here, but I seem to recall a property on the request_queue or 
> > > > device 
> > > > that affected this.  RBD is currently doing
> > > 
> > > Unfortunately most device mapper modules still split all I/O into 4k
> > > chunks before handling them.  They rely on the elevator to merge them
> > > back together down the line, which isn't overly efficient but should at
> > > least provide larger segments for the common cases.
> > 
> > It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> > no?  Unless care is taken to assemble larger bios (higher up the IO
> > stack, e.g. in XFS), all buffered IO will come to bio-based DM targets
> > in $PAGE_SIZE granularity.
> > 
> > I would expect direct IO to perform better here because it will make use
> > of bio_add_page to build up larger IOs.
> 
> I do know that we regularly see 128 KB requests when we put XFS (or 
> whatever else) directly on top of /dev/rbd*.

Should be pretty straight-forward to identify any limits that are
different by walking sysfs/queue, e.g.:

grep -r . /sys/block/rbdXXX/queue
vs
grep -r . /sys/block/dm-X/queue

Could be there is an unexpected difference.  For instance, there was
this fix recently: http://patchwork.usersys.redhat.com/patch/69661/

> > Taking a step back, the rbd driver is exposing both the minimum_io_size
> > and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
> > the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
> > to respect the limits when it assembles its bios (via bio_add_page).
> > 
> > Sage, any reason why you don't use traditional raid geometry based IO
> > limits? E.g.:
> > 
> > minimum_io_size = raid chunk size
> > optimal_io_size = raid chunk size * N stripes (aka full stripe)
> 
> We are... by default we stripe 4M chunks across 4M objects.  You're 
> suggesting it would actually help to advertise a smaller minimum_io_size 
> (say, 1MB)?  This could easily be made tunable.

You're striping 4MB chunks across 4 million stripes?

So the full stripe size in bytes is 17592186044416 (or 16TB)?  Yeah
cannot see how XFS could make use of that ;)
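
For a direct comparison of the specific limits discussed above, the
individual attributes can also be read one by one; device names below are
placeholders:

    cat /sys/block/rbd0/queue/minimum_io_size   # 4194304 on rbd per the above
    cat /sys/block/rbd0/queue/optimal_io_size
    cat /sys/block/dm-0/queue/minimum_io_size   # compare with the DM device
    cat /sys/block/dm-0/queue/optimal_io_size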
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-21 Thread Alfredo Deza
On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley  wrote:
> Hi,
>
> We have been testing a ceph cluster with the following specs:
>
> 3 Mon's
> 72 OSD's spread across 6 Dell R-720xd servers
> 4 TB SAS drives
> 4 bonded 10 GigE NIC ports per server
> 64 GB of RAM
>
> Up until this point we have been running tests using the default journal
> size of '1024'.
> Before we start to place production data on the cluster I want to clear
> up the following questions I have:
>
> 1)Is there a more appropriate journal size for my setup given the specs
> listed above?
>
> 2)According to this link:
>
> http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11
>
> CERN is using  '/dev/disk/by-path' for their OSD's.
>
> Does ceph-deploy currently support setting up OSD's using this method?

Indeed it does!

`ceph-deploy osd --help` got updated recently to demonstrate how this
needs to be done (an extra step is involved):

For paths, first prepare and then activate:

ceph-deploy osd prepare {osd-node-name}:/path/to/osd
ceph-deploy osd activate {osd-node-name}:/path/to/osd
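
As a side note on question 1, the usual rule of thumb is to size the
journal at 2 * (expected throughput * filestore max sync interval). A
sketch in ceph.conf, where the figure is purely illustrative and not a
recommendation for your hardware:

    [osd]
    osd journal size = 10000    # MB; e.g. 2 * (500 MB/s * 10 s), illustrative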



>
> Thanks,
>
> Shain
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
> smi...@npr.org | 202.513.3649
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Sage Weil
On Mon, 21 Oct 2013, Mike Snitzer wrote:
> On Mon, Oct 21 2013 at 12:02pm -0400,
> Sage Weil  wrote:
> 
> > On Mon, 21 Oct 2013, Mike Snitzer wrote:
> > > On Mon, Oct 21 2013 at 10:11am -0400,
> > > Christoph Hellwig  wrote:
> > > 
> > > > On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > > > > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > > > > typical), but with LVM it's only 4KB.  Unfortunately my memory is a 
> > > > > bit 
> > > > > fuzzy here, but I seem to recall a property on the request_queue or 
> > > > > device 
> > > > > that affected this.  RBD is currently doing
> > > > 
> > > > Unfortunately most device mapper modules still split all I/O into 4k
> > > > chunks before handling them.  They rely on the elevator to merge them
> > > > back together down the line, which isn't overly efficient but should at
> > > > least provide larger segments for the common cases.
> > > 
> > > It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> > > no?  Unless care is taken to assemble larger bios (higher up the IO
> > > stack, e.g. in XFS), all buffered IO will come to bio-based DM targets
> > > in $PAGE_SIZE granularity.
> > > 
> > > I would expect direct IO to perform better here because it will make use
> > > of bio_add_page to build up larger IOs.
> > 
> > I do know that we regularly see 128 KB requests when we put XFS (or 
> > whatever else) directly on top of /dev/rbd*.
> 
> Should be pretty straight-forward to identify any limits that are
> different by walking sysfs/queue, e.g.:
> 
> grep -r . /sys/block/rbdXXX/queue
> vs
> grep -r . /sys/block/dm-X/queue
> 
> Could be there is an unexpected difference.  For instance, there was
> this fix recently: http://patchwork.usersys.redhat.com/patch/69661/
> 
> > > Taking a step back, the rbd driver is exposing both the minimum_io_size
> > > and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
> > > the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
> > > to respect the limits when it assembles its bios (via bio_add_page).
> > > 
> > > Sage, any reason why you don't use traditional raid geometry based IO
> > > limits? E.g.:
> > > 
> > > minimum_io_size = raid chunk size
> > > optimal_io_size = raid chunk size * N stripes (aka full stripe)
> > 
> > We are... by default we stripe 4M chunks across 4M objects.  You're 
> > suggesting it would actually help to advertise a smaller minimum_io_size 
> > (say, 1MB)?  This could easily be made tunable.
> 
> You're striping 4MB chunks across 4 million stripes?
> 
> So the full stripe size in bytes is 17592186044416 (or 16TB)?  Yeah
> cannot see how XFS could make use of that ;)

Sorry, I mean the stripe count is effectively 1.  Each 4MB gets mapped to 
a new 4MB object (for a total of image_size / 4MB objects).  So I think 
minimum_io_size and optimal_io_size are technically correct in this case.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Mike Snitzer
On Mon, Oct 21 2013 at  2:06pm -0400,
Christoph Hellwig  wrote:

> On Mon, Oct 21, 2013 at 11:01:29AM -0400, Mike Snitzer wrote:
> > It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> > no?
> 
> Well, it's the block layer based on what DM tells it.  Take a look at
> dm_merge_bvec
> 
> >From dm_merge_bvec:
> 
>   /*
>  * If the target doesn't support merge method and some of the devices
>  * provided their merge_bvec method (we know this by looking at
>  * queue_max_hw_sectors), then we can't allow bios with multiple 
> vector
>  * entries.  So always set max_size to 0, and the code below allows
>  * just one page.
>  */
>   
> Although it's not the general case, just if the driver has a
> merge_bvec method.  But this happens if you're using DM on top of MD, where I
> saw it as well as on rbd, which is why it's correct in this context, too.

Right, but only if the DM target that is being used doesn't have a
.merge method.  I don't think it was ever shared which DM target is in
use here.. but both the linear and stripe DM targets provide a .merge
method.
 
> Sorry for over generalizing a bit.

No problem.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Christoph Hellwig
On Mon, Oct 21, 2013 at 11:01:29AM -0400, Mike Snitzer wrote:
> It isn't DM that splits the IO into 4K chunks; it is the VM subsystem
> no?

Well, it's the block layer based on what DM tells it.  Take a look at
dm_merge_bvec

>From dm_merge_bvec:

/*
 * If the target doesn't support merge method and some of the devices
 * provided their merge_bvec method (we know this by looking at
 * queue_max_hw_sectors), then we can't allow bios with multiple vector
 * entries.  So always set max_size to 0, and the code below allows
 * just one page.
 */

Although it's not the general case, just if the driver has a
merge_bvec method.  But this happens if you're using DM on top of MD, where I
saw it as well as on rbd, which is why it's correct in this context, too.

Sorry for over generalizing a bit.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Boot from volume with Dumpling on RDO/CentOS 6 (using backported QEMU 0.12)

2013-10-21 Thread Andrew Richards
Thanks, Josh.  I am able to boot from my RBD Cinder volumes now.

Thanks,
Andy

On Oct 21, 2013, at 1:38 PM, Josh Durgin  wrote:

> On 10/21/2013 10:35 AM, Andrew Richards wrote:
>> Thanks for the response Josh!
>> 
>> If the Ceph CLI tool still needs to be there for Cinder in Havana, then
>> am I correct in assuming that I still also need to export
>> "CEPH_ARGS='--id volumes'" in my cinder init script for the sake of
>> cephx like I had to do in Grizzly?
> 
> No, that's no longer necessary.
> 
> Josh
> 
>> Thanks,
>> Andy
>> 
>> On Oct 21, 2013, at 12:26 PM, Josh Durgin <josh.dur...@inktank.com> wrote:
>> 
>>> On 10/21/2013 09:03 AM, Andrew Richards wrote:
 Hi Everybody,
 
 I'm attempting to get Ceph working for CentOS 6.4 running RDO Havana for
 Cinder volume storage and boot-from-volume, and I keep bumping into
 very unhelpful errors on my nova-compute test node and my cinder
 controller node.
 
 Here is what I see on my cinder-volume controller (Node #1) when I try
 to attach an RBD-backed Cinder volume to a Nova VM using either the GUI
 or nova volume-attach (/var/log/cinder/volume.log):
 
 2013-10-20 18:21:05.880 13668 ERROR cinder.openstack.common.rpc.amqp
 [req-bd62cb07-42e7-414a-86dc-f26f7a569de6
 9bfee22cd15b4dc0a2e203d7c151edbc 8431635821f84285afdd0f5faf1ce1aa]
 Exception during message handling
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 Traceback (most recent call last):
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File
 "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py",
 line 441, in _process_data
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 **args)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File
 "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py",
 line 148, in dispatch
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 return getattr(proxyobj, method)(ctxt, **kwargs)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 808, in
 wrapper
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 return func(self, *args, **kwargs)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line
 624, in initialize_connection
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 conn_info = self.driver.initialize_connection(volume, connector)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/rbd.py",
 line 665, in initialize_connection
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 hosts, ports = self._get_mon_addrs()
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/rbd.py",
 line 312, in _get_mon_addrs
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 out, _ = self._execute(*args)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 142, in
 execute
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 return processutils.execute(*cmd, **kwargs)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File
 "/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py",
 line 158, in execute
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 shell=shell)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib/python2.6/site-packages/eventlet/green/subprocess.py",
 line 25, in __init__
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 subprocess_orig.Popen.__init__(self, args, 0, *argss, **kwds)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 errread, errwrite)
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 raise child_exception
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 OSError: [Errno 2] No such file or directory
 2013-10-20 18:21:05.880 13668 TRACE cinder.openstack.common.rpc.amqp
 2013-10-20 18:21:05.883 13668 ERROR cinder.openstack.common.rpc.common
 [req-bd62cb07-42e7-414a-8

Re: [ceph-users] Ceph stand @ FOSDEM

2013-10-21 Thread Wido den Hollander

Hi Loic,

On 10/19/2013 02:57 PM, Loic Dachary wrote:

Hi Ceph,

I don't know if anyone thought about asking for a Ceph stand during FOSDEM. If 
there was one, I would volunteer to sit at the table during a full day. The 
requirement is that there are at least two persons at all times.

https://fosdem.org/2014/news/2013-09-17-call-for-participation-part-two/



Seems like a good idea. I'm however not sure if I can be there since I 
might be on vacation then.


I'd help as well and be at the stand for one day.


Cheers



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD with 29% free

2013-10-21 Thread Bryan Stillwell
So I'm running into this issue again and after spending a bit of time
reading the XFS mailing lists, I believe the free space is too
fragmented:

[root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
   from  to extents  blocks    pct
  1   1   85773   85773   0.24
  2   3  176891  444356   1.27
  4   7  430854 2410929   6.87
  8  15 2327527 30337352  86.46
 16  31   75871 1809577   5.16
total free extents 3096916
total free blocks 35087987
average free extent size 11.33


Compared to a drive which isn't reporting 'No space left on device':

[root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
   from      to  extents    blocks    pct
      1       1   133148    133148   0.15
      2       3   320737    808506   0.94
      4       7   809748   4532573   5.27
      8      15  4536681  59305608  68.96
     16      31    31531    751285   0.87
     32      63      364     16367   0.02
     64     127       90      9174   0.01
    128     255        9      2072   0.00
    256     511       48     18018   0.02
    512    1023      128    102422   0.12
   1024    2047      290    451017   0.52
   2048    4095      538   1649408   1.92
   4096    8191      851   5066070   5.89
   8192   16383      746   8436029   9.81
  16384   32767      194   4042573   4.70
  32768   65535       15    614301   0.71
  65536  131071        1     66630   0.08
total free extents 5835119
total free blocks 86005201
average free extent size 14.7392


What I'm wondering is if reducing the block size from 4K to 2K (or 1K)
would help?  I'm pretty sure this would require re-running
mkfs.xfs on every OSD to fix if that's the case...
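
For reference, rebuilding with a smaller block size would look roughly like
the sketch below; note it destroys the OSD's data, so the OSD would have to
be re-created and backfilled afterwards (device name is an example):

    mkfs.xfs -f -b size=2048 /dev/sdb1    # example device; wipes the filesystem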

Thanks,
Bryan


On Mon, Oct 14, 2013 at 5:28 PM, Bryan Stillwell
 wrote:
>
> The filesystem isn't as full now, but the fragmentation is pretty low:
>
> [root@den2ceph001 ~]# df /dev/sdc1
> Filesystem   1K-blocks  Used Available Use% Mounted on
> /dev/sdc1    486562672 270845628 215717044  56% 
> /var/lib/ceph/osd/ceph-1
> [root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
> actual 3481543, ideal 3447443, fragmentation factor 0.98%
>
> Bryan
>
> On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe  
> wrote:
> >
> > How fragmented is that file system?
> >
> > Sent from my iPad
> >
> > > On Oct 14, 2013, at 5:44 PM, Bryan Stillwell  
> > > wrote:
> > >
> > > This appears to be more of an XFS issue than a ceph issue, but I've
> > > run into a problem where some of my OSDs failed because the filesystem
> > > was reported as full even though there was 29% free:
> > >
> > > [root@den2ceph001 ceph-1]# touch blah
> > > touch: cannot touch `blah': No space left on device
> > > [root@den2ceph001 ceph-1]# df .
> > > Filesystem   1K-blocks  Used Available Use% Mounted on
> > > /dev/sdc1    486562672 342139340 144423332  71% 
> > > /var/lib/ceph/osd/ceph-1
> > > [root@den2ceph001 ceph-1]# df -i .
> > > Filesystem      Inodes   IUsed    IFree IUse% Mounted on
> > > /dev/sdc1     60849984 4097408 56752576    7% 
> > > /var/lib/ceph/osd/ceph-1
> > > [root@den2ceph001 ceph-1]#
> > >
> > > I've tried remounting the filesystem with the inode64 option like a
> > > few people recommended, but that didn't help (probably because it
> > > doesn't appear to be running out of inodes).
> > >
> > > This happened while I was on vacation and I'm pretty sure it was
> > > caused by another OSD failing on the same node.  I've been able to
> > > recover from the situation by bringing the failed OSD back online, but
> > > it's only a matter of time until I'll be running into this issue again
> > > since my cluster is still being populated.
> > >
> > > Any ideas on things I can try the next time this happens?
> > >
> > > Thanks,
> > > Bryan
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intermittent poor performance on 3 node cluster

2013-10-21 Thread Gregory Farnum
On Mon, Oct 21, 2013 at 8:05 AM, Pieter Steyn  wrote:
> Hi all,
>
> I'm using Ceph as a filestore for my nginx web server, in order to have
> shared storage, and redundancy with automatic failover.
>
> The cluster is not high spec, but given my use case (lots of images) - I am
> very disappointed with the current throughput I'm getting, and was hoping
> for some advice.
>
> I'm using CephFS and the latest Dumpling version on Ubuntu Server 12.04
>
> Server specs:
>
> CephFS1, CephFS2:
>
> Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
> 12GB Ram
> 1x 2TB SATA XFS
> 1x 2TB SATA (For the journal)
>
> Each server runs 1x OSD, 1x MON and 1x MDS.
> A third server runs 1x MON for Paxos to work correctly.
> All machines are connected via a gigabit switch.
>
> The ceph config as follows:
>
> [global]
> fsid = 58b87152-5ce8-491e-ae9c-07caeea3fefb
> mon_initial_members = lb1, cephfs1, cephfs2
> mon_host = 192.168.1.58,192.168.1.70,192.168.1.72
> auth_supported = cephx
> osd_journal_size = 1024
> filestore_xattr_use_omap = true
>
> Osd dump:
>
> epoch 750
> fsid 58b87152-5ce8-491e-ae9c-07caeea3fefb
> created 2013-09-12 13:13:02.695411
> modified 2013-10-21 14:28:31.780838
> flags
>
> pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
> pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins
> pg_num 64 pgp_num 64 last_change 1 owner 0
> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
> pg_num 64 pgp_num 64 last_change 1 owner 0
>
> max_osd 4
> osd.0 up   in  weight 1 up_from 741 up_thru 748 down_at 739
> last_clean_interval [614,738) 192.168.1.70:6802/12325
> 192.168.1.70:6803/12325 192.168.1.70:6804/12325 192.168.1.70:6805/12325
> exists,up d59119d5-bccb-43ea-be64-9d2272605617
> osd.1 up   in  weight 1 up_from 748 up_thru 748 down_at 745
> last_clean_interval [20,744) 192.168.1.72:6800/4271 192.168.1.72:6801/4271
> 192.168.1.72:6802/4271 192.168.1.72:6803/4271 exists,up
> 930c097a-f68b-4f9c-a6a1-6787a1382a41
>
> pg_temp 0.12 [1,0,3]
> pg_temp 0.16 [1,0,3]
> pg_temp 0.18 [1,0,3]
> pg_temp 1.11 [1,0,3]
> pg_temp 1.15 [1,0,3]
> pg_temp 1.17 [1,0,3]
>
> Slowdowns increase the load of my nginx servers to around 40, and access to
> the CephFS mount is incredibly slow.  These slowdowns happen about once a
> week.  I typically solve them by restarting the MDS.
>
> When the cluster gets slow I see the following in my logs:
>
> 2013-10-21 14:33:54.079200 7f6301e10700  0 log [WRN] : slow request
> 30.281651 seconds old, received at 2013-10-21 14:33:23.797488:
> osd_op(mds.0.8:16266 14094c4. [tmapup 0~0] 1.91102783 e750) v4
> currently commit sent
> 013-10-21 14:33:54.079191 7f6301e10700  0 log [WRN] : 6 slow requests, 6
> included below; oldest blocked for > 30.281651 secs

If this is the sole kind of slow request you see (tmapup reports),
then it looks like the MDS is flushing out directory updates and the
OSD is taking a long time to process them. I'm betting you have very
large directories and it's taking the OSD a while to process the
changes; and the MDS is getting backed up while it does so because
it's trying to flush them out of memory.

> Any advice? Would increasing the PG num for data and metadata help? Would
> moving the MDS to a host which does not also run an OSD be greatly
> beneficial?

Your PG counts are probably fine for a cluster of that size, although
you could try bumping them up by 2x or something. More likely, though,
is that your CephFS install is not well-tuned for the directory sizes
you're using. What's the largest directory you're using? Have you
tried bumping up your mds cache size? (And what's the host memory
usage look like?)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
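
For reference, the two knobs mentioned above would be adjusted roughly as
follows; the values are illustrative only (mds cache size defaults to
100000 inodes):

    # in ceph.conf on the MDS hosts
    [mds]
    mds cache size = 300000    # illustrative value

    # bumping the PG count on an existing pool
    ceph osd pool set data pg_num 128
    ceph osd pool set data pgp_num 128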
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Samuel Just
It looks like an xattr vanished from one of your objects on osd.3.
What fs are you running?
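
If you want to inspect the xattrs on the affected object yourself,
something along these lines would dump them; the path is only illustrative
of the filestore layout:

    getfattr -d -m '.*' /var/lib/ceph/osd/ceph-3/current/<pgid>_head/<object file>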

On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams  wrote:
> Hello all,
>
> Similar to this post from last month, I am experiencing 2 nodes that are
> constantly crashing upon start up:
> http://www.spinics.net/lists/ceph-users/msg04589.html
>
> Here are the logs from the 2 without the debug commands, here:
> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>
> I have run the osds with the debug statements per the email, but I'm unsure
> where to post them, they are 108M each without compression. Should I create
> a bug on the tracker?
>
> Thanks,
> Jeff
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
We're running xfs on a 3.8.0-31-generic kernel

Thanks,
Jeff

On 10/21/13 1:54 PM, "Samuel Just"  wrote:

>It looks like an xattr vanished from one of your objects on osd.3.
>What fs are you running?
>
>On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
>wrote:
>> Hello all,
>>
>> Similar to this post from last month, I am experiencing 2 nodes that are
>> constantly crashing upon start up:
>> http://www.spinics.net/lists/ceph-users/msg04589.html
>>
>> Here are the logs from the 2 without the debug commands, here:
>> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>>
>> I have run the osds with the debug statements per the email, but I'm
>>unsure
>> where to post them, they are 108M each without compression. Should I
>>create
>> a bug on the tracker?
>>
>> Thanks,
>> Jeff
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw usage

2013-10-21 Thread Yehuda Sadeh
On Sun, Oct 20, 2013 at 9:04 PM, Derek Yarnell  wrote:
> So I have tried to enable usage logging on a new production Ceph RadosGW
> cluster but nothing seems to show up.
>
> I have added to the [client.radosgw.] section the following
>
> rgw enable usage log = true
> rgw usage log tick interval = 30
> rgw usage log flush threshold = 1024
> rgw usage max shards = 32
> rgw usage max user shards = 1
>
> Restarted the radosgw but I don't see anything in the logs (running in
> debug 20)
>
> # radosgw-admin usage show --uid=derek --bucket=derek
> { "entries": [],
>   "summary": []}
>
> Is there something more I can poke to figure out why the gateway is not
> logging?
>


Try moving the above configurables to the global section, if it's
working then you're probably using the wrong section.


Yehuda
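
In other words, something like this, using the same values from above:

    [global]
    rgw enable usage log = true
    rgw usage log tick interval = 30
    rgw usage log flush threshold = 1024
    rgw usage max shards = 32
    rgw usage max user shards = 1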
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] error: failed to add bucket placement

2013-10-21 Thread Snider, Tim
Can someone tell me what I'm missing. I have a radosgw user created.
I get the following complaint when I try to create a pool with or without the 
--uid parameter:

#radosgw-admin pool add --pool=radosPool --uid=rados
failed to add bucket placement: (2) No such file or directory

Thanks,
Tim

Timothy Snider
Strategic Planning & Architecture - Advanced Development
NetApp
316-636-8736 Direct Phone
316-213-0223 Mobile Phone
tim.sni...@netapp.com
netapp.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-21 Thread Shain Miley
Alfredo,

Thanks a lot for the info.

I'll make sure I have an updated version of ceph-deploy and give it another  
shot.

Shain
Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: Alfredo Deza [alfredo.d...@inktank.com]
Sent: Monday, October 21, 2013 2:03 PM
To: Shain Miley
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley  wrote:
> Hi,
>
> We have been testing a ceph cluster with the following specs:
>
> 3 Mon's
> 72 OSD's spread across 6 Dell R-720xd servers
> 4 TB SAS drives
> 4 bonded 10 GigE NIC ports per server
> 64 GB of RAM
>
> Up until this point we have been running tests using the default journal
> size of '1024'.
> Before we start to place production data on the cluster I want to clear
> up the following questions I have:
>
> 1)Is there a more appropriate journal size for my setup given the specs
> listed above?
>
> 2)According to this link:
>
> http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11
>
> CERN is using  '/dev/disk/by-path' for their OSD's.
>
> Does ceph-deploy currently support setting up OSD's using this method?

Indeed it does!

`ceph-deploy osd --help` got updated recently to demonstrate how this
needs to be done (an extra step is involved):

For paths, first prepare and then activate:

ceph-deploy osd prepare {osd-node-name}:/path/to/osd
ceph-deploy osd activate {osd-node-name}:/path/to/osd



>
> Thanks,
>
> Shain
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
> smi...@npr.org | 202.513.3649
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Samuel Just
Can you get the pg to recover without osd.3?
-Sam
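
A rough way to try that, assuming the cluster otherwise has enough healthy
replicas to recover from (osd id per the thread):

    ceph osd out 3    # let the cluster rebalance without osd.3
    ceph -w           # watch recovery progress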

On Mon, Oct 21, 2013 at 1:59 PM, Jeff Williams  wrote:
> We're running xfs on a 3.8.0-31-generic kernel
>
> Thanks,
> Jeff
>
> On 10/21/13 1:54 PM, "Samuel Just"  wrote:
>
>>It looks like an xattr vanished from one of your objects on osd.3.
>>What fs are you running?
>>
>>On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
>>wrote:
>>> Hello all,
>>>
>>> Similar to this post from last month, I am experiencing 2 nodes that are
>>> constantly crashing upon start up:
>>> http://www.spinics.net/lists/ceph-users/msg04589.html
>>>
>>> Here are the logs from the 2 without the debug commands, here:
>>> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>>>
>>> I have run the osds with the debug statements per the email, but I'm
>>>unsure
>>> where to post them, they are 108M each without compression. Should I
>>>create
>>> a bug on the tracker?
>>>
>>> Thanks,
>>> Jeff
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
What is the best way to do that? I tried ceph pg repair, but it only did
so much. 

On 10/21/13 3:54 PM, "Samuel Just"  wrote:

>Can you get the pg to recover without osd.3?
>-Sam
>
>On Mon, Oct 21, 2013 at 1:59 PM, Jeff Williams 
>wrote:
>> We're running xfs on a 3.8.0-31-generic kernel
>>
>> Thanks,
>> Jeff
>>
>> On 10/21/13 1:54 PM, "Samuel Just"  wrote:
>>
>>>It looks like an xattr vanished from one of your objects on osd.3.
>>>What fs are you running?
>>>
>>>On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
>>>wrote:
 Hello all,

 Similar to this post from last month, I am experiencing 2 nodes that
are
 constantly crashing upon start up:
 http://www.spinics.net/lists/ceph-users/msg04589.html

 Here are the logs from the 2 without the debug commands, here:
 http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h

 I have run the osds with the debug statements per the email, but I'm
unsure
 where to post them, they are 108M each without compression. Should I
create
 a bug on the tracker?

 Thanks,
 Jeff

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] error: failed to add bucket placement

2013-10-21 Thread Yehuda Sadeh
On Mon, Oct 21, 2013 at 2:46 PM, Snider, Tim  wrote:
>
> Can someone tell me what I'm missing. I have a radosgw user created.
>
> I get the following complaint when I try to create a pool with or without the 
> --uid parameter:
>
>
> #radosgw-admin pool add --pool=radosPool --uid=rados
>
> failed to add bucket placement: (2) No such file or directory
>
>
What are you trying to do exactly? That command is to add a pool to
the list of data placement pools, but is leveraging the legacy data
placement system. Note that any rgw system pool (e.g., the data and
index pools) should always be prefixed with a period.
If you're trying to leverage the new data placement capabilities then
you need to do it through the zone and region configuration.

For example:

$ radosgw-admin region get > region.conf.json
$ cat region.conf.json
{ "name": "default",
  "api_name": "",
  "is_master": "true",
  "endpoints": [],
  "master_zone": "",
  "zones": [
{ "name": "default",
  "endpoints": [],
  "log_meta": "false",
  "log_data": "false"}],
  "placement_targets": [
{ "name": "default-placement",
  "tags": []}],
  "default_placement": "default-placement"}

Then edit the file to add the new placement target, e.g.:

{ "name": "default",
  "api_name": "",
  "is_master": "true",
  "endpoints": [],
  "master_zone": "",
  "zones": [
{ "name": "default",
  "endpoints": [],
  "log_meta": "false",
  "log_data": "false"}],
  "placement_targets": [
{ "name": "default-placement",
  "tags": []},
{ "name": "fast-placement",
  "tags": []}],
  "default_placement": "default-placement"}

$ radosgw-admin region set < region.conf.json

Then you need to set up all the zones within the region to assign data
and index pools to be used with this new placement. Use radosgw-admin
zone get + radosgw-admin zone set for that. Here's an example config:

{ "domain_root": ".rgw",
  "control_pool": ".rgw.control",
  "gc_pool": ".rgw.gc",
  "log_pool": ".log",
  "intent_log_pool": ".intent-log",
  "usage_log_pool": ".usage",
  "user_keys_pool": ".users",
  "user_email_pool": ".users.email",
  "user_swift_pool": ".users.swift",
  "user_uid_pool": ".users.uid",
  "system_key": { "access_key": "",
  "secret_key": ""},
  "placement_pools": [
{ "key": "default-placement",
  "val": { "index_pool": ".rgw.buckets.index",
  "data_pool": ".rgw.buckets"}},
{ "key": "fast-placement",
  "val": { "index_pool": ".rgw.fast.buckets.index",
  "data_pool": ".rgw.fast.buckets.data"}}]}

and finally, you can modify the user's default placement, via:

$ radosgw-admin metadata get user: > user.md.json

Edit the default_placement field in user.md.json to the desired target, then:

$ radosgw-admin metadata put user: < user.md.json


When creating a bucket users can specify which placement target they
want to use with that specific bucket (by using the S3 create bucket
location constraints field, format is
location_constraints='[region][:placement-target]', e.g.,
location_constraints=':fast-placement'). Also, you can limit a placement
target to be used only by specific users by setting tags on the
target config.


Yehuda
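
A client-side sketch of that location constraint, using s3cmd as an example
client (the bucket name is hypothetical):

    s3cmd mb --bucket-location=':fast-placement' s3://mybucket    # hypothetical bucket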
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Samuel Just
What happened when you simply left the cluster to recover without osd.11 in?
-Sam

On Mon, Oct 21, 2013 at 4:01 PM, Jeff Williams  wrote:
> What is the best way to do that? I tried ceph pg repair, but it only did
> so much.
>
> On 10/21/13 3:54 PM, "Samuel Just"  wrote:
>
>>Can you get the pg to recover without osd.3?
>>-Sam
>>
>>On Mon, Oct 21, 2013 at 1:59 PM, Jeff Williams 
>>wrote:
>>> We're running xfs on a 3.8.0-31-generic kernel
>>>
>>> Thanks,
>>> Jeff
>>>
>>> On 10/21/13 1:54 PM, "Samuel Just"  wrote:
>>>
It looks like an xattr vanished from one of your objects on osd.3.
What fs are you running?

On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
wrote:
> Hello all,
>
> Similar to this post from last month, I am experiencing 2 nodes that
>are
> constantly crashing upon start up:
> http://www.spinics.net/lists/ceph-users/msg04589.html
>
> Here are the logs from the 2 without the debug commands, here:
> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>
> I have run the osds with the debug statements per the email, but I'm
>unsure
> where to post them, they are 108M each without compression. Should I
>create
> a bug on the tracker?
>
> Thanks,
> Jeff
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] import VHD

2013-10-21 Thread James Harper
Can anyone suggest a straightforward way to import a VHD to a ceph RBD? The easier 
the better!

Thanks

James
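
One straightforward route, assuming your qemu-img is built with rbd support
(pool and image names are examples):

    qemu-img convert -f vpc -O raw disk.vhd rbd:rbd/newimage    # example names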
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
I apologize,  I should have mentioned that both osd.3 and osd.11 crash 
immediately and if I do not 'set noout', the crash cascades to the rest of the 
cluster.

Thanks,
Jeff


Sent from my Samsung Galaxy Note™, an AT&T LTE smartphone



 Original message 
From: Samuel Just 
Date: 10/21/2013 4:47 PM (GMT-08:00)
To: Jeff Williams 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Continually crashing osds


What happened when you simply left the cluster to recover without osd.11 in?
-Sam

On Mon, Oct 21, 2013 at 4:01 PM, Jeff Williams  wrote:
> What is the best way to do that? I tried ceph pg repair, but it only did
> so much.
>
> On 10/21/13 3:54 PM, "Samuel Just"  wrote:
>
>>Can you get the pg to recover without osd.3?
>>-Sam
>>
>>On Mon, Oct 21, 2013 at 1:59 PM, Jeff Williams 
>>wrote:
>>> We're running xfs on a 3.8.0-31-generic kernel
>>>
>>> Thanks,
>>> Jeff
>>>
>>> On 10/21/13 1:54 PM, "Samuel Just"  wrote:
>>>
It looks like an xattr vanished from one of your objects on osd.3.
What fs are you running?

On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
wrote:
> Hello all,
>
> Similar to this post from last month, I am experiencing 2 nodes that
>are
> constantly crashing upon start up:
> http://www.spinics.net/lists/ceph-users/msg04589.html
>
> Here are the logs from the 2 without the debug commands, here:
> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>
> I have run the osds with the debug statements per the email, but I'm
>unsure
> where to post them, they are 108M each without compression. Should I
>create
> a bug on the tracker?
>
> Thanks,
> Jeff
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CloudStack + KVM(Ubuntu 12.04, Libvirt 1.0.2) + Ceph [Seeking Help]

2013-10-21 Thread Josh Durgin

On 10/16/2013 04:25 PM, Kelcey Jamison Damage wrote:

Hi,

I have gotten so close to have Ceph work in my cloud but I have reached
a roadblock. Any help would be greatly appreciated.

I receive the following error when trying to get KVM to run a VM with an
RBD volume:

Libvirtd.log:

2013-10-16 22 :05:15.516+: 9814: error :
qemuProcessReadLogOutput:1477 : internal error Process exited while
reading console log output:
char device redirected to /dev/pts/3
kvm: -drive
file=rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=

10.0.1.83\:6789,if=none,id=drive-ide0-0-1: error connecting
kvm: -drive
file=rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=

10.0.1.83\:6789,if=none,id=drive-ide0-0-1: could not open disk image
rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh
/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=10.0.1.83\:6789:
Invalid argument


This looks correct, there could be a firewall or something else
preventing the connection from working. Could you share the output of:

qemu-img -f raw info 
'rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host==10.0.1.83\\:6789:debug_ms=1'



Ceph Pool showing test volume exists:

root@ubuntu-test-KVM-RBD:/opt# rbd -p libvirt-pool ls
new-libvirt-image

Ceph Auth:

client.libvirt
key: AQBx+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx
pool=libvirt-pool

KVM Drive Support:

root@ubuntu-test-KVM-RBD:/opt# kvm --drive
format=?ibvirt-image:id=libvirt:key=+F5Sc
Supported formats: vvfat vpc vmdk vdi sheepdog rbd raw host_cdrom
host_floppy host_device file qed qcow2 qcow parallels nbd dmg tftp ftps ft
p https http cow cloop bochs blkverify blkdebug


These settings all look fine too.

Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CloudStack + KVM(Ubuntu 12.04, Libvirt 1.0.2) + Ceph [Seeking Help]

2013-10-21 Thread David Clarke
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 22/10/13 14:19, Josh Durgin wrote:
> On 10/16/2013 04:25 PM, Kelcey Jamison Damage wrote:
>> Hi,
>> 
>> I have gotten so close to have Ceph work in my cloud but I have reached a 
>> roadblock. Any help
>> would be greatly appreciated.
>> 
>> I receive the following error when trying to get KVM to run a VM with an RBD 
>> volume:
>> 
>> Libvirtd.log:
>> 
>> 2013-10-16 22 :05:15.516+: 9814: error : 
>> qemuProcessReadLogOutput:1477 : internal error Process exited while reading 
>> console log
>> output: char device redirected to /dev/pts/3 kvm: -drive 
>> file=rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=
>>
>>
>> 
10.0.1.83\:6789,if=none,id=drive-ide0-0-1: error connecting
>> kvm: -drive 
>> file=rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=
>>
>>
>> 
10.0.1.83\:6789,if=none,id=drive-ide0-0-1: could not open disk image
>> rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh 
>> /gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=10.0.1.83\:6789: Invalid 
>> argument
> 
> This looks correct, there could be a firewall or something else preventing 
> the connection from
> working. Could you share the output of:
> 
> qemu-img -f raw info 
> 'rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host==10.0.1.83\\:6789:debug_ms=1'
>
> 
>> Ceph Pool showing test volume exists:
>> 
>> root@ubuntu-test-KVM-RBD:/opt# rbd -p libvirt-pool ls new-libvirt-image
>> 
>> Ceph Auth:
>> 
>> client.libvirt key: AQBx+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA== caps: [mon] 
>> allow r caps: [osd]
>> allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool
>> 
>> KVM Drive Support:
>> 
>> root@ubuntu-test-KVM-RBD:/opt# kvm --drive 
>> format=?ibvirt-image:id=libvirt:key=+F5Sc 
>> Supported formats: vvfat vpc vmdk vdi sheepdog rbd raw host_cdrom 
>> host_floppy host_device
>> file qed qcow2 qcow parallels nbd dmg tftp ftps ft p https http cow cloop 
>> bochs blkverify
>> blkdebug
> 
> These settings all look fine too.

The key for client.libvirt is 'AQBx+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==', 
whereas the key being
passed to KVM is '+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA=='.  Are you missing a few 
characters at the
beginning there?


- -- 
David Clarke
Systems Architect
Catalyst IT
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSZdRaAAoJEPH5xSy8rJPM0n8P/1fDMrlYuO3kA1leNL475G5R
s5hZKMOlOpXHT8yR7TQ0JHZFi5hh2Rf4CTOMU0GgKQltS4DgAh5z3Cc4FGURSeNU
oK/eCKCAvlaK4JFw/OOppYrXwWqRlkSwSsKe3Y8HAX+grFV/Tp648iNBw+3LWksK
A0WcotMVx3jARby2Du4ERuKQtq+8MfwxwNtCvm2zGTIBlWCLt8QJaBTCaxiSCcsu
ojvDbk/t0oelqhbYLhOgFVjfF6vj8W+ZAAtOvZs6ltUaoyvjd/ImjuGtYwEERGsA
dGc64qQr+YBFdtT6SP45LXXuya7z1ZYLAB/azDI0t66QZ5xEK1L+d6GaU0GU0TmK
BA6yYY9KadxAarQRn30W5sJEQuTEoZcnAzlDY7y38NKPgIiAREz04feSrxeSaeHi
o32p5HJnGJZ+XRWroBuMEO/EpmdXMhuR4LRJ7diEgvVjJyEoUbV8RNTxumgKsv0I
evPa0fxv2uD/zerNDT2Zey0aKXJ5P15FFYhxlj4bQexr3iT3/BTbQOMxIfREaS9O
Bu+tWKv5u5k5HP+GZH53ACULJyLYBUq1mSxATrchp8V0U8dTD7YvGET2oBF1BEfH
i3MKxuhz3kMAx3wd4w9wdZlp0MsimR6l3S3kvErvR+ZfsSRbZIywsvy+hLhLq6I3
tF7BdNwsY4o0TfP6el+8
=AyLj
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados bench result when increasing OSDs

2013-10-21 Thread Kyle Bader
Besides what Mark and Greg said it could be due to additional hops through
network devices. What network devices are you using, what is the network
topology and does your CRUSH map reflect the network topology?
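
It may also help to state the exact bench invocation used; for a
small-object write test like the one described, it would be something along
these lines (pool name is a placeholder):

    rados -p testpool bench 60 write -b 5120 -t 16    # placeholder pool name
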
On Oct 21, 2013 9:43 AM, "Gregory Farnum"  wrote:

> On Mon, Oct 21, 2013 at 7:13 AM, Guang Yang  wrote:
> > Dear ceph-users,
> > Recently I deployed a ceph cluster with RadosGW, from a small one (24
> OSDs) to a much bigger one (330 OSDs).
> >
> > When using rados bench to test the small cluster (24 OSDs), it showed
> the average latency was around 3ms (object size is 5K), while for the
> larger one (330 OSDs), the average latency was around 7ms (object size 5K),
> twice comparing the small cluster.
> >
> > The OSDs within the two clusters have the same configuration: SAS disk,
>  and two partitions for one disk, one for journal and the other for
> metadata.
> >
> > For PG numbers, the small cluster tested with the pool having 100 PGs,
> and for the large cluster, the pool has 4 PGs (as I will further
> scale the cluster, I chose a much larger PG count).
> >
> > Does my test result make sense? Like when the PG number and OSD
> increase, the latency might drop?
>
> Besides what Mark said, can you describe your test in a little more
> detail? Writing/reading, length of time, number of objects, etc.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw usage

2013-10-21 Thread Derek Yarnell
> Try moving the above configurables to the global section, if it's
> working then you're probably using the wrong section.

Moving sections doesn't seem to change the behavior.  My two other test
gateways seem to be working fine with similar configs all running
0.67.4 (slightly patched for ACLs[1]).  Is there a config variable that
can be set for radosgw to show its configuration at startup?

# ceph-conf -n client.radosgw.cbcbproxy00 'rgw enable usage log'
true

# radosgw-admin usage show
{ "entries": [],
  "summary": []}

# /etc/init.d/ceph-radosgw restart
Stopping radosgw instance(s)...[  OK  ]
Starting radosgw instance(s)...
2013-10-21 20:14:07.935720 7f72ecd78820 -1 WARNING: libcurl doesn't
support curl_multi_wait()
2013-10-21 20:14:07.935721 7f72ecd78820 -1 WARNING: cross zone / region
transfer performance may be affected
Starting client.radosgw.cbcbproxy00... [  OK  ]

# ps axuww | grep radosgw
apache   24136  0.3  0.0 7545800 14996 ?   Ssl  19:58   0:02
/usr/bin/radosgw -n client.radosgw.cbcbproxy00

# radosgw-admin usage show
{ "entries": [],
  "summary": []}


[1] - https://github.com/ceph/ceph/pull/672/files
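
One way to see what a running radosgw actually loaded is the admin socket,
if it is enabled; the socket path below is a guess and varies by setup:

    ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.cbcbproxy00.asok config show | grep usage    # path is a guess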

-- 
---
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CloudStack + KVM(Ubuntu 12.04, Libvirt 1.0.2) + Ceph [Seeking Help]

2013-10-21 Thread Kelcey Jamison Damage
Oh hi, 

Turns out I solved it; it works with libvirt directly via CloudStack. The only 
major modification is to ensure you don't accidentally use client.user for 
authentication and just use user. 

My guess is the error I had was related to testing with virsh. 

Thanks for the reply. 


- Original Message -

From: "Josh Durgin"  
To: "Kelcey Jamison Damage" , 
ceph-us...@ceph.com 
Sent: Monday, October 21, 2013 6:19:29 PM 
Subject: Re: [ceph-users] CloudStack + KVM(Ubuntu 12.04, Libvirt 1.0.2) + Ceph 
[Seeking Help] 

On 10/16/2013 04:25 PM, Kelcey Jamison Damage wrote: 
> Hi, 
> 
> I have gotten so close to have Ceph work in my cloud but I have reached 
> a roadblock. Any help would be greatly appreciated. 
> 
> I receive the following error when trying to get KVM to run a VM with an 
> RBD volume: 
> 
> Libvirtd.log: 
> 
> 2013-10-16 22 :05:15.516+: 9814: error : 
> qemuProcessReadLogOutput:1477 : internal error Process exited while 
> reading console log output: 
> char device redirected to /dev/pts/3 
> kvm: -drive 
> file=rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=
>  
> 
> 10.0.1.83\:6789,if=none,id=drive-ide0-0-1: error connecting 
> kvm: -drive 
> file=rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=
>  
> 
> 10.0.1.83\:6789,if=none,id=drive-ide0-0-1: could not open disk image 
> rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh 
> /gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host=10.0.1.83\:6789: 
> Invalid argument 

This looks correct, there could be a firewall or something else 
preventing the connection from working. Could you share the output of: 

qemu-img -f raw info 
'rbd:libvirt-pool/new-libvirt-image:id=libvirt:key=+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA==:auth_supported=cephx\;none:mon_host==10.0.1.83\\:6789:debug_ms=1'
 

> Ceph Pool showing test volume exists: 
> 
> root@ubuntu-test-KVM-RBD:/opt# rbd -p libvirt-pool ls 
> new-libvirt-image 
> 
> Ceph Auth: 
> 
> client.libvirt 
> key: AQBx+F5ScBQlLhAAYCH8qhGEh/gjKW+NpziAlA== 
> caps: [mon] allow r 
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=libvirt-pool 
> 
> KVM Drive Support: 
> 
> root@ubuntu-test-KVM-RBD:/opt# kvm --drive 
> format=?ibvirt-image:id=libvirt:key=+F5Sc 
> Supported formats: vvfat vpc vmdk vdi sheepdog rbd raw host_cdrom 
> host_floppy host_device file qed qcow2 qcow parallels nbd dmg tftp ftps ft 
> p https http cow cloop bochs blkverify blkdebug 

These settings all look fine too. 

Josh 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD question

2013-10-21 Thread Martin Catudal
Hi,
 I have purchased my hardware for my Ceph storage cluster, but have not
opened any of my 960GB SSD drive boxes since I need to answer my question first.

Here's my hardware.

THREE server Dual 6 core Xeon 2U capable with 8 hotswap tray plus 2 SSD 
mount internally.
In each server I will have :
2 x SSD 840 Pro Samsung 128 GB in RAID 1 for the OS
2 x SSD 840 Pro Samsung for journal
4 x 4TB Hitachi 7K4000 7200RPM
1 x 960GB Crucial M500 for one fast OSD pool.

Configuration: one SSD journal for two 4TB drives, so if I lose one SSD
journal, I will only lose two OSDs instead of all my storage on that
particular node.

I have also bought 3 x 960GB M500 SSDs from Crucial for the creation of a
fast pool of OSDs made from SSDs, i.e. one 960GB per server for database
applications.
Is it advisable to do that, or is it better to return them and for the
same price buy 6 more 4TB Hitachis?

Since the write acknowledgment is made from the SSD journal, do I get
a huge improvement by using SSDs as OSDs?
My goal is to have solid fast performance for database ERP and 3D 
modeling of mining gallery run in VM.

Thank's
Martin

-- 
Martin Catudal
Responsable TIC
Ressources Metanor Inc
Ligne directe: (819) 218-2708
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD question

2013-10-21 Thread Mark Kirkwood

On 22/10/13 15:05, Martin Catudal wrote:

Hi,
  I have purchased my hardware for my Ceph storage cluster, but have not
opened any of my 960GB SSD drive boxes since I need to answer my question first.

Here's my hardware.

THREE server Dual 6 core Xeon 2U capable with 8 hotswap tray plus 2 SSD
mount internally.
In each server I will have :
2 x SSD 840 Pro Samsung 128 GB in RAID 1 for the OS
2 x SSD 840 Pro Samsung for journal
4 x 4TB Hitachi 7K4000 7200RPM
1 x 960GB Crucial M500 for one fast OSD pool.

Configuration : One SSD journal for two 4TB so If I lost one SSD
journal, I will only lost Two OSD instead of all my storage for that
particular node.

I have also bought 3 x 960GB M500 SSDs from Crucial for the creation of a
fast pool of OSDs made from SSDs, so one 960GB per server for database
applications.
Is it advisable to do that, or is it better to return them and, for the
same price, buy 6 more 4TB Hitachis?

Since the write acknowledgment is made from the SSD journal, will I see
a huge improvement by using SSDs as OSDs?
My goal is to have solid, fast performance for a database ERP and for 3D
modeling of mining galleries running in VMs.



Yeah, using SSD(s) for the journal will certainly improve performance. I 
am a little concerned about your choice of SSD though - while the Samsung 
840 Pro is a great workstation drive, I don't see any mention of power-loss 
safety in its specs. The risk there is that the journal data could 
be corrupted by a power (or power-supply) glitch.


We are looking at the Intel S3700 for this reason - it *is* a bit more 
expensive, but it has a longer write lifetime and is power-loss safe 
(on-board capacitor).


I'd be interested to hear what other people think with respect to the 
'right' type of SSD to use for Ceph journals!


Cheers

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD question

2013-10-21 Thread Gregory Farnum
On Mon, Oct 21, 2013 at 7:05 PM, Martin Catudal  wrote:
> Hi,
>  I have purchased my hardware for my Ceph storage cluster, but I have not
> opened any of my 960GB SSD drive boxes since I need to answer my question first.
>
> Here's my hardware.
>
> THREE servers: dual 6-core Xeon, 2U, with 8 hot-swap trays plus 2 SSDs
> mounted internally.
> In each server I will have:
> 2 x SSD 840 Pro Samsung 128 GB in RAID 1 for the OS
> 2 x SSD 840 Pro Samsung for the journals
> 4 x 4TB Hitachi 7K4000 7200RPM
> 1 x 960GB Crucial M500 for one fast OSD pool.
>
> Configuration: one SSD journal for two 4TB drives, so if I lose one SSD
> journal, I will only lose two OSDs instead of all my storage on that
> particular node.
>
> I have also bought 3 x 960GB M500 SSDs from Crucial for the creation of a
> fast pool of OSDs made from SSDs, so one 960GB per server for database
> applications.
> Is it advisable to do that, or is it better to return them and, for the
> same price, buy 6 more 4TB Hitachis?
>
> Since the write acknowledgment is made from the SSD journal, will I see
> a huge improvement by using SSDs as OSDs?
> My goal is to have solid, fast performance for a database ERP and for 3D
> modeling of mining galleries running in VMs.

The specifics depend on a lot of factors, but for database
applications you are likely to have better performance with an SSD
pool. This is because even though the journal can do fast
acknowledgements, that's for evening out write bursts — on average it
will restrict itself to the speed of the backing store. A good SSD can
generally do much more than 6x a HDD's random IOPS.
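
If you do keep the SSDs, the usual approach is to give them their own
CRUSH rule and pool so clients can target them directly. As a rough
sketch (ruleset 3 is assumed to be a CRUSH rule you've created that
selects only the SSD-backed OSDs, and the PG counts are just examples):

ceph osd pool create ssd-pool 128 128
ceph osd pool set ssd-pool crush_ruleset 3
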
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> Thanks,
> Martin
>
> --
> Martin Catudal
> Responsable TIC
> Ressources Metanor Inc
> Ligne directe: (819) 218-2708
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD question

2013-10-21 Thread Martin B Nielsen
Hi,

Plus, reads will still come from your non-SSD disks unless you're using
something like flashcache in front, and as Greg said, having much more IOPS
available for your db often makes a difference (depending on load, usage,
etc., of course).
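
(For the curious, flashcache pairs an SSD partition with a spinning
disk as a block-level cache. A minimal sketch, assuming the flashcache
kernel module and tools are installed and with placeholder device names:

flashcache_create -p back osd0cache /dev/sdb1 /dev/sdd

The resulting /dev/mapper/osd0cache device is then what you'd put the
OSD filesystem on.)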

We're using Samsung Pro 840 256GB pretty much like Martin describes and we
haven't had any issues (yet).

We've set up our environment so that if we lose a node or take one offline
for maintenance it won't impact the cluster much; with only 3 nodes I would
probably go with more durable hardware specs.

Cheers,
Martin


On Tue, Oct 22, 2013 at 6:38 AM, Gregory Farnum  wrote:

> On Mon, Oct 21, 2013 at 7:05 PM, Martin Catudal 
> wrote:
> > Hi,
> >  I have purchased my hardware for my Ceph storage cluster, but I have not
> > opened any of my 960GB SSD drive boxes since I need to answer my question
> first.
> >
> > Here's my hardware.
> >
> > THREE servers: dual 6-core Xeon, 2U, with 8 hot-swap trays plus 2 SSDs
> > mounted internally.
> > In each server I will have:
> > 2 x SSD 840 Pro Samsung 128 GB in RAID 1 for the OS
> > 2 x SSD 840 Pro Samsung for the journals
> > 4 x 4TB Hitachi 7K4000 7200RPM
> > 1 x 960GB Crucial M500 for one fast OSD pool.
> >
> > Configuration: one SSD journal for two 4TB drives, so if I lose one SSD
> > journal, I will only lose two OSDs instead of all my storage on that
> > particular node.
> >
> > I have also bought 3 x 960GB M500 SSDs from Crucial for the creation of a
> > fast pool of OSDs made from SSDs, so one 960GB per server for database
> > applications.
> > Is it advisable to do that, or is it better to return them and, for the
> > same price, buy 6 more 4TB Hitachis?
> >
> > Since the write acknowledgment is made from the SSD journal, will I see
> > a huge improvement by using SSDs as OSDs?
> > My goal is to have solid, fast performance for a database ERP and for 3D
> > modeling of mining galleries running in VMs.
>
> The specifics depend on a lot of factors, but for database
> applications you are likely to have better performance with an SSD
> pool. This is because even though the journal can do fast
> acknowledgements, that's for evening out write bursts — on average it
> will restrict itself to the speed of the backing store. A good SSD can
> generally do much more than 6x a HDD's random IOPS.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> >
> > Thanks,
> > Martin
> >
> > --
> > Martin Catudal
> > Responsable TIC
> > Ressources Metanor Inc
> > Ligne directe: (819) 218-2708
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intermittent poor performance on 3 node cluster

2013-10-21 Thread Pieter Steyn

On 21/10/2013 22:45, Gregory Farnum wrote:

On Mon, Oct 21, 2013 at 8:05 AM, Pieter Steyn  wrote:

Hi all,

I'm using Ceph as a filestore for my nginx web server, in order to have
shared storage, and redundancy with automatic failover.

The cluster is not high spec, but given my use case (lots of images) I am
very disappointed with the current throughput I'm getting, and was hoping
for some advice.

I'm using CephFS and the latest Dumpling version on Ubuntu Server 12.04

Server specs:

CephFS1, CephFS2:

Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
12GB Ram
1x 2TB SATA XFS
1x 2TB SATA (For the journal)

Each server runs 1x OSD, 1x MON and 1x MDS.
A third server runs 1x MON for Paxos to work correctly.
All machines are connected via a gigabit switch.

The ceph config as follows:

[global]
fsid = 58b87152-5ce8-491e-ae9c-07caeea3fefb
mon_initial_members = lb1, cephfs1, cephfs2
mon_host = 192.168.1.58,192.168.1.70,192.168.1.72
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true

Osd dump:

epoch 750
fsid 58b87152-5ce8-491e-ae9c-07caeea3fefb
created 2013-09-12 13:13:02.695411
modified 2013-10-21 14:28:31.780838
flags

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0

max_osd 4
osd.0 up   in  weight 1 up_from 741 up_thru 748 down_at 739
last_clean_interval [614,738) 192.168.1.70:6802/12325
192.168.1.70:6803/12325 192.168.1.70:6804/12325 192.168.1.70:6805/12325
exists,up d59119d5-bccb-43ea-be64-9d2272605617
osd.1 up   in  weight 1 up_from 748 up_thru 748 down_at 745
last_clean_interval [20,744) 192.168.1.72:6800/4271 192.168.1.72:6801/4271
192.168.1.72:6802/4271 192.168.1.72:6803/4271 exists,up
930c097a-f68b-4f9c-a6a1-6787a1382a41

pg_temp 0.12 [1,0,3]
pg_temp 0.16 [1,0,3]
pg_temp 0.18 [1,0,3]
pg_temp 1.11 [1,0,3]
pg_temp 1.15 [1,0,3]
pg_temp 1.17 [1,0,3]

Slowdowns increase the load on my nginx servers to around 40, and access to
the CephFS mount becomes incredibly slow.  These slowdowns happen about once a
week.  I typically solve them by restarting the MDS.
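
(For reference, I restart it with the init script on Ubuntu 12.04,
along the lines of:

sudo service ceph restart mds.cephfs1

where cephfs1 is the MDS id on that host.)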

When the cluster gets slow I see the following in my logs:

2013-10-21 14:33:54.079200 7f6301e10700  0 log [WRN] : slow request
30.281651 seconds old, received at 2013-10-21 14:33:23.797488:
osd_op(mds.0.8:16266 14094c4. [tmapup 0~0] 1.91102783 e750) v4
currently commit sent
2013-10-21 14:33:54.079191 7f6301e10700  0 log [WRN] : 6 slow requests, 6
included below; oldest blocked for > 30.281651 secs

If this is the sole kind of slow request you see (tmapup reports),
then it looks like the MDS is flushing out directory updates and the
OSD is taking a long time to process them. I'm betting you have very
large directories and it's taking the OSD a while to process the
changes; and the MDS is getting backed up while it does so because
it's trying to flush them out of memory.


Any advice? Would increasing the PG num for data and metadata help? Would
moving the MDS to a host which does not also run an OSD be greatly
beneficial?

Your PG counts are probably fine for a cluster of that size, although
you could try bumping them up by 2x or something. More likely, though,
is that your CephFS install is not well-tuned for the directory sizes
you're using. What's the largest directory you're using? Have you
tried bumping up your mds cache size? (And what's the host memory
usage look like?)
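
For example, bumping the cache would look like this in ceph.conf on the
MDS hosts (the default is 100,000 inodes; 500,000 is just an
illustrative value, so watch MDS memory use when you raise it):

[mds]
mds cache size = 500000

And bumping PGs on the data pool would be along the lines of:

ceph osd pool set data pg_num 128
ceph osd pool set data pgp_num 128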


I have lots of directories named like 2013_08 averaging about 50GB each, 
just filled with images.
We haven't tuned the mds cache size at all, and memory usage on the MDS 
server is generally very high.


Thank you, this seems to be a good starting point, and makes sense given 
our use case.


Kind regards,
Pieter Steyn
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com