Re: [ceph-users] how configure cephfs to strip data across osd's?

2013-04-18 Thread George Shuklin

18.04.2013 10:49, Wolfgang Hennerbichler wrote:

Ceph doesn't support data striping, and you probably don't need it either.
Ceph distributes reads anyway, because large objects are spread
automatically across the OSDs and reads happen concurrently; this is somewhat
like striping, but better :)

Well... maybe I'm saying something wrong, but for a small cluster (one 
node, actually, with 8 drives for OSDs), when I mount CephFS and check FS 
performance, I see excellent read performance but poor random write 
performance. I run the test I/O with 4k blocks, so I thought the problem 
was the default stripe block size, but I couldn't find any documentation 
on how to change it.
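For reference, CephFS striping is controlled by per-file and per-directory layouts. A minimal sketch of inspecting and changing them, assuming a CephFS mount at /mnt/cephfs and a client recent enough to expose the ceph.file.layout / ceph.dir.layout virtual xattrs (older clients shipped a separate "cephfs" utility for the same job); the paths and values below are only examples:

# show the layout of an existing file (stripe unit/count, object size, pool)
getfattr -n ceph.file.layout /mnt/cephfs/somefile

# new files inherit the layout of their directory; e.g. 1 MB stripe unit, 4-way striping
setfattr -n ceph.dir.layout.stripe_unit -v 1048576 /mnt/cephfs/testdir
setfattr -n ceph.dir.layout.stripe_count -v 4 /mnt/cephfs/testdir

Note that a layout only applies to files created after it is set; existing files keep the layout they were written with.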


Just for reference: on a 1G network with 8 OSDs (8 HDDs) I got over 1k IOPS 
on reads and just 30 IOPS on writes. And atop shows that the OSDs' disks 
are underutilized...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to read file on Ceph FS

2013-04-18 Thread Li, Chen
Can you explain more?

Because I found here : 
http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-live-migrations.html
It says:  "Shared storage: NOVA-INST-DIR/instances/ (eg 
/var/lib/nova/instances) has to be mounted by shared storage."
And from here: 
http://www.mail-archive.com/ceph-users@lists.ceph.com/msg00241.html
It looks like OpenStack still does not support live migration when the instance is booted 
from a volume.

I want to integrate Ceph with nova compute, but I can't always boot instances 
from volumes, for other reasons.

Also, I want to know whether you guys believe that, actually, Ceph users should not use 
CephFS anywhere?

Thanks.
-chen



-Original Message-
From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wolfgang Hennerbichler
Sent: Thursday, April 18, 2013 2:34 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Unable to read file on Ceph FS

You're wrong. You can do live migration with RBD without doing block live 
migration. RBD images are also visible to any Ceph (libvirt/KVM) client, without 
the need to move physical data; think of it as an iSCSI volume.
RBD is the way to go. You don't want or need CephFS here at all, unless you 
want to store qemu-img files in CephFS, which is a bad idea to start with.

The only thing I don't know is whether OpenStack supports base images on Ceph. I know 
OpenStack volumes work flawlessly with RBD; I have one installation right here 
that works very well.
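For illustration, the "shared storage" property is easy to see from the command line. Assuming a pool named "volumes" and an image name that is purely an example, any client host with a ceph.conf and keyring sees the same image, so a VM booted from it on one hypervisor can be live-migrated to another without copying any disk data:

# run these on any two hypervisors; both see the identical image
rbd ls volumes
rbd info volumes/instance-0001-disk
# qemu with rbd support can open it directly as well
qemu-img info rbd:volumes/instance-0001-disk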

On 04/18/2013 06:54 AM, Li, Chen wrote:
> It is just a test environment.
> I'm using Ceph FS is only because I want to integrate Ceph with all OpenStack 
> environment.
> And a shared FS can let me do the live-migration but not just 
> block-live-migration.
> 
> I know Ceph FS is not production ready.
> So, the only suggestion is not to use it?
> Anymore?
> 
> 
> Thanks.
> -chen
> 
> 
> 
> -Original Message-
> From: ceph-users-boun...@lists.ceph.com 
> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido den 
> Hollander
> Sent: Wednesday, April 17, 2013 6:27 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Unable to read file on Ceph FS
> 
> Hi,
> 
> On 04/17/2013 11:01 AM, Li, Chen wrote:
>> Hi list,
>>
>> I'm working on ceph version 0.56.2.
>>
>> I have mount Ceph FS to two OpenStack compute Nodes for storing files 
>> and images for running instances.
>>
> 
> Why are you using CephFS for OpenStack? The recommendation is to use RBD as a 
> backend for OpenStack.
> 
> I recommend you read the docs regarding the OpenStack RBD installation: 
> http://eu.ceph.com/docs/master/rbd/rbd-openstack/
> 
> Wido
> 
>> While I have successfully start several instances in the cloud, and 
>> all instances are started at compute-1.
>>
>> Now I enter into compute-2, and try to get into the "_base" 
>> directory, command "ls" just hang there and never response.
>>
>> But if I do the same command at compute-1, everything looks just fine.
>>
>> Anyone know why this happen?
>>
>> What should I do next ?
>>
>> Thanks.
>>
>> -chen
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> 
> --
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


--
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd over xfs slow performances

2013-04-18 Thread Emmanuel Lacour

Dear ceph users,


I just set up a small cluster with two OSDs and 3 mons.
(0.56.4-1~bpo70+1)

OSDs are XFS (default mkfs options, mounted with defaults,noatime) over LVM over 
hardware RAID.

dd if=/dev/zero of=... bs=1M count=1 conv=fdatasync on each ceph-*
OSD mounted partition shows 120MB/s on one server and 50MB/s on the
second one.

iperf between servers gives 580Mb/s

I created an RBD image, mapped it and did the same dd on it (directly to
/dev/rbd/...).

I get only 15MB/s :(
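For reference, the kind of commands involved look roughly like this (image name and size are only examples, and the default 'rbd' pool is assumed):

rbd create test --size 10240
rbd map test
# the device appears as /dev/rbd0 (or /dev/rbd/rbd/test with the udev rules)
dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 conv=fdatasync
rbd unmap /dev/rbd0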


(network interfaces show ~120-150Mb/s, each server shows ~30% IO wait)



Any hints on how to increase the performance so it's not so far from the
non-Ceph numbers?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Illustrations

2013-04-18 Thread Wolfgang Hennerbichler
Thanks, it's just the thing I was searching for.

On 04/17/2013 05:29 PM, Patrick McGarry wrote:
> Hey Wolfgang,
> 
> There are several slide decks with associated imagery floating around
> out there.  I'd be happy to get you images that correspond to what you
> want to focus on.  A good place to start is Josh's talk from last
> year's OpenStack Developer Summit:
> 
> http://www.slideshare.net/openstack/storing-vms-with-cinder-and-ceph-rbdpdf
> 
> That has most of the basic architecture imagery that we have been
> using.  Depending on what you need I can get you the raw files for
> that or expand our search criteria.  Let me know what works.  Thanks.
> 
> 
> Best Regards,
> 
> Patrick McGarry
> Director, Community || Inktank
> 
> http://ceph.com  ||  http://inktank.com
> @scuttlemonkey || @ceph || @inktank
> 
> 
> On Wed, Apr 17, 2013 at 4:50 AM, Wolfgang Hennerbichler
>  wrote:
>> Hi,
>>
>> I do have to present ceph in front of a bunch of students in the
>> following weeks. Are there any illustrations that you guys have that I
>> could re-use? Like beautiful pictures that explain the whole concept,
>> other than those in the documentation?
>>
>> Wolfgang
>>
>> --
>> DI (FH) Wolfgang Hennerbichler
>> Software Development
>> Unit Advanced Computing Technologies
>> RISC Software GmbH
>> A company of the Johannes Kepler University Linz
>>
>> IT-Center
>> Softwarepark 35
>> 4232 Hagenberg
>> Austria
>>
>> Phone: +43 7236 3343 245
>> Fax: +43 7236 3343 250
>> wolfgang.hennerbich...@risc-software.at
>> http://www.risc-software.at
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Mark Nelson

On 04/18/2013 05:19 AM, Emmanuel Lacour wrote:


Dear ceph users,


I just set up a small cluster with two osds and 3 mon.
(0.56.4-1~bpo70+1)

OSDs are xfs (defaults mkfs options, mounted defaults,noatime) over lvm over 
hwraid.

dd if=/dev/zero of=... bs=1M count=1 conv=fdatasync on each ceph-*
osd mounted partitions show 120MB/s on one server and 50MB/s on the
second one.


It makes me a bit nervous that you are seeing such a discrepancy between 
the drives.  Were you expecting that one server would be so much faster 
than the other?  If a drive is starting to fail, your results may be 
unpredictable.




iperf between servers gives 580Mb/s

I created a rbd, mapped it and did the same dd on it (direct to
/dev/rbd/...).

I get only 15MB/s :(


Are you doing replication?  If one server has a slower drive, you are doing 2x 
replication, and you are using XFS (which tends to have some performance 
overhead with Ceph), that might get you down into this range, given the 
50MB/s number you posted above.  You may try connecting to the OSD admin 
sockets during tests and polling to see if all of the outstanding 
operations are backing up on one OSD.


Sebastien has a nice little tutorial on how to use the admin socket here:

http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
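As a concrete sketch, the socket paths below assume the defaults under /var/run/ceph; adjust the OSD id for each daemon you want to watch:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
# poll during the dd run:
watch -n 1 "ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight | grep num_ops"

If one OSD consistently shows far more in-flight ops than the others while the test runs, that's the one holding everything back.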






(network interfaces shows ~ 120-150Mb/s, each server show ~30% IO wait)



Any hint to increase the performance so it's not so far from non-ceph
one?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





[ceph-users] Format 2 Image support in the RBD driver

2013-04-18 Thread Whelan, Ryan
I've not been following the list for long, so forgive me if this has been 
covered, but is there a plan for format 2 image support in the kernel RBD driver?  I 
assume with Linux 3.9 in the RC phase, it's not likely to appear there?

Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Format 2 Image support in the RBD driver

2013-04-18 Thread Olivier B.
If I understand the roadmap correctly
( http://tracker.ceph.com/projects/ceph/roadmap ), it's planned for Ceph
v0.62b:


On Thursday, 18 April 2013 at 09:28 -0400, Whelan, Ryan wrote:
> I've not been following the list for long, so forgive me if this has been 
> covered, but is there a plan for image 2 support in the kernel RBD driver?  I 
> assume with Linux 3.9 in the RC phase, its not likely to appear there?
> 
> Thanks!
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Format 2 Image support in the RBD driver

2013-04-18 Thread Whelan, Ryan
Does this mean it's in linux-next? (released in 3.10?)

- Original Message -
From: "Olivier B." 
To: "Ryan Whelan" 
Cc: ceph-users@lists.ceph.com
Sent: Thursday, April 18, 2013 9:36:22 AM
Subject: Re: [ceph-users] Format 2 Image support in the RBD driver

If I well understand the roadmap
( http://tracker.ceph.com/projects/ceph/roadmap ), it's planed for Ceph
v0.62B :


On Thursday, 18 April 2013 at 09:28 -0400, Whelan, Ryan wrote:
> I've not been following the list for long, so forgive me if this has been 
> covered, but is there a plan for image 2 support in the kernel RBD driver?  I 
> assume with Linux 3.9 in the RC phase, its not likely to appear there?
> 
> Thanks!
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Emmanuel Lacour
On Thu, Apr 18, 2013 at 08:25:50AM -0500, Mark Nelson wrote:
> 

thanks for your answer!

> It makes me a bit nervous that you are seeing such a discrepancy
> between the drives.  Were you expecting that one server would be so
> much faster than the other?  If a drive is is starting to fail your
> results may be unpredictable.
> 

the two servers are far from identical, unfortunately.

the first server has two SAS 15krpm drives in a RAID 1 (PERC 5/i),
the second has two SATA 7.2krpm drives in a RAID 1 (aacraid CERC).

> 
> Are you doing replication? 

yes, I use the default replication, which is 2.

> If one server has a slower drive, doing
> 2x replication, and you are using XFS (which tends to have some
> performance overhead with ceph) that might get you down into this
> range given than 50MB/s number you posted above.

I don't understand why I can only send data at 15MB/s when it should be
written to two devices that can each do 50MB/s :(

Can you explain this a bit more or point me to some design doc?

XFS is the recommended FS for the kernel used (3.2.0), and btrfs is
still experimental :-/


> You may try
> connecting to the OSD admin sockets during tests and poll to see if
> all of the outstanding operations are backing up on one OSD.
> 
> Sebastien has a nice little tutorial on how to use the admin socket here:
> 
> http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
> 

thanks, I'm going to look at this ...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] spontaneous pg inconstancies in the rgw.gc pool

2013-04-18 Thread Dan van der Ster
Hi,

tl;dr: something deleted the objects from the .rgw.gc pool and then the pgs
went inconsistent. Is this normal??!!

Just now we had scrub errors and resulting inconsistencies on many of
the pgs belonging to our .rgw.gc pool.

HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
…

[root@ceph-mon1 ~]# ceph osd lspools
0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,12
.rgw.control,13 .users.uid,14 .users.email,15 .users,16
.rgw.buckets,17 .usage,


On the relevant hosts, I checked what was in those directories:

[root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
total 20
drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..

They were all empty like that. I checked the log files:

2013-04-18 14:53:56.532054 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub
stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
2013-04-18 14:53:56.532065 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub 1 errors
2013-04-18 14:53:59.532401 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub
stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
2013-04-18 14:53:59.532411 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub 1 errors
2013-04-18 14:54:01.532602 7fe5457fb700  0 log [ERR] : 11.10
deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
2013-04-18 14:54:01.532614 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub 1 errors
2013-04-18 14:54:02.532839 7fe5457fb700  0 log [ERR] : 11.18
deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
2013-04-18 14:54:02.532848 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub 1 errors
…
2013-04-18 14:57:14.554431 7fe5457fb700  0 log [ERR] : 11.1f0
deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
2013-04-18 14:57:14.554438 7fe5457fb700  0 log [ERR] : 11.1f0
deep-scrub 1 errors

So it looks like something deleted all the objects from those pg directories.
Next I tried a repair:

[root@ceph-mon1 ~]# ceph pg repair 11.1f0
instructing pg 11.1f0 on osd.35 to repair
[root@ceph-mon1 ~]# ceph -w
…
2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch,
got 0/3 objects, 0/0 clones, 0/0 bytes.
2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed
[root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
instructing pg 11.1f0 on osd.35 to deep-scrub
[root@ceph-mon1 ~]# ceph -w
…
2013-04-18 15:20:21.769446 mon.0 [INF] pgmap v31714: 3808 pgs: 3690
active+clean, 118 active+clean+inconsistent; 73284 MB data, 276 GB
used, 44389 GB / 44665 GB avail
2013-04-18 15:20:17.677058 osd.35 [INF] 11.1f0 deep-scrub ok

So indeed the repair "fixed" the problem (now there are only 118
inconsistent pgs, down from 119). And note that there is still nothing
in the directory for that pg, as expected:

[root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
total 20
drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..


So my question is: can anyone explain what happened here? It seems
that something deleted the objects from the .rgw.gc pool (as one would
expect) but the pgs were left inconsistent afterwards.

Best Regards,
Dan van der Ster
CERN IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Mark Nelson

On 04/18/2013 08:42 AM, Emmanuel Lacour wrote:

On Thu, Apr 18, 2013 at 08:25:50AM -0500, Mark Nelson wrote:




thanks for your answer!


It makes me a bit nervous that you are seeing such a discrepancy
between the drives.  Were you expecting that one server would be so
much faster than the other?  If a drive is is starting to fail your
results may be unpredictable.



the two servers are far from identical unfortunatly.

first server has two sas 15krpm drives in a RAID 1 (PERC 5/i)
second  has two sata 7.2krpm dives in a RAID 1 (aacraid CERC)



Are you doing replication?


yes, as I use default replication which is 2 by default.


If one server has a slower drive, doing
2x replication, and you are using XFS (which tends to have some
performance overhead with ceph) that might get you down into this
range given than 50MB/s number you posted above.


I don't understand why I can only send datas at 15MB/s when it should be
written to two devices that can do 50MB/s :(


So Ceph pseudo-randomly distributes data to different OSDs, which means 
that you are more or less limited by the slowest OSD in your system.  I.e., 
if one node can only process X objects per second, outstanding 
operations will slowly back up on it until you max out the number of 
outstanding operations that are allowed, and the other OSDs get starved 
while the slow one tries to catch up.


So let's say 50MB/s per device to match your slow one.

1) If you put your journals on the same devices, you are doing 2 writes 
for every incoming write since we do full data journalling.  Assuming 
that's the case we are down to 25MB/s.


2) Now, are you writing to a pool that has 2X replication?  If so, you 
are writing out an object to both devices for every write, but also 
incurring extra latency because the primary OSD will wait until it has 
replicated a write to the secondary OSD before it can acknowledge to the 
client.  With replication of 2 and 2 servers, that means that our 
aggregate throughput at best can only be 25MB/s if each server can only 
individually do 25MB/s.  In reality because of the extra overhead 
involved, it will probably be less.
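To make that arithmetic concrete with the numbers in this thread: the slower data disk does ~50MB/s; a journal co-located on the same disk halves the usable client bandwidth to ~25MB/s for that OSD; and with 2x replication across only two servers, every client write has to land on both boxes, so the aggregate client throughput is also capped at ~25MB/s. After XFS and network overhead, something in the 15-20MB/s range is roughly what you would expect.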


3) Now we must also account for the extra overhead that XFS causes.  We 
suggest XFS because it's stable, but especially on ceph prior to version 
0.58, it's not typically as fast as BTRFS/EXT4.  Some things that might 
help are using noatime and inode64, making sure you are describing your 
RAID array to XFS, and make sure your partitions are properly aligned 
for the RAID.  One other suggestion:  If your controllers have WB cache, 
enabling it can really help in some cases.




Can you explain me a bit more on this or point me to some design doc?

Xfs is the recommended FS for the kernel used (3.2.0). And btrfs is
still experimental :-/



You may try
connecting to the OSD admin sockets during tests and poll to see if
all of the outstanding operations are backing up on one OSD.

Sebastien has a nice little tutorial on how to use the admin socket here:

http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/



thanks, I'm going to look at this ...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] spontaneous pg inconstancies in the rgw.gc pool

2013-04-18 Thread Dan van der Ster
Replying to myself...
I just noticed this:

[root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/
total 27G
-rw-r--r--. 1 root root 27G Apr 18 16:08 radosgw.log
-rw-r--r--. 1 root root  20 Apr  5 03:13 radosgw.log-20130405.gz
-rw-r--r--. 1 root root  20 Apr  6 03:14 radosgw.log-20130406.gz
-rw-r--r--. 1 root root  20 Apr  7 03:50 radosgw.log-20130407.gz
-rw-r--r--. 1 root root  20 Apr  8 03:29 radosgw.log-20130408.gz
-rw-r--r--. 1 root root  20 Apr  9 03:19 radosgw.log-20130409.gz
-rw-r--r--. 1 root root  20 Apr 10 03:15 radosgw.log-20130410.gz

-rw-r--r--. 1 root root 0 Apr 11 03:48 radosgw.log-20130411

[root@ceph-radosgw01 ceph]# df -h .
FilesystemSize  Used Avail Use% Mounted on
/dev/mapper/vg1-root   37G   37G 0 100% /


The radosgw log filled up the disk. Perhaps this caused the problem..

Cheers, Dan
CERN IT

On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster  wrote:
> Hi,
>
> tl;dr: something deleted the objects from the .rgw.gc and then the pgs
> went inconsistent. Is this normal??!!
>
> Just now we had scrub errors and resulting inconsistencies on many of
> the pgs belonging to our .rgw.gc pool.
>
> HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
> pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
> pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
> pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
> pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
> pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
> …
>
> [root@ceph-mon1 ~]# ceph osd lspools
> 0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,12
> .rgw.control,13 .users.uid,14 .users.email,15 .users,16
> .rgw.buckets,17 .usage,
>
>
> On the relevant hosts, I checked what was in those directories:
>
> [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
> total 20
> drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
>
> They were all empty like that. I checked the log files:
>
> 2013-04-18 14:53:56.532054 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub
> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> 2013-04-18 14:53:56.532065 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub 1 
> errors
> 2013-04-18 14:53:59.532401 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub
> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> 2013-04-18 14:53:59.532411 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub 1 
> errors
> 2013-04-18 14:54:01.532602 7fe5457fb700  0 log [ERR] : 11.10
> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> 2013-04-18 14:54:01.532614 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub 1 
> errors
> 2013-04-18 14:54:02.532839 7fe5457fb700  0 log [ERR] : 11.18
> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> 2013-04-18 14:54:02.532848 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub 1 
> errors
> …
> 2013-04-18 14:57:14.554431 7fe5457fb700  0 log [ERR] : 11.1f0
> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> 2013-04-18 14:57:14.554438 7fe5457fb700  0 log [ERR] : 11.1f0
> deep-scrub 1 errors
>
> So it looks like something deleted all the objects from those pg directories.
> Next I tried a repair:
>
> [root@ceph-mon1 ~]# ceph pg repair 11.1f0
> instructing pg 11.1f0 on osd.35 to repair
> [root@ceph-mon1 ~]# ceph -w
> …
> 2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch,
> got 0/3 objects, 0/0 clones, 0/0 bytes.
> 2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed
> [root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
> instructing pg 11.1f0 on osd.35 to deep-scrub
> [root@ceph-mon1 ~]# ceph -w
> …
> 2013-04-18 15:20:21.769446 mon.0 [INF] pgmap v31714: 3808 pgs: 3690
> active+clean, 118 active+clean+inconsistent; 73284 MB data, 276 GB
> used, 44389 GB / 44665 GB avail
> 2013-04-18 15:20:17.677058 osd.35 [INF] 11.1f0 deep-scrub ok
>
> So indeed the repair "fixed" the problem (now there are only 118
> inconsistent pgs, down from 119). And note that there is still nothing
> in the directory for that pg, as expected:
>
> [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
> total 20
> drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
>
>
> So my question is: can anyone explain what happened here? It seems
> that something deleted the objects from the .rgw.gc pool (as one would
> expect) but the pgs were left inconsistent afterwards.
>
> Best Regards,
> Dan van der Ster
> CERN IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph configure RAM for each daemon instance ?

2013-04-18 Thread konradwro
Hello, is it possible to configure the RAM used by each daemon instance in ceph.conf?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] spontaneous pg inconstancies in the rgw.gc pool

2013-04-18 Thread Dan van der Ster
Sorry for the noise.. we now have a better idea what happened here.

For those that might care, basically we had one client looping while
trying to list the / bucket with an incorrect key. rgw was handling
this at 1kHz, so congratulations on that. I will now go and read how
to either decrease the log level or increase the log rotate frequency.
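For the rotation side, one option is a size-based trigger in the radosgw logrotate stanza; a sketch, with the path and limits only as examples:

/var/log/ceph/radosgw.log {
    # rotate as soon as the file exceeds 500 MB, not just once a day
    size 500M
    rotate 7
    compress
    missingok
    notifempty
    # keep whatever postrotate section the packaged radosgw logrotate file already ships
}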

Thanks again,
Dan
CERN IT

On Thu, Apr 18, 2013 at 4:09 PM, Dan van der Ster  wrote:
> Replying to myself...
> I just noticed this:
>
> [root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/
> total 27G
> -rw-r--r--. 1 root root 27G Apr 18 16:08 radosgw.log
> -rw-r--r--. 1 root root  20 Apr  5 03:13 radosgw.log-20130405.gz
> -rw-r--r--. 1 root root  20 Apr  6 03:14 radosgw.log-20130406.gz
> -rw-r--r--. 1 root root  20 Apr  7 03:50 radosgw.log-20130407.gz
> -rw-r--r--. 1 root root  20 Apr  8 03:29 radosgw.log-20130408.gz
> -rw-r--r--. 1 root root  20 Apr  9 03:19 radosgw.log-20130409.gz
> -rw-r--r--. 1 root root  20 Apr 10 03:15 radosgw.log-20130410.gz
>
> -rw-r--r--. 1 root root 0 Apr 11 03:48 radosgw.log-20130411
>
> [root@ceph-radosgw01 ceph]# df -h .
> FilesystemSize  Used Avail Use% Mounted on
> /dev/mapper/vg1-root   37G   37G 0 100% /
>
>
> The radosgw log filled up the disk. Perhaps this caused the problem..
>
> Cheers, Dan
> CERN IT
>
> On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster  wrote:
>> Hi,
>>
>> tl;dr: something deleted the objects from the .rgw.gc and then the pgs
>> went inconsistent. Is this normal??!!
>>
>> Just now we had scrub errors and resulting inconsistencies on many of
>> the pgs belonging to our .rgw.gc pool.
>>
>> HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
>> pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
>> pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
>> pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
>> pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
>> pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
>> …
>>
>> [root@ceph-mon1 ~]# ceph osd lspools
>> 0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,12
>> .rgw.control,13 .users.uid,14 .users.email,15 .users,16
>> .rgw.buckets,17 .usage,
>>
>>
>> On the relevant hosts, I checked what was in those directories:
>>
>> [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
>> total 20
>> drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
>> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
>>
>> They were all empty like that. I checked the log files:
>>
>> 2013-04-18 14:53:56.532054 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub
>> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:53:56.532065 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub 1 
>> errors
>> 2013-04-18 14:53:59.532401 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub
>> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:53:59.532411 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub 1 
>> errors
>> 2013-04-18 14:54:01.532602 7fe5457fb700  0 log [ERR] : 11.10
>> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:54:01.532614 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub 1 
>> errors
>> 2013-04-18 14:54:02.532839 7fe5457fb700  0 log [ERR] : 11.18
>> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:54:02.532848 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub 1 
>> errors
>> …
>> 2013-04-18 14:57:14.554431 7fe5457fb700  0 log [ERR] : 11.1f0
>> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:57:14.554438 7fe5457fb700  0 log [ERR] : 11.1f0
>> deep-scrub 1 errors
>>
>> So it looks like something deleted all the objects from those pg directories.
>> Next I tried a repair:
>>
>> [root@ceph-mon1 ~]# ceph pg repair 11.1f0
>> instructing pg 11.1f0 on osd.35 to repair
>> [root@ceph-mon1 ~]# ceph -w
>> …
>> 2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch,
>> got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed
>> [root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
>> instructing pg 11.1f0 on osd.35 to deep-scrub
>> [root@ceph-mon1 ~]# ceph -w
>> …
>> 2013-04-18 15:20:21.769446 mon.0 [INF] pgmap v31714: 3808 pgs: 3690
>> active+clean, 118 active+clean+inconsistent; 73284 MB data, 276 GB
>> used, 44389 GB / 44665 GB avail
>> 2013-04-18 15:20:17.677058 osd.35 [INF] 11.1f0 deep-scrub ok
>>
>> So indeed the repair "fixed" the problem (now there are only 118
>> inconsistent pgs, down from 119). And note that there is still nothing
>> in the directory for that pg, as expected:
>>
>> [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
>> total 20
>> drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
>> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
>>
>>
>> So my question is: can anyone explain what happened here? It seems
>> that something deleted the objects from the .rgw.gc pool (as one would
>> expect) but the pgs were left

Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Emmanuel Lacour
On Thu, Apr 18, 2013 at 09:05:12AM -0500, Mark Nelson wrote:
> 
> So Ceph pseudo-randomly distributes data to different OSDs, which
> means that you are more or less limited by the slowest OSD in your
> system.  IE if one node can only process X objects per second,
> outstanding operations will slowly back up on it until you max out
> the number of outstanding operations that are allowed and the other
> OSDs get starved while the slow one tries to catch up.
> 
> So lets say 50MB/s per device to match your slow one.
> 
> 1) If you put your journals on the same devices, you are doing 2
> writes for every incoming write since we do full data journalling.
> Assuming that's the case we are down to 25MB/s.
> 

I increased the flush interval so it's nearly 30 seconds and disabled
the filestore flusher; now I'm close to those 25MB/s if I do not exceed
30 seconds of writing.

For longer runs, I get ~ 15MB/s.
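The options involved are roughly the following in ceph.conf; the values mirror what is described above and are illustrative, not a general recommendation:

[osd]
    ; flush/sync behaviour of the filestore
    filestore flusher = false
    filestore min sync interval = 29
    filestore max sync interval = 30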

> 2) Now, are you writing to a pool that has 2X replication?  If so,
> you are writing out an object to both devices for every write, but
> also incurring extra latency because the primary OSD will wait until
> it has replicated a write to the secondary OSD before it can
> acknowledge to the client.  With replication of 2 and 2 servers,
> that means that our aggregate throughput at best can only be 25MB/s
> if each server can only individually do 25MB/s.  In reality because
> of the extra overhead involved, it will probably be less.
> 

is there some parameter to make this synchronisation asynchronous, i.e.
send the ack once the write has reached the other server's buffer, not the
other server's disk?

(I understand of course the risk of losing data in this case)

> 3) Now we must also account for the extra overhead that XFS causes.
> We suggest XFS because it's stable, but especially on ceph prior to
> version 0.58, it's not typically as fast as BTRFS/EXT4. 

Well, as all my servers are using ext4, I'll give it a try, but I don't
expect to gain many percent in performance ;)

> Some things
> that might help are using noatime and inode64, making sure you are

Yes I do use those options.

> describing your RAID array to XFS, and make sure your partitions are
> properly aligned for the RAID. 

Well, I don't know how to do/check this, but I will try to find out how
to do it ;)
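A sketch of what "describing the RAID to XFS" looks like in practice; the device names and geometry below are only examples (su is the array's chunk/stripe size, sw the number of data disks in a striped array):

# show the current geometry (sunit/swidth) of an existing filesystem
xfs_info /var/lib/ceph/osd/ceph-0
# tell a new filesystem about the array geometry
mkfs.xfs -f -d su=64k,sw=2 /dev/vg0/osd0
mount -o noatime,inode64 /dev/vg0/osd0 /var/lib/ceph/osd/ceph-0
# check that partition starts are a multiple of the RAID stripe
parted /dev/sda unit s print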

> One other suggestion:  If your
> controllers have WB cache, enabling it can really help in some
> cases.
> 

of course, this is the first thing I check on a server ;)

thank you very much for those explanations!


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Emmanuel Lacour
On Thu, Apr 18, 2013 at 04:19:09PM +0200, Emmanuel Lacour wrote:
> > 1) If you put your journals on the same devices, you are doing 2
> > writes for every incoming write since we do full data journalling.
> > Assuming that's the case we are down to 25MB/s.
> > 
> 


To reduce this double-write overhead, I tried to put the journal on
tmpfs and I got far better performance, nearly equal to the slower
disk. So with a low OSD count and low-performance OSD data drives, the
only way to go seems to be putting the journal on an SSD.
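Pointing the journal at a separate device is a per-OSD setting in ceph.conf; a sketch with purely illustrative paths:

[osd.0]
    ; raw SSD partition (or a file on the SSD)
    osd journal = /dev/ssd/osd0-journal
    ; in MB, used when the journal is a file
    osd journal size = 1000

A tmpfs journal gives a similar effect in benchmarks, but the journal (and data integrity with it) is lost on power failure, so it is only suitable for testing.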


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Mark Nelson

On 04/18/2013 10:12 AM, Emmanuel Lacour wrote:

On Thu, Apr 18, 2013 at 04:19:09PM +0200, Emmanuel Lacour wrote:

1) If you put your journals on the same devices, you are doing 2
writes for every incoming write since we do full data journalling.
Assuming that's the case we are down to 25MB/s.






to reduce this double write overhead, I tried to put the journal on
tmpfs and I got far better performances, nearly equal to the slower
disk. So with low count of osd and low performance osd data drives, the
only way to go seems to put the journal on ssd.


SSD journals definitely help, especially when doing large writes and 
targeting high throughput.


If you get a chance, it still may be worth giving 0.60 a try and seeing 
if it helps at all.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





[ceph-users] Health problem .. how to fix ?

2013-04-18 Thread Stephane Boisvert

Hi,

  I configured a test 'cluster' and played with it (moving OSD 
folders around, i.e. the journal file) and broke something. Now I think 
that this could occur again when we go to production, so I would like to know how 
I can fix it. I don't care about losing my files.


Anyone can help? here's the logs


HEALTH_WARN 15 pgs degraded; 1 pgs recovering; 15 pgs stale; 15 pgs 
stuck stale; 16 pgs stuck unclean; recovery 3/180 degraded (1.667%); 
1/60 unfound (1.667%)
pg 5.63 is stuck unclean for 157741.457285, current state 
stale+active+degraded, last acting [2]
pg 4.66 is stuck unclean for 77312.285409, current state 
stale+active+degraded, last acting [2]
pg 4.64 is stuck unclean for 157741.034570, current state 
stale+active+degraded, last acting [2]
pg 5.65 is stuck unclean for 77312.285382, current state 
stale+active+degraded, last acting [2]
pg 4.49 is stuck unclean for 77312.285021, current state 
stale+active+degraded, last acting [2]
pg 5.48 is stuck unclean for 77312.285058, current state 
stale+active+degraded, last acting [2]
pg 1.26 is stuck unclean for 77362.971821, current state 
active+recovering, last acting [5,2,1]
pg 2.10 is stuck unclean for 157740.553908, current state 
stale+active+degraded, last acting [2]
pg 4.e is stuck unclean for 157740.355222, current state 
stale+active+degraded, last acting [2]
pg 5.d is stuck unclean for 157740.354260, current state 
stale+active+degraded, last acting [2]
pg 5.0 is stuck unclean for 77312.264545, current state 
stale+active+degraded, last acting [2]
pg 4.1 is stuck unclean for 77312.264416, current state 
stale+active+degraded, last acting [2]
pg 3.2 is stuck unclean for 77312.263108, current state 
stale+active+degraded, last acting [2]
pg 2.3 is stuck unclean for 77312.263026, current state 
stale+active+degraded, last acting [2]
pg 4.71 is stuck unclean for 157740.352440, current state 
stale+active+degraded, last acting [2]
pg 5.70 is stuck unclean for 157740.352547, current state 
stale+active+degraded, last acting [2]
pg 5.63 is stuck stale for 77085.263183, current state 
stale+active+degraded, last acting [2]
pg 4.66 is stuck stale for 77085.263186, current state 
stale+active+degraded, last acting [2]
pg 4.64 is stuck stale for 77085.263187, current state 
stale+active+degraded, last acting [2]
pg 5.65 is stuck stale for 77085.263191, current state 
stale+active+degraded, last acting [2]
pg 4.49 is stuck stale for 77085.263186, current state 
stale+active+degraded, last acting [2]
pg 5.48 is stuck stale for 77085.263191, current state 
stale+active+degraded, last acting [2]
pg 2.10 is stuck stale for 77085.263258, current state 
stale+active+degraded, last acting [2]
pg 4.e is stuck stale for 77085.263247, current state 
stale+active+degraded, last acting [2]
pg 5.d is stuck stale for 77085.263245, current state 
stale+active+degraded, last acting [2]
pg 5.0 is stuck stale for 77085.263241, current state 
stale+active+degraded, last acting [2]
pg 4.1 is stuck stale for 77085.263245, current state 
stale+active+degraded, last acting [2]
pg 3.2 is stuck stale for 77085.263242, current state 
stale+active+degraded, last acting [2]
pg 2.3 is stuck stale for 77085.263247, current state 
stale+active+degraded, last acting [2]
pg 4.71 is stuck stale for 77085.263239, current state 
stale+active+degraded, last acting [2]
pg 5.70 is stuck stale for 77085.263245, current state 
stale+active+degraded, last acting [2]

pg 1.26 is active+recovering, acting [5,2,1], 1 unfound
recovery 3/180 degraded (1.667%); 1/60 unfound (1.667%)



Thanks


--







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Emmanuel Lacour
On Thu, Apr 18, 2013 at 10:18:29AM -0500, Mark Nelson wrote:
> 
> SSD journals definitely help, especially when doing large writes and
> targeting high throughput.
> 

the clusters I will build will be used mainly for KVM server images ;)

> If you get a chance, it still may be worth giving 0.60 a try and
> seeing if helps at all.
> 


I could not see any information about major changes in 0.60 that might
improve performance; do you have any pointers for this?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Format 2 Image support in the RBD driver

2013-04-18 Thread Gregory Farnum
I believe Alex just merged format 2 reading into our testing branch, and is
working on writes now.
-Greg

On Thursday, April 18, 2013, Whelan, Ryan wrote:

> Does this mean its in linux-next? (released in 3.10?)
>
> - Original Message -
> From: "Olivier B." >
> To: "Ryan Whelan" >
> Cc: ceph-users@lists.ceph.com 
> Sent: Thursday, April 18, 2013 9:36:22 AM
> Subject: Re: [ceph-users] Format 2 Image support in the RBD driver
>
> If I well understand the roadmap
> ( http://tracker.ceph.com/projects/ceph/roadmap ), it's planed for Ceph
> v0.62B :
>
>
> On Thursday, 18 April 2013 at 09:28 -0400, Whelan, Ryan wrote:
> > I've not been following the list for long, so forgive me if this has
> been covered, but is there a plan for image 2 support in the kernel RBD
> driver?  I assume with Linux 3.9 in the RC phase, its not likely to appear
> there?
> >
> > Thanks!
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd over xfs slow performances

2013-04-18 Thread Mark Nelson

On 04/18/2013 10:29 AM, Emmanuel Lacour wrote:

On Thu, Apr 18, 2013 at 10:18:29AM -0500, Mark Nelson wrote:


SSD journals definitely help, especially when doing large writes and
targeting high throughput.



clusters I will build will be used mainly for kvm servers images ;)


If you get a chance, it still may be worth giving 0.60 a try and
seeing if helps at all.



I don't remember all of the changes, but back around 0.58 we changed how 
pg_info updates work, which really improved small IO performance but did 
provide a bit of a boost for large IOs too.  Not sure if it will make as 
much of a difference with EXT4, but it may be worth a shot if you haven't 
deployed a bunch of data yet.





I could not see any information about major changes in 0.60 that may
improve performances, do you have any pointer for this?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] spontaneous pg inconstancies in the rgw.gc pool

2013-04-18 Thread Gregory Farnum
What version was this on?
-Greg

On Thursday, April 18, 2013, Dan van der Ster wrote:

> Sorry for the noise.. we now have a better idea what happened here.
>
> For those that might care, basically we had one client looping while
> trying to list the / bucket with an incorrect key. rgw was handling
> this at 1kHz, so congratulations on that. I will now go and read how
> to either decrease the log level or increase the log rotate frequency.
>
> Thanks again,
> Dan
> CERN IT
>
> On Thu, Apr 18, 2013 at 4:09 PM, Dan van der Ster 
> wrote:
> > Replying to myself...
> > I just noticed this:
> >
> > [root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/
> > total 27G
> > -rw-r--r--. 1 root root 27G Apr 18 16:08 radosgw.log
> > -rw-r--r--. 1 root root  20 Apr  5 03:13 radosgw.log-20130405.gz
> > -rw-r--r--. 1 root root  20 Apr  6 03:14 radosgw.log-20130406.gz
> > -rw-r--r--. 1 root root  20 Apr  7 03:50 radosgw.log-20130407.gz
> > -rw-r--r--. 1 root root  20 Apr  8 03:29 radosgw.log-20130408.gz
> > -rw-r--r--. 1 root root  20 Apr  9 03:19 radosgw.log-20130409.gz
> > -rw-r--r--. 1 root root  20 Apr 10 03:15 radosgw.log-20130410.gz
> >
> > -rw-r--r--. 1 root root 0 Apr 11 03:48 radosgw.log-20130411
> >
> > [root@ceph-radosgw01 ceph]# df -h .
> > FilesystemSize  Used Avail Use% Mounted on
> > /dev/mapper/vg1-root   37G   37G 0 100% /
> >
> >
> > The radosgw log filled up the disk. Perhaps this caused the problem..
> >
> > Cheers, Dan
> > CERN IT
> >
> > On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster 
> wrote:
> >> Hi,
> >>
> >> tl;dr: something deleted the objects from the .rgw.gc and then the pgs
> >> went inconsistent. Is this normal??!!
> >>
> >> Just now we had scrub errors and resulting inconsistencies on many of
> >> the pgs belonging to our .rgw.gc pool.
> >>
> >> HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
> >> pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
> >> pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
> >> pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
> >> pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
> >> pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
> >> …
> >>
> >> [root@ceph-mon1 ~]# ceph osd lspools
> >> 0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,12
> >> .rgw.control,13 .users.uid,14 .users.email,15 .users,16
> >> .rgw.buckets,17 .usage,
> >>
> >>
> >> On the relevant hosts, I checked what was in those directories:
> >>
> >> [root@lxfsrc4906 ~]# ls -l
> //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
> >> total 20
> >> drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
> >> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
> >>
> >> They were all empty like that. I checked the log files:
> >>
> >> 2013-04-18 14:53:56.532054 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub
> >> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:53:56.532065 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub
> 1 errors
> >> 2013-04-18 14:53:59.532401 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub
> >> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:53:59.532411 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub
> 1 errors
> >> 2013-04-18 14:54:01.532602 7fe5457fb700  0 log [ERR] : 11.10
> >> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:54:01.532614 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub
> 1 errors
> >> 2013-04-18 14:54:02.532839 7fe5457fb700  0 log [ERR] : 11.18
> >> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:54:02.532848 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub
> 1 errors
> >> …
> >> 2013-04-18 14:57:14.554431 7fe5457fb700  0 log [ERR] : 11.1f0
> >> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:57:14.554438 7fe5457fb700  0 log [ERR] : 11.1f0
> >> deep-scrub 1 errors
> >>
> >> So it looks like something deleted all the objects from those pg
> directories.
> >> Next I tried a repair:
> >>
> >> [root@ceph-mon1 ~]# ceph pg repair 11.1f0
> >> instructing pg 11.1f0 on osd.35 to repair
> >> [root@ceph-mon1 ~]# ceph -w
> >> …
> >> 2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch,
> >> got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed
> >> [root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
> >> instructing pg 11.1f0 on osd.35 to deep-sc



-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] has anyone successfully installed ceph with the crowbar

2013-04-18 Thread Makkelie, R - SPLXL
Hi,

Has anyone successfully installed Ceph using the ceph barclamp with Crowbar?
If yes, what version are you using, how did you create the barclamp,
and did you integrate it with OpenStack Folsom/Grizzly?

GreetZ
Ramonskie 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] spontaneous pg inconstancies in the rgw.gc pool

2013-04-18 Thread Arne Wiebalck

This is 0.56.4 on a RHEL6 derivative.

Cheers,
 Arne


From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on 
behalf of Gregory Farnum [g...@inktank.com]
Sent: 18 April 2013 17:34
To: Dan van der Ster
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] spontaneous pg inconstancies in the rgw.gc pool

What version was this on?
-Greg

On Thursday, April 18, 2013, Dan van der Ster wrote:
Sorry for the noise.. we now have a better idea what happened here.

For those that might care, basically we had one client looping while
trying to list the / bucket with an incorrect key. rgw was handling
this at 1kHz, so congratulations on that. I will now go and read how
to either decrease the log level or increase the log rotate frequency.

Thanks again,
Dan
CERN IT

On Thu, Apr 18, 2013 at 4:09 PM, Dan van der Ster  wrote:
> Replying to myself...
> I just noticed this:
>
> [root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/
> total 27G
> -rw-r--r--. 1 root root 27G Apr 18 16:08 radosgw.log
> -rw-r--r--. 1 root root  20 Apr  5 03:13 radosgw.log-20130405.gz
> -rw-r--r--. 1 root root  20 Apr  6 03:14 radosgw.log-20130406.gz
> -rw-r--r--. 1 root root  20 Apr  7 03:50 radosgw.log-20130407.gz
> -rw-r--r--. 1 root root  20 Apr  8 03:29 radosgw.log-20130408.gz
> -rw-r--r--. 1 root root  20 Apr  9 03:19 radosgw.log-20130409.gz
> -rw-r--r--. 1 root root  20 Apr 10 03:15 radosgw.log-20130410.gz
>
> -rw-r--r--. 1 root root 0 Apr 11 03:48 radosgw.log-20130411
>
> [root@ceph-radosgw01 ceph]# df -h .
> FilesystemSize  Used Avail Use% Mounted on
> /dev/mapper/vg1-root   37G   37G 0 100% /
>
>
> The radosgw log filled up the disk. Perhaps this caused the problem..
>
> Cheers, Dan
> CERN IT
>
> On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster  wrote:
>> Hi,
>>
>> tl;dr: something deleted the objects from the .rgw.gc and then the pgs
>> went inconsistent. Is this normal??!!
>>
>> Just now we had scrub errors and resulting inconsistencies on many of
>> the pgs belonging to our .rgw.gc pool.
>>
>> HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
>> pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
>> pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
>> pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
>> pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
>> pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
>> …
>>
>> [root@ceph-mon1 ~]# ceph osd lspools
>> 0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,12
>> .rgw.control,13 .users.uid,14 .users.email,15 .users,16
>> .rgw.buckets,17 .usage,
>>
>>
>> On the relevant hosts, I checked what was in those directories:
>>
>> [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
>> total 20
>> drwxr-xr-x.   2 root root 6 Apr 16 10:48 .
>> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
>>
>> They were all empty like that. I checked the log files:
>>
>> 2013-04-18 14:53:56.532054 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub
>> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:53:56.532065 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub 1 
>> errors
>> 2013-04-18 14:53:59.532401 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub
>> stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:53:59.532411 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub 1 
>> errors
>> 2013-04-18 14:54:01.532602 7fe5457fb700  0 log [ERR] : 11.10
>> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:54:01.532614 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub 1 
>> errors
>> 2013-04-18 14:54:02.532839 7fe5457fb700  0 log [ERR] : 11.18
>> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:54:02.532848 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub 1 
>> errors
>> …
>> 2013-04-18 14:57:14.554431 7fe5457fb700  0 log [ERR] : 11.1f0
>> deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 14:57:14.554438 7fe5457fb700  0 log [ERR] : 11.1f0
>> deep-scrub 1 errors
>>
>> So it looks like something deleted all the objects from those pg directories.
>> Next I tried a repair:
>>
>> [root@ceph-mon1 ~]# ceph pg repair 11.1f0
>> instructing pg 11.1f0 on osd.35 to repair
>> [root@ceph-mon1 ~]# ceph -w
>> …
>> 2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch,
>> got 0/3 objects, 0/0 clones, 0/0 bytes.
>> 2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed
>> [root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
>> instructing pg 11.1f0 on osd.35 to deep-sc


--
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] has anyone successfully installed ceph with the crowbar

2013-04-18 Thread Gregory Farnum
The barclamps were written against the crowbar "Betty" release, OpenStack
Essex (which is the last one supported by Crowbar), and Ceph "argonaut". JJ
has updated them to use "Bobtail", but I don't think anybody's run them
against newer versions of Openstack. :(
You should be able to find built versions of these on the Inktank or Ceph
websites, though I don't remember where exactly.

What are you trying to do, precisely?
-Greg

On Thursday, April 18, 2013, Makkelie, R - SPLXL wrote:

> **
> Hi,
>
> Has anyone successfully installed Ceph using the ceph-barclamp with
> crowbar.
> if yes what version are you using and how did you created the barclamp
> and did you integrated it with Openstack folsom/Grizzly?
>
> GreetZ
> Ramonskie 
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] has anyone successfully installed ceph with the crowbar

2013-04-18 Thread Makkelie, R - SPLXL
Well, I tried to build the barclamp from https://github.com/ceph/barclamp-ceph
and package it with https://github.com/ceph/package-ceph-barclamp,
but the install fails.

So I also found a barclamp that installs argonaut, and it does install Ceph,
but when I manually try to add an image to the volumes pool it fails.
This is due to some permission failures, probably because I am following a
manual that was written for bobtail:
http://ceph.com/docs/master/rbd/rbd-openstack/
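
(For reference, the client capabilities that guide sets up look roughly like the
sketch below. This is illustrative only: the exact capability strings differ
between argonaut and bobtail, so check the page for the release you actually run.)

  ceph auth get-or-create client.volumes mon 'allow r' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'
  ceph auth get-or-create client.images mon 'allow r' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'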



On Thu, 2013-04-18 at 08:48 -0700, Gregory Farnum wrote:


The barclamps were written against the crowbar "Betty" release, 
OpenStack Essex (which is the last one supported by Crowbar), and Ceph 
"argonaut". JJ has updated them to use "Bobtail", but I don't think anybody's 
run them against newer versions of Openstack. :( 

You should be able to find built versions of these on the Inktank or 
Ceph websites, though I don't remember where exactly. 



What are you trying to do, precisely? 

-Greg

On Thursday, April 18, 2013, Makkelie, R - SPLXL wrote: 

Hi,

Has anyone successfully installed Ceph using the ceph-barclamp 
with crowbar.
if yes what version are you using and how did you created the 
barclamp
and did you integrated it with Openstack folsom/Grizzly?

GreetZ
Ramonskie 




-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] spontaneous pg inconstancies in the rgw.gc pool

2013-04-18 Thread Yehuda Sadeh
On Thu, Apr 18, 2013 at 7:57 AM, Dan van der Ster  wrote:
>
> Sorry for the noise.. we now have a better idea what happened here.
>
> For those that might care, basically we had one client looping while
> trying to list the / bucket with an incorrect key. rgw was handling
> this at 1kHz, so congratulations on that. I will now go and read how
> to either decrease the log level or increase the log rotate frequency.


debug rgw = 0

or set it to 1 or 2 if you still want to have something semi useful in
your logs.
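
(For reference, this lives in ceph.conf on the gateway host, e.g.:

  [client.radosgw.gateway]   # section name is an example; use your gateway's name
      debug rgw = 1

then restart radosgw so it takes effect; the init script name varies by distro.)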


Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] has anyone successfully installed ceph with the crowbar

2013-04-18 Thread John Wilkins
Keep me posted on this, and I'll update the docs when we have a resolution.


On Thu, Apr 18, 2013 at 8:55 AM, Makkelie, R - SPLXL  wrote:

> **
> well i tried to build the barclamp from
> https://github.com/ceph/barclamp-ceph
> and pacakge it with https://github.com/ceph/package-ceph-barclamp
>
> but the install fails
>
> so i also found a barclamp that is installing argonaut
> and it installs ceph
>
> but when i manually try to add a image in the volumes pool it fails.
> this is due to some permission failures probably because i try to follow a
> manual that is created for bobtail
> http://ceph.com/docs/master/rbd/rbd-openstack/
>
>
>
>
> On Thu, 2013-04-18 at 08:48 -0700, Gregory Farnum wrote:
>
> The barclamps were written against the crowbar "Betty" release, OpenStack
> Essex (which is the last one supported by Crowbar), and Ceph "argonaut". JJ
> has updated them to use "Bobtail", but I don't think anybody's run them
> against newer versions of Openstack. :(
>
>  You should be able to find built versions of these on the Inktank or Ceph
> websites, though I don't remember where exactly.
>
>
>
>  What are you trying to do, precisely?
>
>  -Greg
>
> On Thursday, April 18, 2013, Makkelie, R - SPLXL wrote:
>
>  Hi,
>
> Has anyone successfully installed Ceph using the ceph-barclamp with
> crowbar.
> if yes what version are you using and how did you created the barclamp
> and did you integrated it with Openstack folsom/Grizzly?
>
> GreetZ
> Ramonskie 
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] has anyone successfully installed ceph with the crowbar

2013-04-18 Thread Gregory Farnum
Oh, yeah. Bobtail isn't going to play nicely without some
modifications, but I'll have to wait for JJ to speak about those.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Apr 18, 2013 at 8:55 AM, Makkelie, R - SPLXL
 wrote:
> well i tried to build the barclamp from
> https://github.com/ceph/barclamp-ceph
> and pacakge it with https://github.com/ceph/package-ceph-barclamp
>
> but the install fails
>
> so i also found a barclamp that is installing argonaut
> and it installs ceph
>
> but when i manually try to add a image in the volumes pool it fails.
> this is due to some permission failures probably because i try to follow a
> manual that is created for bobtail
> http://ceph.com/docs/master/rbd/rbd-openstack/
>
>
>
>
> On Thu, 2013-04-18 at 08:48 -0700, Gregory Farnum wrote:
>
> The barclamps were written against the crowbar "Betty" release, OpenStack
> Essex (which is the last one supported by Crowbar), and Ceph "argonaut". JJ
> has updated them to use "Bobtail", but I don't think anybody's run them
> against newer versions of Openstack. :(
>
> You should be able to find built versions of these on the Inktank or Ceph
> websites, though I don't remember where exactly.
>
>
>
> What are you trying to do, precisely?
>
> -Greg
>
> On Thursday, April 18, 2013, Makkelie, R - SPLXL wrote:
>
> Hi,
>
> Has anyone successfully installed Ceph using the ceph-barclamp with crowbar.
> if yes what version are you using and how did you created the barclamp
> and did you integrated it with Openstack folsom/Grizzly?
>
> GreetZ
> Ramonskie 
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No rolling updates from v0.56 to v0.60+?

2013-04-18 Thread Gregory Farnum
On Wed, Apr 17, 2013 at 7:40 AM, Guido Winkelmann
 wrote:
> Hi,
>
> I just tried upgrading parts of our experimental ceph cluster from 0.56.1 to
> 0.60, and it looks like the new mon-daemon from 0.60 cannot talk to those from
> 0.56.1 at all.
>
> Long story short, we had to move some hardware around and during that time I
> had to shrink the cluster to one single machine. My plan was to expand it to
> three machines again, so that I would again have 3 mons and 3 osds, as before.
> I just installed the first new machine, going straight for 0.60, but leaving
> the remaining old one at 0.56.1. I added the new mon to the mon map according
> to the documentation and started the new mon daemon, but the mon-cluster
> wouldn't achieve quorum. In the logs for the new mon, I saw the following line
> repeated a lot:
>
> 0 -- 10.6.224.129:6789/0 >> 10.6.224.131:6789/0 pipe(0x2da5ec0 sd=20 :37863
> s=1 pgs=0 cs=0 l=0).connect protocol version mismatch, my 10 != 9
>
> The old mon had no such lines in its log.
>
> I could only solve this by shutting down the old mon and upgrading it to 0.60
> as well.
>
> It looks to me like this means rolling upgrades without downtime won't be
> possible from bobtail to cuttlefish. Is that correct?

If the cluster is in good shape, this shouldn't actually result in
downtime. Do a rolling upgrade of your monitors, and then when a
majority of them are on Cuttlefish they'll switch over to form the
quorum — the "downtime" being the period a store requires to update,
which shouldn't be long, and it will only be the monitors that are
inaccessible (unless it takes a truly ridiculous time for the
upgrade). All the rest of the daemons you can do rolling upgrades on
just the same as before.
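
A rough per-monitor sequence, sketched for a sysvinit-style Debian/Ubuntu host
with a monitor called mon.a (names and package commands are examples):

  service ceph stop mon.a                # take just this monitor down
  apt-get install --only-upgrade ceph    # pull in the new packages
  service ceph start mon.a               # the store conversion runs on first start
  ceph -s                                # wait for quorum to reform before the next one
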
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No rolling updates from v0.56 to v0.60+?

2013-04-18 Thread Joao Eduardo Luis

On 04/18/2013 05:28 PM, Gregory Farnum wrote:

On Wed, Apr 17, 2013 at 7:40 AM, Guido Winkelmann
 wrote:

Hi,

I just tried upgrading parts of our experimental ceph cluster from 0.56.1 to
0.60, and it looks like the new mon-daemon from 0.60 cannot talk to those from
0.56.1 at all.

Long story short, we had to move some hardware around and during that time I
had to shrink the cluster to one single machine. My plan was to expand it to
three machines again, so that I would again have 3 mons and 3 osds, as before.
I just installed the first new machine, going straight for 0.60, but leaving
the remaining old one at 0.56.1. I added the new mon to the mon map according
to the documentation and started the new mon daemon, but the mon-cluster
wouldn't achieve quorum. In the logs for the new mon, I saw the following line
repeated a lot:

0 -- 10.6.224.129:6789/0 >> 10.6.224.131:6789/0 pipe(0x2da5ec0 sd=20 :37863
s=1 pgs=0 cs=0 l=0).connect protocol version mismatch, my 10 != 9

The old mon had no such lines in its log.

I could only solve this by shutting down the old mon and upgrading it to 0.60
as well.

It looks to me like this means rolling upgrades without downtime won't be
possible from bobtail to cuttlefish. Is that correct?


If the cluster is in good shape, this shouldn't actually result in
downtime. Do a rolling upgrade of your monitors, and then when a
majority of them are on Cuttlefish they'll switch over to form the
quorum — the "downtime" being the period a store requires to update,
which shouldn't be long, and it will only be the monitors that are
inaccessible (unless it takes a truly ridiculous time for the
upgrade). All the rest of the daemons you can do rolling upgrades on
just the same as before.


Another potential source of delay would be the synchronization process 
triggered when a majority of monitors have been upgraded.


Say you have 5 monitors.

You upgrade two while the cluster is happily running: the stores are 
converted, which may take longer if the store is huge [1], but you get 
your monitors ready to join the quorum as soon as a third member is 
upgraded.


During this time, your cluster kept on going, with more versions being 
created.


And then you decide to upgrade the third monitor.  It will go through 
the same period of downtime as the other two monitors -- which as Greg 
said shouldn't be long, but may be if your stores are huge [1] -- and 
this will be the bulk of your downtime.


However, as the cluster kept on going, there's a chance that the first 
two monitors to be upgraded will have fallen out of sync with the more 
recent cluster state.  That will trigger a store sync, which shouldn't 
take long either, but this is somewhat bound by the store size and the 
amount of versions that were created in-between.  You might even be 
lucky enough to go through the whole thing in no time, and the 
sync might not even be necessary (there's another mechanism to handle 
catch-up when the monitors haven't drifted that much).


Anyway, when you are finally upgrading the third monitor (out of 5), 
that is going to break quorum, so it would probably be wise to just 
upgrade the remaining monitors all at once.
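
One way to keep an eye on this while you go, sketched with an example monitor
name and the default admin socket path:

  ceph quorum_status                                              # who is in quorum right now
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status    # this mon's state (probing/synchronizing/peon/leader)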



[1] - With the new leveldb tuning this might not even be an issue.

  -Joao


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] has anyone successfully installed ceph with the crowbar

2013-04-18 Thread JuanJose Galvez
We're making sure that the modified barclamps are successfully going
through the Tempest tests; once they do, I'll send a pull request
with all the changes for a bobtail-enabled barclamp to the repo.

The main problem with using bobtail is actually with the Nova package:
it currently includes different versions of libvirt and other software
packages than what comes on the latest Crowbar ISO. So rather than
pushing out a full Nova barclamp we may just be patching against the
version on the Crowbar ISO.

Until I've verified through Tempest tests this is all a WIP. If you'd
like to preview the Nova patch it can be found here:

https://github.com/jgalvez/WIP/blob/master/nova/nova.patch

That includes the updated permissions so that Nova will work with bobtail.

-JJ

On 4/18/2013 9:23 AM, Gregory Farnum wrote:
> Oh, yeah. Bobtail isn't going to play nicely without some
> modifications, but I'll have to wait for JJ to speak about those.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> 
> On Thu, Apr 18, 2013 at 8:55 AM, Makkelie, R - SPLXL
>  wrote:
>> well i tried to build the barclamp from
>> https://github.com/ceph/barclamp-ceph
>> and pacakge it with https://github.com/ceph/package-ceph-barclamp
>>
>> but the install fails
>>
>> so i also found a barclamp that is installing argonaut
>> and it installs ceph
>>
>> but when i manually try to add a image in the volumes pool it fails.
>> this is due to some permission failures probably because i try to follow a
>> manual that is created for bobtail
>> http://ceph.com/docs/master/rbd/rbd-openstack/
>>
>>
>>
>>
>> On Thu, 2013-04-18 at 08:48 -0700, Gregory Farnum wrote:
>>
>> The barclamps were written against the crowbar "Betty" release, OpenStack
>> Essex (which is the last one supported by Crowbar), and Ceph "argonaut". JJ
>> has updated them to use "Bobtail", but I don't think anybody's run them
>> against newer versions of Openstack. :(
>>
>> You should be able to find built versions of these on the Inktank or Ceph
>> websites, though I don't remember where exactly.
>>
>>
>>
>> What are you trying to do, precisely?
>>
>> -Greg
>>
>> On Thursday, April 18, 2013, Makkelie, R - SPLXL wrote:
>>
>> Hi,
>>
>> Has anyone successfully installed Ceph using the ceph-barclamp with crowbar.
>> if yes what version are you using and how did you created the barclamp
>> and did you integrated it with Openstack folsom/Grizzly?
>>
>> GreetZ
>> Ramonskie 
>>
>>
>>
>> --
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> 


-- 
JuanJose "JJ" Galvez
Professional Services
Inktank Storage, Inc.
LinkedIn: http://www.linkedin.com/in/jjgalvez
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] No rolling updates from v0.56 to v0.60+?

2013-04-18 Thread Stefan Priebe - Profihost AG
Isn't the new leveldb tuning part of cuttlefish?

Stefan

Am 18.04.2013 um 19:40 schrieb Joao Eduardo Luis :

> On 04/18/2013 05:28 PM, Gregory Farnum wrote:
>> On Wed, Apr 17, 2013 at 7:40 AM, Guido Winkelmann
>>  wrote:
>>> Hi,
>>> 
>>> I just tried upgrading parts of our experimental ceph cluster from 0.56.1 to
>>> 0.60, and it looks like the new mon-daemon from 0.60 cannot talk to those 
>>> from
>>> 0.56.1 at all.
>>> 
>>> Long story short, we had to move some hardware around and during that time I
>>> had to shrink the cluster to one single machine. My plan was to expand it to
>>> three machines again, so that I would again have 3 mons and 3 osds, as 
>>> before.
>>> I just installed the first new machine, going straight for 0.60, but leaving
>>> the remaining old one at 0.56.1. I added the new mon to the mon map 
>>> according
>>> to the documentation and started the new mon daemon, but the mon-cluster
>>> wouldn't achieve quorum. In the logs for the new mon, I saw the following 
>>> line
>>> repeated a lot:
>>> 
>>> 0 -- 10.6.224.129:6789/0 >> 10.6.224.131:6789/0 pipe(0x2da5ec0 sd=20 :37863
>>> s=1 pgs=0 cs=0 l=0).connect protocol version mismatch, my 10 != 9
>>> 
>>> The old mon had no such lines in its log.
>>> 
>>> I could only solve this by shutting down the old mon and upgrading it to 
>>> 0.60
>>> as well.
>>> 
>>> It looks to me like this means rolling upgrades without downtime won't be
>>> possible from bobtail to cuttlefish. Is that correct?
>> 
>> If the cluster is in good shape, this shouldn't actually result in
>> downtime. Do a rolling upgrade of your monitors, and then when a
>> majority of them are on Cuttlefish they'll switch over to form the
>> quorum — the "downtime" being the period a store requires to update,
>> which shouldn't be long, and it will only be the monitors that are
>> inaccessible (unless it takes a truly ridiculous time for the
>> upgrade). All the rest of the daemons you can do rolling upgrades on
>> just the same as before.
> 
> Another potential source of delay would be the synchronization process 
> triggered when a majority of monitors have been upgraded.
> 
> Say you have 5 monitors.
> 
> You upgrade two while the cluster is happily running: the stores are 
> converted, which may take longer if the store is huge [1], but you get your 
> monitors ready to join the quorum as soon as a third member is upgraded.
> 
> During this time, your cluster kept on going, with more versions being 
> created.
> 
> And then you decide to upgrade the third monitor.  It will go through the 
> same period of downtime as the other two monitors -- which as Greg said 
> shouldn't be long, but may be if your stores are huge [1] -- and this will be 
> the bulk of your downtime.
> 
> However, as the cluster kept on going, there's a chance that the first two 
> monitors to be upgraded will have fallen out of sync with the more recent 
> cluster state.  That will trigger a store sync, which shouldn't take long 
> either, but this is somewhat bound by the store size and the amount of 
> versions that were created in-between.  You might even be lucky enough, and 
> go through with the whole thing in no time and the sync might not even be 
> necessary (there's another mechanism to handle catch-up when the monitors 
> haven't drifted that much).
> 
> Anyway, when you are finally upgrading the third monitor (out of 5), that is 
> going to break quorum, so it would probably be wise to just upgrade the 
> remaining monitors all at once.
> 
> 
> [1] - With the new leveldb tuning this might not even be an issue.
> 
>  -Joao
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph configure RAM for each daemon instance ?

2013-04-18 Thread Wido den Hollander

On 04/18/2013 04:23 PM, konradwro wrote:

Hello, it is possible to configure ceph.conf RAM for each daemon instance ?


No, the daemons will use as much memory as they need and as is available. 
You can put the daemons in a cgroup to limit their memory usage, but that 
comes with the risk that they could go out of memory.
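
A minimal sketch of the cgroup approach, assuming the v1 memory controller is
mounted at /sys/fs/cgroup/memory and you want to cap one OSD at 2 GB (and again,
hitting the limit means the kernel's OOM killer may take the daemon down):

  mkdir /sys/fs/cgroup/memory/ceph-osd.0
  echo $((2*1024*1024*1024)) > /sys/fs/cgroup/memory/ceph-osd.0/memory.limit_in_bytes
  echo <pid of the ceph-osd process> > /sys/fs/cgroup/memory/ceph-osd.0/tasks   # placeholder pid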


Wido




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Health problem .. how to fix ?

2013-04-18 Thread John Wilkins
Stephane,

The monitoring section of operations explains what's happening, but I think
I probably need to do a better job of explaining unfound objects.
http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/
http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#unfound-objects
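
In short, it usually comes down to something like this (the pg id is just an
example taken from your output, and reverting unfound objects means accepting
that data as lost):

  ceph health detail                       # lists the pgs with unfound objects
  ceph pg 1.26 list_missing                # shows which objects are unfound and why
  ceph pg 1.26 mark_unfound_lost revert    # give up on them once you are sure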

Let me know if those docs help, and let me know how I can improve on those
for you. That's an area that's not covered quite as well as it needs to be.


On Thu, Apr 18, 2013 at 8:26 AM, Stephane Boisvert <
stephane.boisv...@gameloft.com> wrote:

>  Hi,
>
>   I configured a test 'cluster' and played with it (moving OSD folders
> around, i.e. the journal file) and broke something. Now I think this can
> occur again when we go to production, so I would like to know how I can fix
> it. I don't care about losing my files.
>
> Can anyone help? Here are the logs:
>
>
> HEALTH_WARN 15 pgs degraded; 1 pgs recovering; 15 pgs stale; 15 pgs stuck
> stale; 16 pgs stuck unclean; recovery 3/180 degraded (1.667%); 1/60 unfound
> (1.667%)
> pg 5.63 is stuck unclean for 157741.457285, current state
> stale+active+degraded, last acting [2]
> pg 4.66 is stuck unclean for 77312.285409, current state
> stale+active+degraded, last acting [2]
> pg 4.64 is stuck unclean for 157741.034570, current state
> stale+active+degraded, last acting [2]
> pg 5.65 is stuck unclean for 77312.285382, current state
> stale+active+degraded, last acting [2]
> pg 4.49 is stuck unclean for 77312.285021, current state
> stale+active+degraded, last acting [2]
> pg 5.48 is stuck unclean for 77312.285058, current state
> stale+active+degraded, last acting [2]
> pg 1.26 is stuck unclean for 77362.971821, current state
> active+recovering, last acting [5,2,1]
> pg 2.10 is stuck unclean for 157740.553908, current state
> stale+active+degraded, last acting [2]
> pg 4.e is stuck unclean for 157740.355222, current state
> stale+active+degraded, last acting [2]
> pg 5.d is stuck unclean for 157740.354260, current state
> stale+active+degraded, last acting [2]
> pg 5.0 is stuck unclean for 77312.264545, current state
> stale+active+degraded, last acting [2]
> pg 4.1 is stuck unclean for 77312.264416, current state
> stale+active+degraded, last acting [2]
> pg 3.2 is stuck unclean for 77312.263108, current state
> stale+active+degraded, last acting [2]
> pg 2.3 is stuck unclean for 77312.263026, current state
> stale+active+degraded, last acting [2]
> pg 4.71 is stuck unclean for 157740.352440, current state
> stale+active+degraded, last acting [2]
> pg 5.70 is stuck unclean for 157740.352547, current state
> stale+active+degraded, last acting [2]
> pg 5.63 is stuck stale for 77085.263183, current state
> stale+active+degraded, last acting [2]
> pg 4.66 is stuck stale for 77085.263186, current state
> stale+active+degraded, last acting [2]
> pg 4.64 is stuck stale for 77085.263187, current state
> stale+active+degraded, last acting [2]
> pg 5.65 is stuck stale for 77085.263191, current state
> stale+active+degraded, last acting [2]
> pg 4.49 is stuck stale for 77085.263186, current state
> stale+active+degraded, last acting [2]
> pg 5.48 is stuck stale for 77085.263191, current state
> stale+active+degraded, last acting [2]
> pg 2.10 is stuck stale for 77085.263258, current state
> stale+active+degraded, last acting [2]
> pg 4.e is stuck stale for 77085.263247, current state
> stale+active+degraded, last acting [2]
> pg 5.d is stuck stale for 77085.263245, current state
> stale+active+degraded, last acting [2]
> pg 5.0 is stuck stale for 77085.263241, current state
> stale+active+degraded, last acting [2]
> pg 4.1 is stuck stale for 77085.263245, current state
> stale+active+degraded, last acting [2]
> pg 3.2 is stuck stale for 77085.263242, current state
> stale+active+degraded, last acting [2]
> pg 2.3 is stuck stale for 77085.263247, current state
> stale+active+degraded, last acting [2]
> pg 4.71 is stuck stale for 77085.263239, current state
> stale+active+degraded, last acting [2]
> pg 5.70 is stuck stale for 77085.263245, current state
> stale+active+degraded, last acting [2]
> pg 1.26 is active+recovering, acting [5,2,1], 1 unfound
> recovery 3/180 degraded (1.667%); 1/60 unfound (1.667%)
>
>
>
> Thanks
>
>
> --
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bobtail & Precise

2013-04-18 Thread John Wilkins
Bryan,

It seems you got crickets with this question. Did you get any further? I'd
like to add it to my upcoming CRUSH troubleshooting section.


On Wed, Apr 3, 2013 at 9:27 AM, Bryan Stillwell
wrote:

> I have two test clusters running Bobtail (0.56.4) and Ubuntu Precise
> (12.04.2).  The problem I'm having is that I'm not able to get either
> of them into a state where I can both mount the filesystem and have
> all the PGs in the active+clean state.
>
> It seems that on both clusters I can get them into a 100% active+clean
> state by setting "ceph osd crush tunables bobtail", but when I try to
> mount the filesystem I get:
>
> mount error 5 = Input/output error
>
>
> However, if I set "ceph osd crush tunables legacy" I can mount both
> filesystems, but then some of the PGs are stuck in the
> "active+remapped" state:
>
> # ceph -s
>health HEALTH_WARN 29 pgs stuck unclean; recovery 5/1604152 degraded
> (0.000%)
>monmap e1: 1 mons at {a=172.16.0.50:6789/0}, election epoch 1, quorum
> 0 a
>osdmap e10272: 20 osds: 20 up, 20 in
> pgmap v1114740: 1920 pgs: 1890 active+clean, 29 active+remapped, 1
> active+clean+scrubbing; 3086 GB data, 6201 GB used, 3098 GB / 9300 GB
> avail; 232B/s wr, 0op/s; 5/1604152 degraded (0.000%)
>mdsmap e420: 1/1/1 up {0=a=up:active}
>
>
> Is any one else seeing this?
>
> Thanks,
> Bryan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bobtail & Precise

2013-04-18 Thread Gregory Farnum
Seeing this go by again it's simple enough to provide a quick
answer/hint — by setting the tunables it's of course getting a better
distribution of data, but the reason they're optional to begin with is
that older clients won't support them. In this case, it's the kernel client
being run that doesn't support them, so it returns an error.
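
Roughly, the knobs involved look like this (illustrative commands, with the
caveat that switching tunables will move data around):

  ceph osd crush tunables legacy    # pre-bobtail behaviour, safe for old kernel clients
  ceph osd crush tunables bobtail   # better distribution, needs newer clients
  ceph osd getcrushmap -o cm && crushtool -d cm -o cm.txt   # inspect what the map currently encodes
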
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Apr 18, 2013 at 12:51 PM, John Wilkins  wrote:
> Bryan,
>
> It seems you got crickets with this question. Did you get any further? I'd
> like to add it to my upcoming CRUSH troubleshooting section.
>
>
> On Wed, Apr 3, 2013 at 9:27 AM, Bryan Stillwell 
> wrote:
>>
>> I have two test clusters running Bobtail (0.56.4) and Ubuntu Precise
>> (12.04.2).  The problem I'm having is that I'm not able to get either
>> of them into a state where I can both mount the filesystem and have
>> all the PGs in the active+clean state.
>>
>> It seems that on both clusters I can get them into a 100% active+clean
>> state by setting "ceph osd crush tunables bobtail", but when I try to
>> mount the filesystem I get:
>>
>> mount error 5 = Input/output error
>>
>>
>> However, if I set "ceph osd crush tunables legacy" I can mount both
>> filesystems, but then some of the PGs are stuck in the
>> "active+remapped" state:
>>
>> # ceph -s
>>health HEALTH_WARN 29 pgs stuck unclean; recovery 5/1604152 degraded
>> (0.000%)
>>monmap e1: 1 mons at {a=172.16.0.50:6789/0}, election epoch 1, quorum 0
>> a
>>osdmap e10272: 20 osds: 20 up, 20 in
>> pgmap v1114740: 1920 pgs: 1890 active+clean, 29 active+remapped, 1
>> active+clean+scrubbing; 3086 GB data, 6201 GB used, 3098 GB / 9300 GB
>> avail; 232B/s wr, 0op/s; 5/1604152 degraded (0.000%)
>>mdsmap e420: 1/1/1 up {0=a=up:active}
>>
>>
>> Is any one else seeing this?
>>
>> Thanks,
>> Bryan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> John Wilkins
> Senior Technical Writer
> Intank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bobtail & Precise

2013-04-18 Thread Bryan Stillwell
John,

Thanks for your response.  I haven't spent a lot of time on this issue
since then, so I'm still in the same situation.  I do remember seeing an
error message about an unsupported feature at one point after setting the
tunables to bobtail.

Bryan


On Thu, Apr 18, 2013 at 1:51 PM, John Wilkins wrote:

> Bryan,
>
> It seems you got crickets with this question. Did you get any further? I'd
> like to add it to my upcoming CRUSH troubleshooting section.
>
>
> On Wed, Apr 3, 2013 at 9:27 AM, Bryan Stillwell <
> bstillw...@photobucket.com> wrote:
>
>> I have two test clusters running Bobtail (0.56.4) and Ubuntu Precise
>> (12.04.2).  The problem I'm having is that I'm not able to get either
>> of them into a state where I can both mount the filesystem and have
>> all the PGs in the active+clean state.
>>
>> It seems that on both clusters I can get them into a 100% active+clean
>> state by setting "ceph osd crush tunables bobtail", but when I try to
>> mount the filesystem I get:
>>
>> mount error 5 = Input/output error
>>
>>
>> However, if I set "ceph osd crush tunables legacy" I can mount both
>> filesystems, but then some of the PGs are stuck in the
>> "active+remapped" state:
>>
>> # ceph -s
>>health HEALTH_WARN 29 pgs stuck unclean; recovery 5/1604152 degraded
>> (0.000%)
>>monmap e1: 1 mons at {a=172.16.0.50:6789/0}, election epoch 1, quorum
>> 0 a
>>osdmap e10272: 20 osds: 20 up, 20 in
>> pgmap v1114740: 1920 pgs: 1890 active+clean, 29 active+remapped, 1
>> active+clean+scrubbing; 3086 GB data, 6201 GB used, 3098 GB / 9300 GB
>> avail; 232B/s wr, 0op/s; 5/1604152 degraded (0.000%)
>>mdsmap e420: 1/1/1 up {0=a=up:active}
>>
>>
>> Is any one else seeing this?
>>
>> Thanks,
>> Bryan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> John Wilkins
> Senior Technical Writer
> Intank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
>



-- 

*Bryan Stillwell*
SENIOR SYSTEM ADMINISTRATOR

E: bstillw...@photobucket.com
O: 303.228.5109
M: 970.310.6085

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bobtail & Precise

2013-04-18 Thread Bryan Stillwell
What's the fix for people running precise (12.04)?  I believe I see the
same issue with quantal (12.10) as well.


On Thu, Apr 18, 2013 at 1:56 PM, Gregory Farnum  wrote:

> Seeing this go by again it's simple enough to provide a quick
> answer/hint — by setting the tunables it's of course getting a better
> distribution of data, but the reason they're optional to begin with is
> that older clients won't support them. In this case, the kernel client
> being run; so it returns an error.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thu, Apr 18, 2013 at 12:51 PM, John Wilkins 
> wrote:
> > Bryan,
> >
> > It seems you got crickets with this question. Did you get any further?
> I'd
> > like to add it to my upcoming CRUSH troubleshooting section.
> >
> >
> > On Wed, Apr 3, 2013 at 9:27 AM, Bryan Stillwell <
> bstillw...@photobucket.com>
> > wrote:
> >>
> >> I have two test clusters running Bobtail (0.56.4) and Ubuntu Precise
> >> (12.04.2).  The problem I'm having is that I'm not able to get either
> >> of them into a state where I can both mount the filesystem and have
> >> all the PGs in the active+clean state.
> >>
> >> It seems that on both clusters I can get them into a 100% active+clean
> >> state by setting "ceph osd crush tunables bobtail", but when I try to
> >> mount the filesystem I get:
> >>
> >> mount error 5 = Input/output error
> >>
> >>
> >> However, if I set "ceph osd crush tunables legacy" I can mount both
> >> filesystems, but then some of the PGs are stuck in the
> >> "active+remapped" state:
> >>
> >> # ceph -s
> >>health HEALTH_WARN 29 pgs stuck unclean; recovery 5/1604152 degraded
> >> (0.000%)
> >>monmap e1: 1 mons at {a=172.16.0.50:6789/0}, election epoch 1,
> quorum 0
> >> a
> >>osdmap e10272: 20 osds: 20 up, 20 in
> >> pgmap v1114740: 1920 pgs: 1890 active+clean, 29 active+remapped, 1
> >> active+clean+scrubbing; 3086 GB data, 6201 GB used, 3098 GB / 9300 GB
> >> avail; 232B/s wr, 0op/s; 5/1604152 degraded (0.000%)
> >>mdsmap e420: 1/1/1 up {0=a=up:active}
> >>
> >>
> >> Is any one else seeing this?
> >>
> >> Thanks,
> >> Bryan
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > --
> > John Wilkins
> > Senior Technical Writer
> > Intank
> > john.wilk...@inktank.com
> > (415) 425-9599
> > http://inktank.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>



-- 

*Bryan Stillwell*
SENIOR SYSTEM ADMINISTRATOR

E: bstillw...@photobucket.com
O: 303.228.5109
M: 970.310.6085

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bobtail & Precise

2013-04-18 Thread Gregory Farnum
There's not really a fix — either update all your clients so they support
the tunables (I'm not sure how new a kernel you need), or else run without
the tunables. In setups where your branching factors aren't very close to
your replication counts they aren't normally needed, if you want to reshape
your cluster a little bit.
-Greg

Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Apr 18, 2013 at 1:04 PM, Bryan Stillwell  wrote:

> What's the fix for people running precise (12.04)?  I believe I see the
> same issue with quantal (12.10) as well.
>
>
> On Thu, Apr 18, 2013 at 1:56 PM, Gregory Farnum  wrote:
>
>> Seeing this go by again it's simple enough to provide a quick
>> answer/hint — by setting the tunables it's of course getting a better
>> distribution of data, but the reason they're optional to begin with is
>> that older clients won't support them. In this case, the kernel client
>> being run; so it returns an error.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Thu, Apr 18, 2013 at 12:51 PM, John Wilkins 
>> wrote:
>> > Bryan,
>> >
>> > It seems you got crickets with this question. Did you get any further?
>> I'd
>> > like to add it to my upcoming CRUSH troubleshooting section.
>> >
>> >
>> > On Wed, Apr 3, 2013 at 9:27 AM, Bryan Stillwell <
>> bstillw...@photobucket.com>
>> > wrote:
>> >>
>> >> I have two test clusters running Bobtail (0.56.4) and Ubuntu Precise
>> >> (12.04.2).  The problem I'm having is that I'm not able to get either
>> >> of them into a state where I can both mount the filesystem and have
>> >> all the PGs in the active+clean state.
>> >>
>> >> It seems that on both clusters I can get them into a 100% active+clean
>> >> state by setting "ceph osd crush tunables bobtail", but when I try to
>> >> mount the filesystem I get:
>> >>
>> >> mount error 5 = Input/output error
>> >>
>> >>
>> >> However, if I set "ceph osd crush tunables legacy" I can mount both
>> >> filesystems, but then some of the PGs are stuck in the
>> >> "active+remapped" state:
>> >>
>> >> # ceph -s
>> >>health HEALTH_WARN 29 pgs stuck unclean; recovery 5/1604152 degraded
>> >> (0.000%)
>> >>monmap e1: 1 mons at {a=172.16.0.50:6789/0}, election epoch 1,
>> quorum 0
>> >> a
>> >>osdmap e10272: 20 osds: 20 up, 20 in
>> >> pgmap v1114740: 1920 pgs: 1890 active+clean, 29 active+remapped, 1
>> >> active+clean+scrubbing; 3086 GB data, 6201 GB used, 3098 GB / 9300 GB
>> >> avail; 232B/s wr, 0op/s; 5/1604152 degraded (0.000%)
>> >>mdsmap e420: 1/1/1 up {0=a=up:active}
>> >>
>> >>
>> >> Is any one else seeing this?
>> >>
>> >> Thanks,
>> >> Bryan
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> >
>> >
>> > --
>> > John Wilkins
>> > Senior Technical Writer
>> > Intank
>> > john.wilk...@inktank.com
>> > (415) 425-9599
>> > http://inktank.com
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
>
>
> --
>
> *Bryan Stillwell*
> SENIOR SYSTEM ADMINISTRATOR
>
> E: bstillw...@photobucket.com
> O: 303.228.5109
> M: 970.310.6085
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Backups

2013-04-18 Thread Craig Lewis
I'm new to Ceph, and considering using it to store a bunch of static 
files in the RADOS Gateway.  My files are all versioned, so we never 
modify files.  We only add new files, and delete unused files.



I'm trying to figure out how to back everything up, to protect against 
administrative and application errors.



I'm thinking about building one Ceph cluster that spans my primary and 
backup datacenters, with CRUSH rules that would store 2 replicas in each 
datacenter. I want to use BtrFS snapshots, like 
http://blog.rot13.org/2010/02/using_btrfs_snapshots_for_incremental_backup.html, 
but automated and with cleanup.  I'm doing something similar now, on my 
NFS servers with ZFS and a tool called zfs-snapshot-mgmt.
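
A minimal sketch of the kind of automation I mean, assuming each OSD sits on its
own BtrFS filesystem mounted at /var/lib/ceph/osd/ceph-0 (paths and the 14-day
retention are made up):

  snapdir=/var/lib/ceph/osd/ceph-0/snapshots
  mkdir -p "$snapdir"
  btrfs subvolume snapshot -r /var/lib/ceph/osd/ceph-0 "$snapdir/$(date +%Y%m%d-%H%M)"
  # prune snapshots whose date-stamped name is older than two weeks
  cutoff=$(date -d '14 days ago' +%Y%m%d)
  for s in "$snapdir"/*; do
      [ "$(basename "$s")" \< "$cutoff" ] && btrfs subvolume delete "$s"
  done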


I read that only XFS is recommended for production clusters, since BtrFS 
itself is still beta.  Any idea how long until BtrFS is usable in 
production?


I'd prefer to run Ceph on ZFS, but I see there are some outstanding 
issues in tracker.  Is anybody doing Ceph on ZFS in production?  ZFS 
itself seems to be farther along than BtrFS. Are there plans to make ZFS 
a first class supported filesystem for Ceph?




Assuming that BtrFS and ZFS are not recommended for production, I'm 
thinking about XFS in the primary datacenter, and BtrFS + snapshots in 
the backup datacenter.  Once BtrFS or ZFS is production ready, I'd 
slowly migrate all partitions off XFS.



Once the backups are made, using them is a bit tricky.

In the event of an operator or code error, I would mount the correct 
BtrFS snapshot on all nodes in the backup datacenter, someplace like 
/var/lib/ceph.restore/.  Then I'd make a copy of ceph.conf, and start 
building a temporary cluster that runs on a non-standard port, made up 
of only the backup datacenter machines.  The normal cluster would stay 
up and running.  Once the temporary cluster is up, I'd manually restore 
the RADOS Gateway objects that needed to be restored.



If there was ever a full-cluster problem, say I did something stupid 
like rados rmpool metadata, I'd shut down the whole cluster and revert 
all of the BtrFS partitions to the last known good snapshot, and 
re-format all of the XFS partitions.  Start the cluster up again, and 
let Ceph replicate everything back to freshly formatted partitions.  I'd 
lose recent data, but it's better than losing all of the data.


Obviously, both of these scenarios would need a lot of testing and many 
practice runs before they're viable.  Has anybody tried this before?  If 
not, do you see any problems with the theory?




Thanks for the help.



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RDMA

2013-04-18 Thread Gandalf Corvotempesta
Hi,
will RDMA be supported in the short term?
I'm planning an infrastructure and I don't know whether to start with IB
QDR or 10GbE.

IB is much cheaper than 10GbE and with RDMA should be 4x faster, but
with IPoIB as a workaround I've read that it is very heavy on CPU and
much slower (15 Gbit/s more or less).

What do you suggest? 10GbE or IB QDR for the future?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Mark Nelson

On 04/18/2013 03:40 PM, Gandalf Corvotempesta wrote:

Hi,
will RDMA be supported in the shortterm?
I'm planning an infrastructure and I don't know if starting with IB
QDR or 10GbE.


Depends on your definition of RDMA, supported, and short term. ;)

We like the idea of using rsockets as it would be somewhat easier to 
implement and it looks like it may get pretty close to RDMA levels of 
throughput and latency.  Having said that, it's young and there's not 
currently a kernel implementation.  We'll probably spend some time 
evaluating rsockets before we dive into an RDMA-based messenger entirely.




IB is much cheaper than 10GbE and with RDMA should be 4x faster, but
with IPoIB as workaround I've read that is very very heavy on CPU and
very slow (15gbit more or less)


I've done testing with IPoIB on QDR IB and was able to get around 2GB/s. 
 You may be able to get more throughput with interrupt affinity tuning, 
but I can at least say 2GB/s is possible.  I didn't look closely at CPU 
overhead.  Having said that, 10GbE can use some CPU too...
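
The affinity tuning in question is roughly this kind of thing (device names and
IRQ numbers are just examples for a Mellanox mlx4 HCA):

  grep mlx4 /proc/interrupts           # find the HCA's interrupt lines
  echo 2 > /proc/irq/42/smp_affinity   # pin hypothetical IRQ 42 to CPU1 (bitmask 0x2)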




What do you suggest? 10GbE or IB QDR for the future?


10GbE is fully supported and widely used with Ceph while IB is a bit 
more complicated with fewer users.  Having said that, IPoIB seems to 
work just fine, and there is potential in the future for even better 
performance.  Which one is right for you probably depends on the 
existing network infrastructure you are using, how fast your OSD nodes 
are, and what you are trying to do.  Sadly there is no easy answer. :)


Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bobtail & Precise

2013-04-18 Thread Bryan Stillwell
Ahh, I think I have a better understanding now.  I had my crush map set up
like this:

default
    basement
        rack1
            server1
                osd.0
                osd.1
                osd.2
                osd.3
                osd.4
            server2
                osd.5
                osd.6
                osd.7
                osd.8
                osd.9
        rack2
            server3
                osd.10
                osd.11
                osd.12
                osd.13
                osd.14
            server4
                osd.15
                osd.16
                osd.17
                osd.18
                osd.19

Since those failure domains are pretty small for the 2X replicas I
currently have set, I went ahead and changed it to be like this:

default
    server1
        osd.0
        osd.1
        osd.2
        osd.3
        osd.4
    server2
        osd.5
        osd.6
        osd.7
        osd.8
        osd.9
    server3
        osd.10
        osd.11
        osd.12
        osd.13
        osd.14
    server4
        osd.15
        osd.16
        osd.17
        osd.18
        osd.19

It's currently rebalancing with the new crushmap, so we shall see if that
clears things up in a few hours.
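
For anyone following along, the reshuffle is the usual decompile/edit/recompile
cycle (a sketch, not necessarily the exact commands used here):

  ceph osd getcrushmap -o crushmap
  crushtool -d crushmap -o crushmap.txt
  # edit crushmap.txt: drop the basement/rack buckets, put the host buckets under default
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new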

Bryan


On Thu, Apr 18, 2013 at 2:11 PM, Gregory Farnum  wrote:

> There's not really a fix — either update all your clients so they support
> the tunables (I'm not sure how new a kernel you need), or else run without
> the tunables. In setups where your branching factors aren't very close to
> your replication counts they aren't normally needed, if you want to reshape
> your cluster a little bit.
> -Greg
>
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thu, Apr 18, 2013 at 1:04 PM, Bryan Stillwell <
> bstillw...@photobucket.com> wrote:
>
>> What's the fix for people running precise (12.04)?  I believe I see the
>> same issue with quantal (12.10) as well.
>>
>>
>> On Thu, Apr 18, 2013 at 1:56 PM, Gregory Farnum  wrote:
>>
>>> Seeing this go by again it's simple enough to provide a quick
>>> answer/hint — by setting the tunables it's of course getting a better
>>> distribution of data, but the reason they're optional to begin with is
>>> that older clients won't support them. In this case, the kernel client
>>> being run; so it returns an error.
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Thu, Apr 18, 2013 at 12:51 PM, John Wilkins 
>>> wrote:
>>> > Bryan,
>>> >
>>> > It seems you got crickets with this question. Did you get any further?
>>> I'd
>>> > like to add it to my upcoming CRUSH troubleshooting section.
>>> >
>>> >
>>> > On Wed, Apr 3, 2013 at 9:27 AM, Bryan Stillwell <
>>> bstillw...@photobucket.com>
>>> > wrote:
>>> >>
>>> >> I have two test clusters running Bobtail (0.56.4) and Ubuntu Precise
>>> >> (12.04.2).  The problem I'm having is that I'm not able to get either
>>> >> of them into a state where I can both mount the filesystem and have
>>> >> all the PGs in the active+clean state.
>>> >>
>>> >> It seems that on both clusters I can get them into a 100% active+clean
>>> >> state by setting "ceph osd crush tunables bobtail", but when I try to
>>> >> mount the filesystem I get:
>>> >>
>>> >> mount error 5 = Input/output error
>>> >>
>>> >>
>>> >> However, if I set "ceph osd crush tunables legacy" I can mount both
>>> >> filesystems, but then some of the PGs are stuck in the
>>> >> "active+remapped" state:
>>> >>
>>> >> # ceph -s
>>> >>health HEALTH_WARN 29 pgs stuck unclean; recovery 5/1604152
>>> degraded
>>> >> (0.000%)
>>> >>monmap e1: 1 mons at {a=172.16.0.50:6789/0}, election epoch 1,
>>> quorum 0
>>> >> a
>>> >>osdmap e10272: 20 osds: 20 up, 20 in
>>> >> pgmap v1114740: 1920 pgs: 1890 active+clean, 29 active+remapped, 1
>>> >> active+clean+scrubbing; 3086 GB data, 6201 GB used, 3098 GB / 9300 GB
>>> >> avail; 232B/s wr, 0op/s; 5/1604152 degraded (0.000%)
>>> >>mdsmap e420: 1/1/1 up {0=a=up:active}
>>> >>
>>> >>
>>> >> Is any one else seeing this?
>>> >>
>>> >> Thanks,
>>> >> Bryan
>>> >> ___
>>> >> ceph-users mailing list
>>> >> ceph-users@lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > John Wilkins
>>> > Senior Technical Writer
>>> > Intank
>>> > john.wilk...@inktank.com
>>> > (415) 425-9599
>>> > http://inktank.com
>>> >
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>>
>>
>>
>>
>> --
>>
>> *Bryan Stillwell*
>> SENIOR SYSTEM ADMINISTRATOR
>>
>> E: bstillw...@photobucket.com
>> O: 303.228.5109
>> M: 970.310.6085
>>

Re: [ceph-users] RDMA

2013-04-18 Thread Gandalf Corvotempesta
2013/4/18 Mark Nelson :
> 10GbE is fully supported and widely used with Ceph while IB is a bit more
> complicated with fewer users.  Having said that, IPoIB seems to work just
> fine, and there is potential in the future for even better performance.
> Which one is right for you probably depends on the existing network
> infrastructure you are using, how fast your OSD nodes are, and what you are
> trying to do.  Sadly there is no easy answer. :)

QDR switches are sold (refurbished) at more or less € 2k, 40Gb/s and
usually 36 ports.
10GbE costs at least twice as much, and with only 12 or 24 ports.

2GB/s on QDR cards is good and is still faster than 10GbE, but it is still
only half of what I would expect from a QDR card. Do you know why we
lose more than 50% of the bandwidth?

Do you have experience with DDR cards?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Mark Nelson

On 04/18/2013 04:15 PM, Gandalf Corvotempesta wrote:

2013/4/18 Mark Nelson :

10GbE is fully supported and widely used with Ceph while IB is a bit more
complicated with fewer users.  Having said that, IPoIB seems to work just
fine, and there is potential in the future for even better performance.
Which one is right for you probably depends on the existing network
infrastructure you are using, how fast your OSD nodes are, and what you are
trying to do.  Sadly there is no easy answer. :)


QDR switches are sold (refurbished) at more or less € 2k, 40Gb/s and
usually 36ports.
10GbE costs at least 2x times more and with only 12 or 24 ports.

2GB/s on QDR cards is good and is still faster than 10GbE but still
half than what I would expect from a QDR card. Do you know why we
loose more than 50% of bandwidth?


Well, even with RDMA you probably aren't going to get much more than 
~3.2GB/s (or at least that's what I saw on our production clusters at my 
last job).  There's encoding overhead so you can't get the full 40Gb/s.


Beyond that, it's just another software layer with the associated 
inefficiencies.  Frankly I'm kind of amazed that rsockets can supposedly 
get around 3GB/s.  That's impressive performance imho.




Do you have experience on DDR cards?



Yes, but not in conjunction with modern IPoIB.  I'm not sure how they 
would perform these days.  I imagine probably better than 10GbE, but I 
don't know by how much.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Gregory Farnum
Hey guys,
I finally had enough time to coordinate with a few other people and
figure out what's going on with the ceph-create-keys access denied
messages and create a ticket: http://tracker.ceph.com/issues/4752.
(I believe your monitor crash is something else, Matthew; if that
hasn't been dealt with yet. Unfortunately all that log has is
messages, so it probably needs a bit more. Can you check it out, Joao?
It appears to be a follower which ends up in propose_pending, which is
distinctly odd...)
Thanks for the bug report!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Apr 8, 2013 at 7:39 AM, Mike Dawson  wrote:
> Matthew,
>
> I have seen the same behavior on 0.59. Ran through some troubleshooting with
> Dan and Joao on March 21st and 22nd, but I haven't looked at it since then.
>
> If you look at running processes, I believe you'll see an instance of
> ceph-create-keys start each time you start a Monitor. So, if you restart the
> monitor several times, you'll have several ceph-create-keys processes
> piling up, essentially leaking processes. IIRC, the tmp files you see in
> /etc/ceph correspond with the ceph-create-keys PID. Can you confirm that's
> what you are seeing?
>
> I haven't looked in a couple weeks, but I hope to start 0.60 later today.
>
> - Mike
>
>
>
>
>
>
> On 4/8/2013 12:43 AM, Matthew Roy wrote:
>>
>> I'm seeing weird messages in my monitor logs that don't correlate to
>> admin activity:
>>
>> 2013-04-07 22:54:11.528871 7f2e9e6c8700  1 --
>> [2001:::20]:6789/0 --> [2001:::20]:0/1920 --
>> mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
>> *,mds,allow]=-13 access denied v134192) v1 -- ?+0 0x37bfc00 con 0x3716840
>>
>> It's also writing out a bunch of empty files along the lines of
>> "ceph.client.admin.keyring.1008.tmp" in /etc/ceph/ Could this be related
>> to the mon trying to "Starting ceph-create-keys" when starting?
>>
>> This could be the cause of, or just associated with, some general
>> instability of the monitor cluster. After increasing the logging level I
>> did catch one crash:
>>
>>   ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
>>   1: /usr/bin/ceph-mon() [0x5834fa]
>>   2: (()+0xfcb0) [0x7f4b03328cb0]
>>   3: (gsignal()+0x35) [0x7f4b01efe425]
>>   4: (abort()+0x17b) [0x7f4b01f01b8b]
>>   5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4b0285069d]
>>   6: (()+0xb5846) [0x7f4b0284e846]
>>   7: (()+0xb5873) [0x7f4b0284e873]
>>   8: (()+0xb596e) [0x7f4b0284e96e]
>>   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x1df) [0x636c8f]
>>   10: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
>>   11: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
>>   12: (MDSMonitor::on_active()+0x1a) [0x512ada]
>>   13: (PaxosService::_active()+0x31d) [0x4e067d]
>>   14: (Context::complete(int)+0xa) [0x4b7b4a]
>>   15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4ba5a5]
>>   16: (Paxos::handle_last(MMonPaxos*)+0xbef) [0x4da92f]
>>   17: (Paxos::dispatch(PaxosServiceMessage*)+0x26b) [0x4dad8b]
>>   18: (Monitor::_ms_dispatch(Message*)+0x149f) [0x4b310f]
>>   19: (Monitor::ms_dispatch(Message*)+0x32) [0x4c9d12]
>>   20: (DispatchQueue::entry()+0x341) [0x698da1]
>>   21: (DispatchQueue::DispatchThread::entry()+0xd) [0x626c5d]
>>   22: (()+0x7e9a) [0x7f4b03320e9a]
>>   23: (clone()+0x6d) [0x7f4b01fbbcbd]
>>
>> The complete log is at: http://goo.gl/UmNs3
>>
>>
>> Does anyone recognize what's going on?
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Joao Eduardo Luis

On 04/18/2013 10:36 PM, Gregory Farnum wrote:

(I believe your monitor crash is something else, Matthew; if that
hasn't been dealt with yet. Unfortunately all that log has is
messages, so it probably needs a bit more. Can you check it out, Joao?


The stack trace below is #3495, and Matthew is already testing the fix 
(as per the tracker, so far so good, but we should know more in the 
next day or so).



It appears to be a follower which ends up in propose_pending, which is
distinctly odd...)


I might be missing something, but what gave you that impression?  That 
would certainly be odd (to say the least!)


  -Joao


Thanks for the bug report!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Apr 8, 2013 at 7:39 AM, Mike Dawson  wrote:

Matthew,

I have seen the same behavior on 0.59. Ran through some troubleshooting with
Dan and Joao on March 21st and 22nd, but I haven't looked at it since then.

If you look at running processes, I believe you'll see an instance of
ceph-create-keys start each time you start a Monitor. So, if you restart the
monitor several times, you'll have several ceph-create-keys processes
piling up, essentially leaking processes. IIRC, the tmp files you see in
/etc/ceph correspond with the ceph-create-keys PID. Can you confirm that's
what you are seeing?

I haven't looked in a couple weeks, but I hope to start 0.60 later today.

- Mike






On 4/8/2013 12:43 AM, Matthew Roy wrote:


I'm seeing weird messages in my monitor logs that don't correlate to
admin activity:

2013-04-07 22:54:11.528871 7f2e9e6c8700  1 --
[2001:::20]:6789/0 --> [2001:::20]:0/1920 --
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v134192) v1 -- ?+0 0x37bfc00 con 0x3716840

It's also writing out a bunch of empty files along the lines of
"ceph.client.admin.keyring.1008.tmp" in /etc/ceph/ Could this be related
to the mon trying to "Starting ceph-create-keys" when starting?

This could be the cause of, or just associated with, some general
instability of the monitor cluster. After increasing the logging level I
did catch one crash:

   ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
   1: /usr/bin/ceph-mon() [0x5834fa]
   2: (()+0xfcb0) [0x7f4b03328cb0]
   3: (gsignal()+0x35) [0x7f4b01efe425]
   4: (abort()+0x17b) [0x7f4b01f01b8b]
   5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4b0285069d]
   6: (()+0xb5846) [0x7f4b0284e846]
   7: (()+0xb5873) [0x7f4b0284e873]
   8: (()+0xb596e) [0x7f4b0284e96e]
   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x636c8f]
   10: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
   11: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
   12: (MDSMonitor::on_active()+0x1a) [0x512ada]
   13: (PaxosService::_active()+0x31d) [0x4e067d]
   14: (Context::complete(int)+0xa) [0x4b7b4a]
   15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4ba5a5]
   16: (Paxos::handle_last(MMonPaxos*)+0xbef) [0x4da92f]
   17: (Paxos::dispatch(PaxosServiceMessage*)+0x26b) [0x4dad8b]
   18: (Monitor::_ms_dispatch(Message*)+0x149f) [0x4b310f]
   19: (Monitor::ms_dispatch(Message*)+0x32) [0x4c9d12]
   20: (DispatchQueue::entry()+0x341) [0x698da1]
   21: (DispatchQueue::DispatchThread::entry()+0xd) [0x626c5d]
   22: (()+0x7e9a) [0x7f4b03320e9a]
   23: (clone()+0x6d) [0x7f4b01fbbcbd]

The complete log is at: http://goo.gl/UmNs3


Does anyone recognize what's going on?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Gandalf Corvotempesta
2013/4/18 Mark Nelson :
> SDP is deprecated:
>
> http://comments.gmane.org/gmane.network.openfabrics.enterprise/5371
>
> rsockets is the future I think.

I don't know rsockets. Any plans about support for this or are they
"transparent" like SDP?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Gandalf Corvotempesta
2013/4/18 Sage Weil :
> I'm no expert, but I've heard SDP is not likely to be supported/maintained
> by anyone in the long-term.  (Please, anyone, correct me if that is not
> true!)  That said, one user has tested it successfully (with kernel and
> userland ceph) and it does seem to work..

Do you know what performance was obtained?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Mark Nelson

On 04/18/2013 04:46 PM, Gandalf Corvotempesta wrote:

2013/4/18 Mark Nelson :

SDP is deprecated:

http://comments.gmane.org/gmane.network.openfabrics.enterprise/5371

rsockets is the future I think.


I don't know rsockets. Any plans about support for this or are they
"transparent" like SDP?



I think we'll probably be investigating to see if rsockets could be a 
good intermediate solution before tackling RDMA.  The big question still 
is what to do since there isn't a kernel implementation (yet).
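
For what it's worth, rsockets mirrors the BSD sockets API (rsocket/rconnect/
rsend/... instead of socket/connect/send/...), and librdmacm also ships a 
preload shim (librspreload, if memory serves) that can be LD_PRELOADed much 
like libsdp, so in principle existing TCP code can use it without changes.  
A minimal, untested sketch of what the API looks like -- the host and port 
below are placeholders, not anything Ceph actually uses; link against 
librdmacm:

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <rdma/rsocket.h>

    int main(void)
    {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof(hints));
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo("peer-host", "12345", &hints, &res) != 0)
            return 1;

        /* Same shape as socket()/connect()/send()/close(), just r-prefixed. */
        int fd = rsocket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || rconnect(fd, res->ai_addr, res->ai_addrlen) < 0) {
            perror("rsockets connect");
            return 1;
        }

        const char msg[] = "hello over rsockets";
        rsend(fd, msg, sizeof(msg), 0);
        rclose(fd);
        freeaddrinfo(res);
        return 0;
    }

The catch, as noted above, is that only userspace benefits: there is nothing 
equivalent in the kernel, so krbd and the kernel cephfs client are left out.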


Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Gregory Farnum
On Thu, Apr 18, 2013 at 2:46 PM, Joao Eduardo Luis
 wrote:
> On 04/18/2013 10:36 PM, Gregory Farnum wrote:
>>
>> (I believe your monitor crash is something else, Matthew; if that
>> hasn't been dealt with yet. Unfortunately all that log has is
>> messages, so it probably needs a bit more. Can you check it out, Joao?
>
>
> The stack trace below is #3495, and Matthew is already testing the fix (as
> per the tracker, so far so good, but we should know more in the next day
> or so).
>
>
>> It appears to be a follower which ends up in propose_pending, which is
>> distinctly odd...)
>
>
> I might be missing something, but what gave you that impression?  That would
> certainly be odd (to say the least!)

I could have just missed some message traffic (or misread what's
there), but there is a point where I think it's forwarding a command to
the leader, and the crash is in propose_pending. I like your answers
better. ;)
-Greg

>
>   -Joao
>
>
>> Thanks for the bug report!
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Mon, Apr 8, 2013 at 7:39 AM, Mike Dawson  wrote:
>>>
>>> Matthew,
>>>
>>> I have seen the same behavior on 0.59. Ran through some troubleshooting
>>> with
>>> Dan and Joao on March 21st and 22nd, but I haven't looked at it since
>>> then.
>>>
>>> If you look at running processes, I believe you'll see an instance of
>>> ceph-create-keys start each time you start a Monitor. So, if you restart
>>> the
>>> monitor several times, you'll have several ceph-create-keys processes
>>> piling up, essentially leaking processes. IIRC, the tmp files you see in
>>> /etc/ceph correspond with the ceph-create-keys PID. Can you confirm
>>> that's
>>> what you are seeing?
>>>
>>> I haven't looked in a couple weeks, but I hope to start 0.60 later today.
>>>
>>> - Mike
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 4/8/2013 12:43 AM, Matthew Roy wrote:


 I'm seeing weird messages in my monitor logs that don't correlate to
 admin activity:

 2013-04-07 22:54:11.528871 7f2e9e6c8700  1 --
 [2001:::20]:6789/0 --> [2001:::20]:0/1920 --
 mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
 *,mds,allow]=-13 access denied v134192) v1 -- ?+0 0x37bfc00 con
 0x3716840

 It's also writing out a bunch of empty files along the lines of
 "ceph.client.admin.keyring.1008.tmp" in /etc/ceph/ Could this be related
 to the mon trying to "Starting ceph-create-keys" when starting?

 This could be the cause of, or just associated with, some general
 instability of the monitor cluster. After increasing the logging level I
 did catch one crash:

ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: /usr/bin/ceph-mon() [0x5834fa]
2: (()+0xfcb0) [0x7f4b03328cb0]
3: (gsignal()+0x35) [0x7f4b01efe425]
4: (abort()+0x17b) [0x7f4b01f01b8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4b0285069d]
6: (()+0xb5846) [0x7f4b0284e846]
7: (()+0xb5873) [0x7f4b0284e873]
8: (()+0xb596e) [0x7f4b0284e96e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
 const*)+0x1df) [0x636c8f]
10: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
11: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
12: (MDSMonitor::on_active()+0x1a) [0x512ada]
13: (PaxosService::_active()+0x31d) [0x4e067d]
14: (Context::complete(int)+0xa) [0x4b7b4a]
 15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4ba5a5]
16: (Paxos::handle_last(MMonPaxos*)+0xbef) [0x4da92f]
17: (Paxos::dispatch(PaxosServiceMessage*)+0x26b) [0x4dad8b]
18: (Monitor::_ms_dispatch(Message*)+0x149f) [0x4b310f]
19: (Monitor::ms_dispatch(Message*)+0x32) [0x4c9d12]
20: (DispatchQueue::entry()+0x341) [0x698da1]
21: (DispatchQueue::DispatchThread::entry()+0xd) [0x626c5d]
22: (()+0x7e9a) [0x7f4b03320e9a]
23: (clone()+0x6d) [0x7f4b01fbbcbd]

 The complete log is at: http://goo.gl/UmNs3


 Does anyone recognize what's going on?

>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Sage Weil
On Thu, 18 Apr 2013, Gandalf Corvotempesta wrote:
> 2013/4/18 Sage Weil :
> > I'm no expert, but I've heard SDP is not likely to be supported/maintained
> > by anyone in the long-term.  (Please, anyone, correct me if that is not
> > true!)  That said, one user has tested it successfully (with kernel and
> > userland ceph) and it does seem to work..
> 
> Do you know what performance was obtained?

Very good, AFAICS.  ~5.4 GB/sec with SDP on 56Gb IB.  IPoIB on the same hardware 
was ~3.5 GB/sec.  But... deprecated.  Oh well!

But as I understand it, others have shown rsockets performance very 
similar to SDP.

sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Joao Eduardo Luis

On 04/18/2013 10:49 PM, Gregory Farnum wrote:

On Thu, Apr 18, 2013 at 2:46 PM, Joao Eduardo Luis
 wrote:

On 04/18/2013 10:36 PM, Gregory Farnum wrote:


(I believe your monitor crash is something else, Matthew; if that
hasn't been dealt with yet. Unfortunately all that log has is
messages, so it probably needs a bit more. Can you check it out, Joao?



The stack trace below is #3495, and Matthew is already testing the fix (as
per the tracker, so far so good, but we should know more in the next day
or so).



It appears to be a follower which ends up in propose_pending, which is
distinctly odd...)



I might be missing something, but what gave you that impression?  That would
certainly be odd (to say the least!)


I could have just missed some message traffic (or misread what's
there), but there is a point where I think it's forwarding a command to
the leader, and the crash is in propose_pending. I like your answers
better. ;)
-Greg



There's definitely some command messages being forwarded, but AFAICT 
they're being forwarded to the monitor, not by the monitor, which by 
itself is a good omen towards the monitor being the leader :-)


In any case, nothing in the trace's code path indicates we could be a 
peon, unless the monitor itself believed it was the leader.  If you take 
a closer look, you'll see that we come from 'handle_last()', which is 
bound to happen only on the leader (we'll assert otherwise).  For the 
monitor to be receiving these messages it must mean the peons believe 
him to be the leader -- or we have so many bugs going around that it's 
just madness!


In all seriousness, when I was chasing after this bug, Matthew sent me 
his logs with higher debug levels -- no craziness going around :-)


  -Joao





   -Joao



Thanks for the bug report!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Apr 8, 2013 at 7:39 AM, Mike Dawson  wrote:


Matthew,

I have seen the same behavior on 0.59. Ran through some troubleshooting
with
Dan and Joao on March 21st and 22nd, but I haven't looked at it since
then.

If you look at running processes, I believe you'll see an instance of
ceph-create-keys start each time you start a Monitor. So, if you restart
the
monitor several times, you'll have several ceph-create-keys processes
piling up, essentially leaking processes. IIRC, the tmp files you see in
/etc/ceph correspond with the ceph-create-keys PID. Can you confirm
that's
what you are seeing?

I haven't looked in a couple weeks, but I hope to start 0.60 later today.

- Mike






On 4/8/2013 12:43 AM, Matthew Roy wrote:



I'm seeing weird messages in my monitor logs that don't correlate to
admin activity:

2013-04-07 22:54:11.528871 7f2e9e6c8700  1 --
[2001:::20]:6789/0 --> [2001:::20]:0/1920 --
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v134192) v1 -- ?+0 0x37bfc00 con
0x3716840

It's also writing out a bunch of empty files along the lines of
"ceph.client.admin.keyring.1008.tmp" in /etc/ceph/ Could this be related
to the mon trying to "Starting ceph-create-keys" when starting?

This could be the cause of, or just associated with, some general
instability of the monitor cluster. After increasing the logging level I
did catch one crash:

ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: /usr/bin/ceph-mon() [0x5834fa]
2: (()+0xfcb0) [0x7f4b03328cb0]
3: (gsignal()+0x35) [0x7f4b01efe425]
4: (abort()+0x17b) [0x7f4b01f01b8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4b0285069d]
6: (()+0xb5846) [0x7f4b0284e846]
7: (()+0xb5873) [0x7f4b0284e873]
8: (()+0xb596e) [0x7f4b0284e96e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x636c8f]
10: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
11: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
12: (MDSMonitor::on_active()+0x1a) [0x512ada]
13: (PaxosService::_active()+0x31d) [0x4e067d]
14: (Context::complete(int)+0xa) [0x4b7b4a]
15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4ba5a5]
16: (Paxos::handle_last(MMonPaxos*)+0xbef) [0x4da92f]
17: (Paxos::dispatch(PaxosServiceMessage*)+0x26b) [0x4dad8b]
18: (Monitor::_ms_dispatch(Message*)+0x149f) [0x4b310f]
19: (Monitor::ms_dispatch(Message*)+0x32) [0x4c9d12]
20: (DispatchQueue::entry()+0x341) [0x698da1]
21: (DispatchQueue::DispatchThread::entry()+0xd) [0x626c5d]
22: (()+0x7e9a) [0x7f4b03320e9a]
23: (clone()+0x6d) [0x7f4b01fbbcbd]

The complete log is at: http://goo.gl/UmNs3


Does anyone recognize what's going on?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com



--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Matthew Roy
On 04/18/2013 06:03 PM, Joao Eduardo Luis wrote:
> 
> There's definitely some command messages being forwarded, but AFAICT
> they're being forwarded to the monitor, not by the monitor, which by
> itself is a good omen towards the monitor being the leader :-)
> 
> In any case, nothing in the trace's code path indicates we could be a
> peon, unless the monitor itself believed it was the leader.  If you take
> a closer look, you'll see that we come from 'handle_last()', which is
> bound to happen only on the leader (we'll assert otherwise).  For the
> monitor to be receiving these messages it must mean the peons believe
> him to be the leader -- or we have so many bugs going around that it's
> just madness!
> 
> In all seriousness, when I was chasing after this bug, Matthew sent me
> his logs with higher debug levels -- no craziness going around :-)
> 
>   -Joao
> 

Is there a way to tell who's being "denied"? Even if it's just log
pollution I'd like to know which client is misconfigured. There are
similar messages in all the mon logs:

mon.a:
2013-04-18 18:16:51.254378 7fc7c6d10700  1 --
[2001:470:8:dd9::20]:6789/0 --> [2001:470:8:dd9::21]:6789/0 --
route(mon_command_ack([auth,get-or-create,client.admin,mon,allow
*,osd,allow *,mds,allow]=-13 access denied v775211) v1 tid 8867608) v2
-- ?+0 0x7fc61a18b160 con 0x253f700


mon.b:
2013-04-18 18:16:49.670758 7f37c7afa700 20 --
[2001:470:8:dd9::21]:6789/0 >> [2001:470:8:dd9::21]:0/22372
pipe(0x7f383c070b70 sd=90 :6789 s=2 pgs=1 cs=1 l=1).writer encoding 7
0x7f37f49876a0
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v775209) v1

(mon.c was removed since the first log file in the thread)

mon.d:
2013-04-18 18:16:51.304897 7f927d40f700  1 --
[2001:470:8:dd9:7271:bcff:febd:e398]:6789/0 --> client.?
[2001:470:8:dd9::21]:0/26333 --
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v775211) v1 -- ?+0 0x7f923c0230a0

The spacing on these messages is about 0.001s so there's a lot of them
going around. All these systems are running 0.60-472-g327002e

Matthew




-- 
Matthew
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Gregory Farnum
There's a little bit of python called ceph-create-keys, which is
invoked by the upstart scripts. You can kill the running processes,
and edit them out of the scripts, without direct harm. (Their purpose
is to create some standard keys which the newer deployment tools rely
on to do things like create OSDs, etc.)
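
As for Matthew's "who is being denied" question: in the ack lines the
"--> ... <addr>:0/<pid>" part names the peer the denial is going back to,
and for ceph-create-keys the number after the slash should line up with the
leaked processes Mike described.  A rough, untested sketch of pulling that
field out of one of the sample lines above (the format is only assumed from
the samples in this thread):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* One of mon.d's sample lines from this thread, abbreviated. */
        const char *line =
            "2013-04-18 18:16:51.304897 7f927d40f700  1 -- "
            "[2001:470:8:dd9:7271:bcff:febd:e398]:6789/0 --> client.? "
            "[2001:470:8:dd9::21]:0/26333 -- "
            "mon_command_ack([...]=-13 access denied v775211) v1";

        if (strstr(line, "access denied")) {
            const char *peer = strstr(line, " --> ");
            const char *end  = peer ? strstr(peer + 5, " -- ") : NULL;
            if (peer && end)   /* prints: client.? [2001:470:8:dd9::21]:0/26333 */
                printf("denied peer: %.*s\n", (int)(end - (peer + 5)), peer + 5);
        }
        return 0;
    }
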
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Apr 18, 2013 at 3:20 PM, Matthew Roy  wrote:
> On 04/18/2013 06:03 PM, Joao Eduardo Luis wrote:
>>
>> There's definitely some command messages being forwarded, but AFAICT
>> they're being forwarded to the monitor, not by the monitor, which by
>> itself is a good omen towards the monitor being the leader :-)
>>
>> In any case, nothing in the trace's code path indicates we could be a
>> peon, unless the monitor itself believed it was the leader.  If you take
>> a closer look, you'll see that we come from 'handle_last()', which is
>> bound to happen only on the leader (we'll assert otherwise).  For the
>> monitor to be receiving these messages it must mean the peons believe
>> him to be the leader -- or we have so many bugs going around that it's
>> just madness!
>>
>> In all seriousness, when I was chasing after this bug, Matthew sent me
>> his logs with higher debug levels -- no craziness going around :-)
>>
>>   -Joao
>>
>
> Is there a way to tell who's being "denied"? Even if it's just log
> pollution I'd like to know which client is misconfigured. There are
> similar messages in all the mon logs:
>
> mon.a:
> 2013-04-18 18:16:51.254378 7fc7c6d10700  1 --
> [2001:470:8:dd9::20]:6789/0 --> [2001:470:8:dd9::21]:6789/0 --
> route(mon_command_ack([auth,get-or-create,client.admin,mon,allow
> *,osd,allow *,mds,allow]=-13 access denied v775211) v1 tid 8867608) v2
> -- ?+0 0x7fc61a18b160 con 0x253f700
>
>
> mon.b:
> 2013-04-18 18:16:49.670758 7f37c7afa700 20 --
> [2001:470:8:dd9::21]:6789/0 >> [2001:470:8:dd9::21]:0/22372
> pipe(0x7f383c070b70 sd=90 :6789 s=2 pgs=1 cs=1 l=1).writer encoding 7
> 0x7f37f49876a0
> mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
> *,mds,allow]=-13 access denied v775209) v1
>
> (mon.c was removed since the first log file in the thread)
>
> mon.d:
> 2013-04-18 18:16:51.304897 7f927d40f700  1 --
> [2001:470:8:dd9:7271:bcff:febd:e398]:6789/0 --> client.?
> [2001:470:8:dd9::21]:0/26333 --
> mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
> *,mds,allow]=-13 access denied v775211) v1 -- ?+0 0x7f923c0230a0
>
> The spacing on these messages is about 0.001s so there's a lot of them
> going around. All these systems are running 0.60-472-g327002e
>
> Matthew
>
>
>
>
> --
> Matthew
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA

2013-04-18 Thread Gandalf Corvotempesta
Isn't a userland preload library, like SDP's, enough?
Is the kernel version needed just for librbd?
On 18 Apr 2013 at 23:48, "Mark Nelson"  wrote:

> On 04/18/2013 04:46 PM, Gandalf Corvotempesta wrote:
>
>> 2013/4/18 Mark Nelson :
>>
>>> SDP is deprecated:
>>>
>>> http://comments.gmane.org/gmane.network.openfabrics.enterprise/5371
>>>
>>> rsockets is the future I think.
>>>
>>
>> I don't know rsockets. Any plans about support for this or are they
>> "transparent" like SDP?
>>
>>
> I think we'll probably be investigating to see if rsockets could be a good
> intermediate solution before tackling RDMA.  The big question still is what
> to do since there isn't a kernel implementation (yet).
>
> Mark
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Access Denied message to itself?

2013-04-18 Thread Mike Dawson

Greg,

Looks like Sage has a fix for this problem. In case it matters, I have 
seen a few cases that conflict with your notes in this thread and the 
bug report.


I have seen the bug exclusively on new Ceph installs (without upgrading 
from bobtail), so it is not isolated to upgrades.


Further, I have seen it on test deployments with a single monitor, so it 
doesn't seem to be limited to deployments with a leader and followers.


Thanks for getting this bug moving forward.

Thanks,
Mike


On 4/18/2013 6:23 PM, Gregory Farnum wrote:

There's a little bit of python called ceph-create-keys, which is
invoked by the upstart scripts. You can kill the running processes,
and edit them out of the scripts, without direct harm. (Their purpose
is to create some standard keys which the newer deployment tools rely
on to do things like create OSDs, etc.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Apr 18, 2013 at 3:20 PM, Matthew Roy  wrote:

On 04/18/2013 06:03 PM, Joao Eduardo Luis wrote:


There's definitely some command messages being forwarded, but AFAICT
they're being forwarded to the monitor, not by the monitor, which by
itself is a good omen towards the monitor being the leader :-)

In any case, nothing in the trace's code path indicates we could be a
peon, unless the monitor itself believed it was the leader.  If you take
a closer look, you'll see that we come from 'handle_last()', which is
bound to happen only on the leader (we'll assert otherwise).  For the
monitor to be receiving these messages it must mean the peons believe
him to be the leader -- or we have so many bugs going around that it's
just madness!

In all seriousness, when I was chasing after this bug, Matthew sent me
his logs with higher debug levels -- no craziness going around :-)

   -Joao



Is there a way to tell who's being "denied"? Even if it's just log
pollution I'd like to know which client is misconfigured. There are
similar messages in all the mon logs:

mon.a:
2013-04-18 18:16:51.254378 7fc7c6d10700  1 --
[2001:470:8:dd9::20]:6789/0 --> [2001:470:8:dd9::21]:6789/0 --
route(mon_command_ack([auth,get-or-create,client.admin,mon,allow
*,osd,allow *,mds,allow]=-13 access denied v775211) v1 tid 8867608) v2
-- ?+0 0x7fc61a18b160 con 0x253f700


mon.b:
2013-04-18 18:16:49.670758 7f37c7afa700 20 --
[2001:470:8:dd9::21]:6789/0 >> [2001:470:8:dd9::21]:0/22372
pipe(0x7f383c070b70 sd=90 :6789 s=2 pgs=1 cs=1 l=1).writer encoding 7
0x7f37f49876a0
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v775209) v1

(mon.c was removed since the first log file in the thread)

mon.d:
2013-04-18 18:16:51.304897 7f927d40f700  1 --
[2001:470:8:dd9:7271:bcff:febd:e398]:6789/0 --> client.?
[2001:470:8:dd9::21]:0/26333 --
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v775211) v1 -- ?+0 0x7f923c0230a0

The spacing on these messages is about 0.001s so there's a lot of them
going around. All these systems are running 0.60-472-g327002e

Matthew




--
Matthew

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com