[ceph-users] hammer - lost object after just one OSD failure?

2016-05-04 Thread Nikola Ciprich
Hi,

I was doing some performance tuning on a test cluster of just 2
nodes (10 OSDs each). I have a test pool with 2 replicas (size=2, min_size=2).

Then one of the OSDs crashed due to a failing hard drive. All remaining OSDs were
fine, but the health status reported one lost object..

Here's the detail:

"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2016-05-04 07:59:10.706866",
"might_have_unfound": [
{
"osd": "0",
"status": "osd is down"
},
{
"osd": "10",
"status": "already probed"
}
],


It was not important data, so I just discarded it since I don't need
to recover it, but now I'm wondering what caused all this..

I have min_size set to 2 and I thought that writes are confirmed only after
they reach all target OSD journals, no? Is there something specific I should
check? Maybe I have a bug in my configuration? Or how else could this object
be lost?
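
In case it helps, this is how I've been checking the settings and the unfound
object (pool name and pg id are placeholders here):

# ceph osd pool get <pool> size
# ceph osd pool get <pool> min_size
# ceph health detail | grep unfound
# ceph pg <pgid> query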

I'd be grateful for any info

br

nik





-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disabling POSIX locking semantics for CephFS

2016-05-04 Thread Yan, Zheng
On Wed, May 4, 2016 at 3:39 AM, Burkhard Linke
 wrote:
> Hi,
>
> On 03.05.2016 18:39, Gregory Farnum wrote:
>>
>> On Tue, May 3, 2016 at 9:30 AM, Burkhard Linke
>>  wrote:
>>>
>>> Hi,
>>>
>>> we have a number of legacy applications that do not cope well with the
>>> POSIX
>>> locking semantics in CephFS due to missing locking support (e.g. flock
>>> syscalls). We are able to fix some of these applications, but others are
>>> binary only.
>>>
>>> Is it possible to disable POSIX locking completely in CephFS (either
>>> kernel
>>> client or ceph-fuse)?
>>
>> I'm confused. CephFS supports all of these — although some versions of
>> FUSE don't; you need a new-ish kernel.
>>
>> So are you saying that
>> 1) in your setup, it doesn't support both fcntl and flock,
>> 2) that some of your applications don't do well under that scenario?
>>
>> I don't really see how it's safe for you to just disable the
>> underlying file locking in an application which depends on it. You may
>> need to upgrade enough that all file locks are supported.
>
>
> The application in question does a binary search in a large data file (~75
> GB), which is stored on CephFS. It uses open and mmap without any further
> locking controls (neither fcntl nor flock). The performance was very poor
> with CephFS (Ubuntu Trusty 4.4 backport kernel from Xenial and ceph-fuse)
> compared to the same application with NFS-based storage. I didn't have the
> time to dig further into the kernel implementation yet, but I assume that
> the root cause is locking pages accessed via the memory mapped file. Adding
> a simple flock syscall for marking the data file globally as shared solved
> the problem for us, reducing the overall runtime from nearly 2 hours to 5
> minutes (and thus comparable to the NFS control case). The application runs
> on our HPC cluster, so several 100 instances may access the same data file
> at once.
>
> We have other applications that were written without locking support and
> that do not perform very well with CephFS. There was a thread in February
> with a short discussion about CephFS mmap performance
> (http://article.gmane.org/gmane.comp.file-systems.ceph.user/27501). As
> pointed out in that thread, the problem is not only related to mmap itself,
> but also to the need to implement a proper invalidation. We cannot fix this
> for all our applications due to the lack of man power and the lack of source
> code in some cases. We either have to find a way to make them work with
> CephFS, or use a different setup, e.g. an extra NFS based mount point with a
> re-export of CephFS. I would like to avoid the latter solution...
>
> Disabling the POSIX semantics and having a fallback to a more NFS-like
> semantic without guarantees is a setback, but probably the easier way (if it
> is possible at all). Most data accessed by these applications is read only,
> so complex locking is not necessary in these cases.


see http://tracker.ceph.com/issues/15502. Maybe it's related to this issue.

Regards
Yan, Zheng

>
> Regards,
> Burkhard
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implications of using directory as Ceph OSD devices

2016-05-04 Thread Oliver Dzombic
Hi Vincenzo,

Theoretically you might also be able to use a directory.

But don't forget that Ceph relies on the xattr support of the underlying FS;
XFS is currently the safest choice.

You will also need a journal. This is usually a symlink to a device
inside the OSD directory.

My advice is to first create a regular OSD on a regular device/disk
and see what that looks like.

Also consider that udev is used for Ceph disk activation,
so you would have to modify those udev rules as well.

All in all, it's a lot of hacking you would need to do.

Unless you have a really serious reason to do so, I wouldn't do it.

Upgrades will also clash endlessly with such a setup.
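
For illustration, the directory variant Vincenzo refers to would look roughly
like this (untested sketch; the path is a placeholder, the journal ends up as a
plain file inside the directory, and the udev-based activation mentioned above
does not apply):

# mkdir -p /srv/ceph/osd0
# ceph-disk prepare /srv/ceph/osd0
# ceph-disk activate /srv/ceph/osd0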

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 03.05.2016 um 22:00 schrieb Vincenzo Pii:
> ceph-disk can prepare a disk, a partition or a directory to be used as a
> device.
> 
> What are the implications and limits of using a directory?
> Can it be used both for journal and storage?
> What file system should the directory exist on?
> 
> 
> Vincenzo Pii | TERALYTICS
> *DevOps Engineer
> *
> 
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland 
> phone: +41 (0) 79 191 11 08
> email: vincenzo@teralytics.net
> 
> www.teralytics.net
> 
> 
> Company registration number: CH-020.3.037.709-7 | Trade register Canton
> Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
> Yann de Vries
> 
> This e-mail message contains confidential information which is for the
> sole attention and use of the intended recipient. Please notify us at
> once if you think that it may not be intended for you and delete it
> immediately. 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disabling POSIX locking semantics for CephFS

2016-05-04 Thread Burkhard Linke

Hi,

On 05/04/2016 09:15 AM, Yan, Zheng wrote:

On Wed, May 4, 2016 at 3:39 AM, Burkhard Linke
 wrote:

Hi,

On 03.05.2016 18:39, Gregory Farnum wrote:

On Tue, May 3, 2016 at 9:30 AM, Burkhard Linke
 wrote:

Hi,

we have a number of legacy applications that do not cope well with the
POSIX
locking semantics in CephFS due to missing locking support (e.g. flock
syscalls). We are able to fix some of these applications, but others are
binary only.

Is it possible to disable POSIX locking completely in CephFS (either
kernel
client or ceph-fuse)?

I'm confused. CephFS supports all of these — although some versions of
FUSE don't; you need a new-ish kernel.

So are you saying that
1) in your setup, it doesn't support both fcntl and flock,
2) that some of your applications don't do well under that scenario?

I don't really see how it's safe for you to just disable the
underlying file locking in an application which depends on it. You may
need to upgrade enough that all file locks are supported.


The application in question does a binary search in a large data file (~75
GB), which is stored on CephFS. It uses open and mmap without any further
locking controls (neither fcntl nor flock). The performance was very poor
with CephFS (Ubuntu Trusty 4.4 backport kernel from Xenial and ceph-fuse)
compared to the same application with a NFS based storage. I didn't had the
time to dig further into the kernel implementation yet, but I assume that
the root cause is locking pages accessed via the memory mapped file. Adding
a simple flock syscall for marking the data file globally as shared solved
the problem for us, reducing the overall runtime from nearly 2 hours to 5
minutes (and thus comparable to the NFS control case). The application runs
on our HPC cluster, so several 100 instances may access the same data file
at once.

We have other applications that were written without locking support and
that do not perform very well with CephFS. There was a thread in February
with a short discussion about CephFS mmap performance
(http://article.gmane.org/gmane.comp.file-systems.ceph.user/27501). As
pointed out in that thread, the problem is not only related to mmap itself,
but also to the need to implement a proper invalidation. We cannot fix this
for all our applications due to the lack of man power and the lack of source
code in some cases. We either have to find a way to make them work with
CephFS, or use a different setup, e.g. an extra NFS based mount point with a
re-export of CephFS. I would like to avoid the later solution...

Disabling the POSIX semantics and having a fallback to a more NFS-like
semantic without guarantees is a setback, but probably the easier way (if it
is possible at all). Most data accessed by these applications is read only,
so complex locking is not necessary in these cases.


see http://tracker.ceph.com/issues/15502. Maybe it's related to this issue.
We are using Ceph release 0.94.6, so the performance problems are 
probably not related. The page cache also stays populated after an 
application terminates:


# dd if=test of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 109.008 s, 98.5 MB/s
# dd if=test of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 9.24535 s, 1.2 GB/s


How does CephFS handle locking in case of missing explicit locking 
control (e.g. flock / fcntl)? And what's the default of mmap'ed memory 
access in that case?


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disabling POSIX locking semantics for CephFS

2016-05-04 Thread Yan, Zheng
On Wed, May 4, 2016 at 4:51 PM, Burkhard Linke
 wrote:
> Hi,
>
>
> How does CephFS handle locking in case of missing explicit locking control
> (e.g. flock / fcntl)? And what's the default of mmap'ed memory access in
> that case?
>

Nothing special. Actually, I have no idea why using flock improves
performance. Could you please enable debugging and send the log to us?

Run the following commands while your application (without flock) is
running, and send us the resulting log.

ceph daemon client.xxx config set debug_client 20
sleep 30
ceph daemon client.xxx config set debug_client 0
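
If it is unclear which client.xxx to address, the ceph-fuse admin socket can
usually be found under /var/run/ceph and passed to "ceph daemon" by path; the
socket name below is only an example:

# ls /var/run/ceph/
ceph-client.admin.12345.asok
# ceph daemon /var/run/ceph/ceph-client.admin.12345.asok config set debug_client 20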





>
> Regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW Jewel upgrade: realms and default .rgw.root pool?

2016-05-04 Thread Ben Morrice

Hello,

We have been running infernalis with RGW in a federated configuration.

I want to upgrade to Jewel; however, I'm confused by the new configuration 
requirements of realms and the default .rgw.root pool.


In our infernalis configuration, for the master region/zone I have the 
following in ceph.conf:


  rgw region = bbp-dev
  rgw region root pool = .bbp-dev.rgw.root
  rgw zone = bbp-dev-master
  rgw zone root pool = .bbp-dev-master.rgw.root

For the upgrade to Jewel, I stopped RGW, upgraded the RPMs and changed 
'rgw region root pool' to be 'rgw zonegroup root pool' (as per the 
updated federated configuration documentation).


When I start RGW I get errors relating to a realm not being present, and the 
pool .rgw.root, which did not exist before, is created automatically 
(please see below for some debug log).


The federated RGW documentation page does not have any information on 
realms/periods.


Is it now expected in Jewel to always have a .rgw.root pool for the 
storage of realm/period data?
Is my upgrade logic correct, allowing ceph to create a default 
realm/period in .rgw.root or should I create these manually?


I would like to also move this federated configuration to a multisite 
configuration, however at this point in time I am just focusing on 
upgrading ceph to Jewel and maintaining the federated configuration.
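
For reference, the manual path I have in mind is something like the following
sketch (realm/zonegroup/zone names taken from the existing configuration;
whether this is the right or sufficient sequence for a federated setup is
exactly what I am unsure about):

# radosgw-admin realm create --rgw-realm=bbp-dev --default
# radosgw-admin zonegroup default --rgw-zonegroup=bbp-dev
# radosgw-admin zone default --rgw-zone=bbp-dev-master
# radosgw-admin period update --commit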


Thanks!

Cheers,

Ben


[root@bbpcb051 ceph]# /usr/bin/radosgw -d --cluster ceph --name 
client.radosgw.gateway --setuser ceph --setgroup ceph
2016-05-04 14:00:11.782631 7f9371d7da40  0 set uid:gid to 167:167 
(ceph:ceph)
2016-05-04 14:00:11.782648 7f9371d7da40  0 ceph version 10.2.0 
(3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process radosgw, pid 7277
2016-05-04 14:00:11.817861 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e56aa0 obj=.rgw.root:default.realm state=0x7f93732a2f88 
s->prefetch_data=0
2016-05-04 14:00:11.818037 7f933effd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2016-05-04 14:00:11.819408 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e55fa0 obj=.rgw.root:default.realm state=0x7f93732a2f88 
s->prefetch_data=0
2016-05-04 14:00:11.820019 7f9371d7da40 10 could not read realm id: (2) 
No such file or directory
2016-05-04 14:00:11.827571 7f9371d7da40 20 RGWRados::pool_iterate: got 
region_info.bbp-dev
2016-05-04 14:00:11.847367 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e561d0 obj=.bbp-dev.rgw.root:region_info.bbp-dev 
state=0x7f93732af838 s->prefetch_data=0
2016-05-04 14:00:11.848327 7f9371d7da40 20 get_system_obj_state: 
s->obj_tag was set empty
2016-05-04 14:00:11.848334 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e561d0 obj=.bbp-dev.rgw.root:region_info.bbp-dev 
state=0x7f93732af838 s->prefetch_data=0

2016-05-04 14:00:11.848336 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:11.849278 7f9371d7da40 20 rados->read r=0 bl.length=212
2016-05-04 14:00:11.849342 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e55d60 obj=.rgw.root:realms_names.bbp-dev.bbp-dev-master 
state=0x7f93732af838 s->prefetch_data=0
2016-05-04 14:00:13.884348 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e55ae0 
obj=.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.latest_epoch 
state=0x7f93732a43a8 s->prefetch_data=0
2016-05-04 14:00:13.911815 7f9371d7da40  0 error read_lastest_epoch 
.rgw.root:periods.21305dac-ee64-42ea-87cf-ee5bb3b42d40.latest_epoch
2016-05-04 14:00:13.921753 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e56170 obj=.rgw.root:default.realm state=0x7f93732b0e18 
s->prefetch_data=0
2016-05-04 14:00:13.922301 7f9371d7da40 20 get_system_obj_state: 
s->obj_tag was set empty
2016-05-04 14:00:13.922308 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e56170 obj=.rgw.root:default.realm state=0x7f93732b0e18 
s->prefetch_data=0

2016-05-04 14:00:13.922311 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:13.922806 7f9371d7da40 20 rados->read r=0 bl.length=42
2016-05-04 14:00:13.922823 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e561d0 
obj=.rgw.root:realms.1d3f123fa1f9f2f2f49d119c50590d63 
state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.923329 7f9371d7da40 20 get_system_obj_state: 
s->obj_tag was set empty
2016-05-04 14:00:13.923335 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e561d0 
obj=.rgw.root:realms.1d3f123fa1f9f2f2f49d119c50590d63 
state=0x7f93732b0e18 s->prefetch_data=0

2016-05-04 14:00:13.923337 7f9371d7da40 20 rados->read ofs=0 len=524288
2016-05-04 14:00:13.923808 7f9371d7da40 20 rados->read r=0 bl.length=118
2016-05-04 14:00:13.923826 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e55ff0 
obj=.rgw.root:realms.1d3f123fa1f9f2f2f49d119c50590d63 
state=0x7f93732b0e18 s->prefetch_data=0
2016-05-04 14:00:13.924339 7f9371d7da40 20 get_system_obj_state: 
s->obj_tag was set empty
2016-05-04 14:00:13.924344 7f9371d7da40 20 get_system_obj_state: 
rctx=0x7ffd86e55ff0 
obj=.rgw.root:realms.1d3f123fa1f9f2f2f49d119c50590d63 
state=0x7f93732b0e18 s->prefetch_data=0

2016-0

Re: [ceph-users] Status of ceph-docker

2016-05-04 Thread Daniel Gryniewicz

On 05/03/2016 04:17 PM, Vincenzo Pii wrote:

https://github.com/ceph/ceph-docker

Is someone using ceph-docker in production or the project is meant more
for development and experimentation?

Vincenzo Pii| TERALYTICS
*DevOps Engineer
*


I'm not aware of anyone currently using it in production, but it is 
being used as a base for a downstream RHCS containerized release, so 
there will be production containerized Ceph deployed.


Daniel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer - lost object after just one OSD failure?

2016-05-04 Thread Gregory Farnum
On Wed, May 4, 2016 at 12:00 AM, Nikola Ciprich
 wrote:
> Hi,
>
> I was doing some performance tuning on test cluster of just 2
> nodes (each 10 OSDs). I have test pool of 2 replicas (size=2, min_size=2)
>
> then one of OSD crashed due to failing harddrive. All remaining OSDs were
> fine, but health status reported one lost object..
>
> here's detail:
>
> "recovery_state": [
> {
> "name": "Started\/Primary\/Active",
> "enter_time": "2016-05-04 07:59:10.706866",
> "might_have_unfound": [
> {
> "osd": "0",
> "status": "osd is down"
> },
> {
> "osd": "10",
> "status": "already probed"
> }
> ],
>
>
> it was no important data, so  I just discarded it as I don't need
> to recover it, but now I'm wondering what is the cause of all this..
>
> I have min_size set to 2 and I though that writes are confirmed after
> they reach all target OSD journals, no? Is there something specific I should
> check? Maybe I have some bug in configuration? Or how else could this object
> be lost?

Is OSD 0 the one which had a failing hard drive? And OSD 10 is
supposed to be fine?

In general what you're saying does make it sound like something under
the Ceph code lost objects, but if one of those OSDs has never had a
problem I'm not sure what it could be.

(The most common failure mode is power loss while the user has
barriers turned off, or a RAID card misconfigured, or similar.)
-Greg

>
> I'd be grateful for any info
>
> br
>
> nik
>
>
>
>
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disabling POSIX locking semantics for CephFS

2016-05-04 Thread Gregory Farnum
On Wed, May 4, 2016 at 2:16 AM, Yan, Zheng  wrote:
> On Wed, May 4, 2016 at 4:51 PM, Burkhard Linke
>  wrote:
>> Hi,
>>
>>
>> How does CephFS handle locking in case of missing explicit locking control
>> (e.g. flock / fcntl)? And what's the default of mmap'ed memory access in
>> that case?
>>
>
> Nothing special. Actually, I have no idea why using flock improves
> performance. Could you please enable debug and send  the log to use.

Okay, so it sounds like this isn't so much flock file locking as you
added a syscall telling it not to worry about synchronization, and now
you want a way to disable our consistency semantics on . Exactly what
change did you make to your application? Can you share the key syscall?

Programmatically you can use the lazyIO flags we have, but I can't
offhand think of anything you can specify per-mount or similar. That's
an interesting request, hmm Zheng, Sage, any thoughts?
-Greg

>
> run following commands while your application (without flock) is
> running and send the log to us.
>
> ceph daemon client.xxx config set debug_client 20
> sleep 30
> ceph daemon client.xxx config set debug_client 0
>
>
>
>
>
>>
>> Regards,
>> Burkhard
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer - lost object after just one OSD failure?

2016-05-04 Thread Nikola Ciprich
Hi Gregory,

thanks for the reply.

> 
> Is OSD 0 the one which had a failing hard drive? And OSD 10 is
> supposed to be fine?

yes, OSD 0 crashed due to disk errors; the rest of the cluster was without
problems, no crashes, no restarts.. that's why it scared me a bit..

It's a pity I purged the lost placement groups, maybe we could have dug up
some more debug info... I'll keep torturing and watching the cluster carefully
and report if something similar happens again.. I suppose we can't do much
more till then...

BR

nik
> 
> In general what you're saying does make it sound like something under
> the Ceph code lost objects, but if one of those OSDs has never had a
> problem I'm not sure what it could be.
> 
> (The most common failure mode is power loss while the user has
> barriers turned off, or a RAID card misconfigured, or similar.)
> -Greg
> 
> >
> > I'd be grateful for any info
> >
> > br
> >
> > nik
> >
> >
> >
> >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Michael Kuriger
I’m running CentOS 7.2.  I upgraded one server from hammer to jewel.  I cannot 
get Ceph to start using the new systemd scripts.  Can anyone help?

I tried to enable ceph-osd@.service by creating symlinks manually.


# systemctl list-unit-files|grep ceph

ceph-create-keys@.service  static

ceph-disk@.service static

ceph-mds@.service  disabled

ceph-mon@.service  disabled

ceph-osd@.service  enabled

ceph-mds.target            disabled

ceph-mon.target            disabled

ceph-osd.target            enabled

ceph.target                enabled



# systemctl start ceph.target


# systemctl status ceph.target

● ceph.target - ceph target allowing to start/stop all ceph*@.service instances 
at once

   Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor preset: 
disabled)

   Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 6s ago


May 04 08:53:30  systemd[1]: Reached target ceph target allowing to start/stop 
all ceph*@.service instances at once.

May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all 
ceph*@.service instances at once.

May 04 08:57:32  systemd[1]: Reached target ceph target allowing to start/stop 
all ceph*@.service instances at once.


# systemctl status ceph-osd.target

● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service 
instances at once

   Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor 
preset: disabled)

   Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 20s ago


May 04 08:53:30  systemd[1]: Reached target ceph target allowing to start/stop 
all ceph-osd@.service instances at once.

May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all 
ceph-osd@.service instances at once.


# systemctl status ceph-osd@.service

Failed to get properties: Unit name ceph-osd@.service is not valid.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Vasu Kulkarni
Sadly there are still some issues with the jewel/master branch for the CentOS
systemctl services.
As a workaround, run "systemctl status", look at the top-most
service name in the ceph-osd service tree, and use that to stop/start;
that should work.


On Wed, May 4, 2016 at 9:00 AM, Michael Kuriger  wrote:
> I’m running CentOS 7.2.  I upgraded one server from hammer to jewel.   I
> cannot get ceph to start using these new systems scripts.  Can anyone help?
>
> I tried to enable ceph-osd@.service by creating symlinks manually.
>
> # systemctl list-unit-files|grep ceph
>
> ceph-create-keys@.service  static
>
> ceph-disk@.service static
>
> ceph-mds@.service  disabled
>
> ceph-mon@.service  disabled
>
> ceph-osd@.service  enabled
>
> ceph-mds.targetdisabled
>
> ceph-mon.targetdisabled
>
> ceph-osd.targetenabled
>
> ceph.targetenabled
>
>
>
> # systemctl start ceph.target
>
>
> # systemctl status ceph.target
>
> ● ceph.target - ceph target allowing to start/stop all ceph*@.service
> instances at once
>
>Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor
> preset: disabled)
>
>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 6s ago
>
>
> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
> start/stop all ceph*@.service instances at once.
>
> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
> ceph*@.service instances at once.
>
> May 04 08:57:32  systemd[1]: Reached target ceph target allowing to
> start/stop all ceph*@.service instances at once.
>
>
> # systemctl status ceph-osd.target
>
> ● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service
> instances at once
>
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor
> preset: disabled)
>
>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 20s ago
>
>
> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
> start/stop all ceph-osd@.service instances at once.
>
> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
> ceph-osd@.service instances at once.
>
>
> # systemctl status ceph-osd@.service
>
> Failed to get properties: Unit name ceph-osd@.service is not valid.
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Michael Kuriger
How are others starting ceph services?  Am I the only person trying to install 
jewel on CentOS 7?
Unfortunately, systemctl status does not list any “ceph” services at all.

 








On 5/4/16, 9:37 AM, "Vasu Kulkarni"  wrote:

>sadly there are still some issues with jewel/master branch for centos
>systemctl service,
>As a workaround if you run "systemctl status" and look at the top most
>service name in the ceph-osd service tree and use that to stop/start
>it should work.
>
>
>On Wed, May 4, 2016 at 9:00 AM, Michael Kuriger  wrote:
>> I’m running CentOS 7.2.  I upgraded one server from hammer to jewel.   I
>> cannot get ceph to start using these new systems scripts.  Can anyone help?
>>
>> I tried to enable ceph-osd@.service by creating symlinks manually.
>>
>> # systemctl list-unit-files|grep ceph
>>
>> ceph-create-keys@.service  static
>>
>> ceph-disk@.service static
>>
>> ceph-mds@.service  disabled
>>
>> ceph-mon@.service  disabled
>>
>> ceph-osd@.service  enabled
>>
>> ceph-mds.targetdisabled
>>
>> ceph-mon.targetdisabled
>>
>> ceph-osd.targetenabled
>>
>> ceph.targetenabled
>>
>>
>>
>> # systemctl start ceph.target
>>
>>
>> # systemctl status ceph.target
>>
>> ● ceph.target - ceph target allowing to start/stop all ceph*@.service
>> instances at once
>>
>>Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor
>> preset: disabled)
>>
>>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 6s ago
>>
>>
>> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
>> start/stop all ceph*@.service instances at once.
>>
>> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
>> ceph*@.service instances at once.
>>
>> May 04 08:57:32  systemd[1]: Reached target ceph target allowing to
>> start/stop all ceph*@.service instances at once.
>>
>>
>> # systemctl status ceph-osd.target
>>
>> ● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service
>> instances at once
>>
>>Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor
>> preset: disabled)
>>
>>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 20s ago
>>
>>
>> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
>> start/stop all ceph-osd@.service instances at once.
>>
>> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
>> ceph-osd@.service instances at once.
>>
>>
>> # systemctl status ceph-osd@.service
>>
>> Failed to get properties: Unit name ceph-osd@.service is not valid.
>>
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com&d=CwIFaQ&c=lXkdEK1PC7UK9oKA-BBSI8p1AamzLOSncm6Vfn0C_UQ&r=CSYA9OS6Qd7fQySI2LDvlQ&m=ha3XvQGcc5Yztz98b7hb8pYQo14dcIiYxfOoMzyUM00&s=VdVOtGV4JQUKyQDDC_QYn1-7wBcSh-eYwx_cCSQWlQk&e=
>>  
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Vasu Kulkarni
I think this is actually fixed in master, probably not yet backported
to jewel. systemctl status should list ceph services unless there is
some other issue with your node.

ex output:

   └─system.slice
 ├─system-ceph\x2dosd.slice
 │ └─ceph-osd@0.service
 │   └─22652 /usr/bin/ceph-osd -f --cluster ceph --id 0
--setuser ceph --setgroup ceph



This is on latest branch though


[ubuntu@mira078 cd]$ sudo systemctl status ceph-osd@0.service
● ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled;
vendor preset: disabled)
   Active: active (running) since Wed 2016-05-04 16:57:37 UTC; 4min 22s ago
  Process: 23074 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
--cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 23125 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
   └─23125 /usr/bin/ceph-osd -f --cluster ceph --id 0
--setuser ceph --setgroup ceph

May 04 16:57:37 mira078 systemd[1]: Starting Ceph object storage daemon...
May 04 16:57:37 mira078 ceph-osd-prestart.sh[23074]: create-or-move
updated item name 'osd.0' weight 0.9044 at location
{host=mira078,root=default} to crush map
May 04 16:57:37 mira078 systemd[1]: Started Ceph object storage daemon.
May 04 16:57:37 mira078 ceph-osd[23125]: starting osd.0 at :/0
osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
May 04 16:57:37 mira078 ceph-osd[23125]: 2016-05-04 16:57:37.892615
7f110eeff800 -1 osd.0 9 log_to_monitors {default=true}
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$ sudo systemctl stop ceph-osd@0.service
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$ sudo systemctl status ceph-osd@0.service
● ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled;
vendor preset: disabled)
   Active: inactive (dead) since Wed 2016-05-04 17:02:09 UTC; 2s ago
  Process: 23125 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER}
--id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 23074 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
--cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 23125 (code=exited, status=0/SUCCESS)

May 04 16:57:37 mira078 systemd[1]: Starting Ceph object storage daemon...
May 04 16:57:37 mira078 ceph-osd-prestart.sh[23074]: create-or-move
updated item name 'osd.0' weight 0.9044 at location
{host=mira078,root=default} to crush map
May 04 16:57:37 mira078 systemd[1]: Started Ceph object storage daemon.
May 04 16:57:37 mira078 ceph-osd[23125]: starting osd.0 at :/0
osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
May 04 16:57:37 mira078 ceph-osd[23125]: 2016-05-04 16:57:37.892615
7f110eeff800 -1 osd.0 9 log_to_monitors {default=true}
May 04 17:02:06 mira078 systemd[1]: Stopping Ceph object storage daemon...
May 04 17:02:06 mira078 ceph-osd[23125]: 2016-05-04 17:02:06.972780
7f10e819b700 -1 osd.0 12 *** Got signal Terminated ***
May 04 17:02:07 mira078 ceph-osd[23125]: 2016-05-04 17:02:07.027192
7f10e819b700 -1 osd.0 12 shutdown
May 04 17:02:09 mira078 systemd[1]: Stopped Ceph object storage daemon.
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$ sudo systemctl start ceph-osd@0.service
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$ sudo systemctl status ceph-osd@0.service
● ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled;
vendor preset: disabled)
   Active: active (running) since Wed 2016-05-04 17:02:19 UTC; 4s ago
  Process: 23283 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
--cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 23335 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
   └─23335 /usr/bin/ceph-osd -f --cluster ceph --id 0
--setuser ceph --setgroup ceph

May 04 17:02:18 mira078 systemd[1]: Starting Ceph object storage daemon...
May 04 17:02:19 mira078 ceph-osd-prestart.sh[23283]: create-or-move
updated item name 'osd.0' weight 0.9044 at location
{host=mira078,root=default} to crush map
May 04 17:02:19 mira078 systemd[1]: Started Ceph object storage daemon.
May 04 17:02:19 mira078 ceph-osd[23335]: starting osd.0 at :/0
osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
May 04 17:02:19 mira078 ceph-osd[23335]: 2016-05-04 17:02:19.503305
7fd5731d4800 -1 osd.0 13 log_to_monitors {default=true}
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$
[ubuntu@mira078 cd]$ ps -eaf | grep ceph
ceph 22420 1  0 16:55 ?00:00:00 /usr/bin/ceph-mon -f
--cluster ceph --id mira078 --setuser ceph --setgroup ceph
ceph 23335 1  1 17:02 ?00:00:00 /usr/bin/ceph-osd -f
--cluster ceph --id 0 --setuser ceph --setgroup ceph

On Wed, May 4, 2016 at 9:58 AM, Michael Kuriger  wrote:
> How are others starting ceph services?  Am I the on

Re: [ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Benjeman Meekhof
Hi Michael,

The systemctl pattern for an OSD with Infernalis or higher is 'systemctl
start ceph-osd@'  (or status, restart).

It will start the OSD in the default cluster 'ceph', or another cluster if you
have set 'CLUSTER=' in /etc/sysconfig/ceph.
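
For example, for the OSD with id 0 in the default cluster (the id is a
placeholder for your own OSD ids):

systemctl enable ceph-osd@0
systemctl start ceph-osd@0
systemctl status ceph-osd@0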

If by chance you have 2 clusters on the same hardware, you'll have to
manually create separate systemd unit files in /usr/lib/systemd/system
like '-osd@.service' edited to have the 2nd cluster name, create
a separate '-osd.target' in the same dir, and symlink it in
/etc/systemd/system/-osd.target.wants.  I don't know if there
is a built-in way to do this, but I did not see one.

To see all the ceph units configured:

systemctl -a | grep ceph

regards,
Ben



On Wed, May 4, 2016 at 12:58 PM, Michael Kuriger  wrote:
> How are others starting ceph services?  Am I the only person trying to 
> install jewel on CentOS 7?
> Unfortunately, systemctl status does not list any “ceph” services at all.
>
>
>
>
>
>
>
>
>
>
> On 5/4/16, 9:37 AM, "Vasu Kulkarni"  wrote:
>
>>sadly there are still some issues with jewel/master branch for centos
>>systemctl service,
>>As a workaround if you run "systemctl status" and look at the top most
>>service name in the ceph-osd service tree and use that to stop/start
>>it should work.
>>
>>
>>On Wed, May 4, 2016 at 9:00 AM, Michael Kuriger  wrote:
>>> I’m running CentOS 7.2.  I upgraded one server from hammer to jewel.   I
>>> cannot get ceph to start using these new systems scripts.  Can anyone help?
>>>
>>> I tried to enable ceph-osd@.service by creating symlinks manually.
>>>
>>> # systemctl list-unit-files|grep ceph
>>>
>>> ceph-create-keys@.service  static
>>>
>>> ceph-disk@.service static
>>>
>>> ceph-mds@.service  disabled
>>>
>>> ceph-mon@.service  disabled
>>>
>>> ceph-osd@.service  enabled
>>>
>>> ceph-mds.targetdisabled
>>>
>>> ceph-mon.targetdisabled
>>>
>>> ceph-osd.targetenabled
>>>
>>> ceph.targetenabled
>>>
>>>
>>>
>>> # systemctl start ceph.target
>>>
>>>
>>> # systemctl status ceph.target
>>>
>>> ● ceph.target - ceph target allowing to start/stop all ceph*@.service
>>> instances at once
>>>
>>>Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor
>>> preset: disabled)
>>>
>>>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 6s ago
>>>
>>>
>>> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
>>> start/stop all ceph*@.service instances at once.
>>>
>>> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
>>> ceph*@.service instances at once.
>>>
>>> May 04 08:57:32  systemd[1]: Reached target ceph target allowing to
>>> start/stop all ceph*@.service instances at once.
>>>
>>>
>>> # systemctl status ceph-osd.target
>>>
>>> ● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service
>>> instances at once
>>>
>>>Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor
>>> preset: disabled)
>>>
>>>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 20s ago
>>>
>>>
>>> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
>>> start/stop all ceph-osd@.service instances at once.
>>>
>>> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
>>> ceph-osd@.service instances at once.
>>>
>>>
>>> # systemctl status ceph-osd@.service
>>>
>>> Failed to get properties: Unit name ceph-osd@.service is not valid.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com&d=CwIFaQ&c=lXkdEK1PC7UK9oKA-BBSI8p1AamzLOSncm6Vfn0C_UQ&r=CSYA9OS6Qd7fQySI2LDvlQ&m=ha3XvQGcc5Yztz98b7hb8pYQo14dcIiYxfOoMzyUM00&s=VdVOtGV4JQUKyQDDC_QYn1-7wBcSh-eYwx_cCSQWlQk&e=
>>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Michael Kuriger
I was able to hack the ceph /etc/init.d script to start my OSDs.

 

 

 
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235








On 5/4/16, 9:58 AM, "ceph-users on behalf of Michael Kuriger" 
 wrote:

>How are others starting ceph services?  Am I the only person trying to install 
>jewel on CentOS 7?
>
>Unfortunately, systemctl status does not list any “ceph” services at all.
>
>
>
> 
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>On 5/4/16, 9:37 AM, "Vasu Kulkarni"  wrote:
>
>
>
>>sadly there are still some issues with jewel/master branch for centos
>
>>systemctl service,
>
>>As a workaround if you run "systemctl status" and look at the top most
>
>>service name in the ceph-osd service tree and use that to stop/start
>
>>it should work.
>
>>
>
>>
>
>>On Wed, May 4, 2016 at 9:00 AM, Michael Kuriger  wrote:
>
>>> I’m running CentOS 7.2.  I upgraded one server from hammer to jewel.   I
>
>>> cannot get ceph to start using these new systems scripts.  Can anyone help?
>
>>>
>
>>> I tried to enable ceph-osd@.service by creating symlinks manually.
>
>>>
>
>>> # systemctl list-unit-files|grep ceph
>
>>>
>
>>> ceph-create-keys@.service  static
>
>>>
>
>>> ceph-disk@.service static
>
>>>
>
>>> ceph-mds@.service  disabled
>
>>>
>
>>> ceph-mon@.service  disabled
>
>>>
>
>>> ceph-osd@.service  enabled
>
>>>
>
>>> ceph-mds.targetdisabled
>
>>>
>
>>> ceph-mon.targetdisabled
>
>>>
>
>>> ceph-osd.targetenabled
>
>>>
>
>>> ceph.targetenabled
>
>>>
>
>>>
>
>>>
>
>>> # systemctl start ceph.target
>
>>>
>
>>>
>
>>> # systemctl status ceph.target
>
>>>
>
>>> ● ceph.target - ceph target allowing to start/stop all ceph*@.service
>
>>> instances at once
>
>>>
>
>>>Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor
>
>>> preset: disabled)
>
>>>
>
>>>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 6s ago
>
>>>
>
>>>
>
>>> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
>
>>> start/stop all ceph*@.service instances at once.
>
>>>
>
>>> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
>
>>> ceph*@.service instances at once.
>
>>>
>
>>> May 04 08:57:32  systemd[1]: Reached target ceph target allowing to
>
>>> start/stop all ceph*@.service instances at once.
>
>>>
>
>>>
>
>>> # systemctl status ceph-osd.target
>
>>>
>
>>> ● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service
>
>>> instances at once
>
>>>
>
>>>Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor
>
>>> preset: disabled)
>
>>>
>
>>>Active: active since Wed 2016-05-04 08:53:30 PDT; 4min 20s ago
>
>>>
>
>>>
>
>>> May 04 08:53:30  systemd[1]: Reached target ceph target allowing to
>
>>> start/stop all ceph-osd@.service instances at once.
>
>>>
>
>>> May 04 08:53:30  systemd[1]: Starting ceph target allowing to start/stop all
>
>>> ceph-osd@.service instances at once.
>
>>>
>
>>>
>
>>> # systemctl status ceph-osd@.service
>
>>>
>
>>> Failed to get properties: Unit name ceph-osd@.service is not valid.
>
>>>
>
>>>
>
>>>
>
>>>
>
>>>
>
>>>
>
>>>
>
>>> ___
>
>>> ceph-users mailing list
>
>>> ceph-users@lists.ceph.com
>
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com&d=CwIFaQ&c=lXkdEK1PC7UK9oKA-BBSI8p1AamzLOSncm6Vfn0C_UQ&r=CSYA9OS6Qd7fQySI2LDvlQ&m=ha3XvQGcc5Yztz98b7hb8pYQo14dcIiYxfOoMzyUM00&s=VdVOtGV4JQUKyQDDC_QYn1-7wBcSh-eYwx_cCSQWlQk&e=
>>>  
>
>>>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com&d=CwIGaQ&c=lXkdEK1PC7UK9oKA-BBSI8p1AamzLOSncm6Vfn0C_UQ&r=CSYA9OS6Qd7fQySI2LDvlQ&m=54N4L4csPxJYvC5YZlDB9mMwEKANhFwo2m6R0HMUGZ0&s=LER873rXoF5--GPvmOzkJaQDhPpSvRptxAZ3QP-mlBM&e=
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub Errors

2016-05-04 Thread Blade Doyle
When I issue the "ceph pg repair 1.32" command I *do* see it reported in
the "ceph -w" output, but I *do not* see any new messages about pg 1.32 in
the log of osd.6 - even if I turn debug messages way up.

# ceph pg repair 1.32
instructing pg 1.32 on osd.6 to repair

(ceph -w shows)
2016-05-04 11:19:50.528355 mon.0 [INF] from='client.?
192.168.2.224:0/1341169978' entity='client.admin' cmd=[{"prefix": "pg
repair", "pgid": "1.32"}]: dispatch

---

Yes, I also noticed that there is only one copy of that pg.  I have no idea
how it happened, but my pools (all of them) got set to replication size=1.
I re-set them back to the intended values as soon as I noticed it.
Currently the pools are configured like this:

# ceph osd pool ls detail
pool 0 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 349499 flags hashpspool
stripe_width 0
removed_snaps [1~d]
pool 1 'cephfs_data' replicated size 2 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 300 pgp_num 300 last_change 349490 lfor 25902
flags hashpspool crash_replay_interval 45 tiers 4 read_tier 4 write_tier 4
stripe_width 0
pool 2 'cephfs_metadata' replicated size 2 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 300 pgp_num 300 last_change 349503 flags
hashpspool stripe_width 0
pool 4 'ssd_cache' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 349490 flags
hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
126701535232 target_objects 100 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x2
min_read_recency_for_promote 1 stripe_width 0

# ceph osd tree
ID  WEIGHT  TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
-12 0.3 root ssd_cache
 -4 0.2 host node11
  8 0.2 osd.8up  1.0  1.0
-11 0.2 host node13
  0 0.2 osd.0up  1.0  1.0
 -1 2.7 root default
 -7 0.2 host node6
  7 0.2 osd.7up  0.72400  1.0
 -8 0.23000 host node5
  5 0.23000 osd.5up  0.67996  1.0
 -6 0.45999 host node12
  9 0.45999 osd.9up  0.72157  1.0
-10 0.67000 host node14
 10 0.67000 osd.10   up  0.70659  1.0
-13 0.67000 host node22
  6 0.67000 osd.6up  0.69070  1.0
-15 0.67000 host node21
 11 0.67000 osd.11   up  0.69788  1.0

--

For the most part data in my ceph cluster is not critical.  Also, I have a
recent backup.  At this point I would be happy to resolve the pg problems
"any way possible" in order to get it working again.  Can I just delete the
problematic pg (or the versions of it that are broken)?

I tried some commands to "accept the missing objects as lost" but it tells
me:

# ceph pg 1.32 mark_unfound_lost delete
pg has no unfound objects

The osd log for that is:
2016-05-04 11:31:03.742453 9b088350  0 osd.6 350327 do_command r=0
2016-05-04 11:31:03.763017 9b088350  0 osd.6 350327 do_command r=0 pg has
no unfound objects
2016-05-04 11:31:03.763066 9b088350  0 log_channel(cluster) log [INF] : pg
has no unfound objects


I also tried to "force create" the pg:
# ceph pg force_create_pg 1.32
pg 1.32 now creating, ok

In that case, I do see a dispatch:
2016-05-04 11:32:42.073625 mon.4 [INF] from='client.?
192.168.2.224:0/208882728' entity='client.admin' cmd=[{"prefix": "pg
force_create_pg", "pgid": "1.32"}]: dispatch
2016-05-04 11:32:42.075024 mon.0 [INF] from='client.17514719 :/0'
entity='client.admin' cmd=[{"prefix": "pg force_create_pg", "pgid":
"1.32"}]: dispatch
2016-05-04 11:32:42.183389 mon.0 [INF] from='client.17514719 :/0'
entity='client.admin' cmd='[{"prefix": "pg force_create_pg", "pgid":
"1.32"}]': finished

That puts the pg in a new state for a while:
# ceph health detail | grep 1.32
pg 1.32 is stuck inactive since forever, current state creating, last
acting []
pg 1.32 is stuck unclean since forever, current state creating, last acting
[]

But after a few minutes it returns to the previous state:

# ceph health detail | grep 1.32
pg 1.32 is stuck inactive for 160741.831891, current state
undersized+degraded+peered, last acting [6]
pg 1.32 is stuck unclean for 1093042.263678, current state
undersized+degraded+peered, last acting [6]
pg 1.32 is stuck undersized for 57229.481051, current state
undersized+degraded+peered, last acting [6]
pg 1.32 is stuck degraded for 57229.481382, current state
undersized+degraded+peered, last acting [6]
pg 1.32 is undersized+degraded+peered, acting [6]

Blade.


On Tue, May 3, 2016 at 10:45 AM, Oliver Dzombic 
wrote:

> Hi Blade,
>
> if you dont see anything in the logs, then you should raise the debug
> level/frequency.
>
> You must at least see, that the repair command has been issued  ( started
> ).
>
> Also i am wondering about the [6] from your output.
>
> That means, that there is 

[ceph-users] OSD - Slow Requests

2016-05-04 Thread Garg, Pankaj
Hi,

I am getting messages like the following from my Ceph systems. Normally this 
would indicate issues with drives. But when I restart my system, a couple of 
different, random OSDs again start spitting out the same message,
so it's definitely not the same drives every time.

Any ideas on how to debug this? I don't see any drive-related issues in the 
dmesg log either.

Thanks
Pankaj



2016-05-04 14:02:52.499115 osd.72 [WRN] slow request 30.429347 seconds old, 
received at 2016-05-04 14:02:22.069658: osd_op(client.2859198.0:9559 
benchmark_data_x86Ceph3_54385_object9558 [write 0~131072] 309.17ee1e0e 
ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 
84,104
2016-05-04 14:02:54.499453 osd.72 [WRN] 24 slow requests, 1 included below; 
oldest blocked for > 52.866778 secs
2016-05-04 14:02:54.499467 osd.72 [WRN] slow request 30.660900 seconds old, 
received at 2016-05-04 14:02:23.838455: osd_op(client.2859198.0:9661 
benchmark_data_x86Ceph3_54385_object9660 [write 0~131072] 309.4054960e 
ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 
84,104
2016-05-04 14:02:56.499822 osd.72 [WRN] 25 slow requests, 1 included below; 
oldest blocked for > 54.867154 secs
2016-05-04 14:02:56.499835 osd.72 [WRN] slow request 30.940457 seconds old, 
received at 2016-05-04 14:02:25.559273: osd_op(client.2859197.0:9796 
benchmark_data_x86Ceph1_24943_object9795 [write 0~131072] 308.7e0944a 
ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 
84,97
2016-05-04 14:02:59.140562 osd.84 [WRN] 33 slow requests, 1 included below; 
oldest blocked for > 58.267177 secs



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub Errors

2016-05-04 Thread Oliver Dzombic
Hi Blade,

You can try to set min_size to 1 to get it back online, and if/when
the error vanishes (maybe after another repair command) you can set
min_size back to 2.

You can also try to simply out/down/remove(?) the OSD it is on.
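
As a sketch with the pool and pg from your output:

# ceph osd pool set cephfs_data min_size 1
# ceph pg repair 1.32

and later, once the pg is healthy again:

# ceph osd pool set cephfs_data min_size 2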


-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 04.05.2016 um 22:46 schrieb Blade Doyle:
> 
> When I issue the "ceph pg repair 1.32" command I *do* see it reported in
> the "ceph -w" output but I *do not* see any new messages about page 1.32
> in the log of osd.6 - even if I turn debug messages way up. 
> 
> # ceph pg repair 1.32
> instructing pg 1.32 on osd.6 to repair
> 
> (ceph -w shows)
> 2016-05-04 11:19:50.528355 mon.0 [INF] from='client.?
> 192.168.2.224:0/1341169978 '
> entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "1.32"}]:
> dispatch
> 
> ---
> 
> Yes, I also noticed that there is only one copy of that pg.  I have no
> idea how it happened, but my pools (all of them) got set to replication
> size=1.  I re-set them back to the intended values as soon as I noticed
> it.  Currently the pools are configured like this:
> 
> # ceph osd pool ls detail
> pool 0 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 349499 flags hashpspool
> stripe_width 0
> removed_snaps [1~d]
> pool 1 'cephfs_data' replicated size 2 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 300 pgp_num 300 last_change 349490 lfor
> 25902 flags hashpspool crash_replay_interval 45 tiers 4 read_tier 4
> write_tier 4 stripe_width 0
> pool 2 'cephfs_metadata' replicated size 2 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 300 pgp_num 300 last_change 349503 flags
> hashpspool stripe_width 0
> pool 4 'ssd_cache' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 256 pgp_num 256 last_change 349490 flags
> hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
> 126701535232 target_objects 100 hit_set
> bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s
> x2 min_read_recency_for_promote 1 stripe_width 0
> 
> # ceph osd tree
> ID  WEIGHT  TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -12 0.3 root ssd_cache
>  -4 0.2 host node11
>   8 0.2 osd.8up  1.0  1.0
> -11 0.2 host node13
>   0 0.2 osd.0up  1.0  1.0
>  -1 2.7 root default
>  -7 0.2 host node6
>   7 0.2 osd.7up  0.72400  1.0
>  -8 0.23000 host node5
>   5 0.23000 osd.5up  0.67996  1.0
>  -6 0.45999 host node12
>   9 0.45999 osd.9up  0.72157  1.0
> -10 0.67000 host node14
>  10 0.67000 osd.10   up  0.70659  1.0
> -13 0.67000 host node22
>   6 0.67000 osd.6up  0.69070  1.0
> -15 0.67000 host node21
>  11 0.67000 osd.11   up  0.69788  1.0
> 
> --
> 
> For the most part data in my ceph cluster is not critical.  Also, I have
> a recent backup.  At this point I would be happy to resolve the pg
> problems "any way possible" in order to get it working again.  Can I
> just delete the problematic pg (or the versions of it that are broken)?
> 
> I tried some commands to "accept the missing objects as lost" but it
> tells me:
> 
> # ceph pg 1.32 mark_unfound_lost delete
> pg has no unfound objects
> 
> The osd log for that is:
> 2016-05-04 11:31:03.742453 9b088350  0 osd.6 350327 do_command r=0
> 2016-05-04 11:31:03.763017 9b088350  0 osd.6 350327 do_command r=0 pg
> has no unfound objects
> 2016-05-04 11:31:03.763066 9b088350  0 log_channel(cluster) log [INF] :
> pg has no unfound objects
> 
> 
> I also tried to "force create" the page:
> # ceph pg force_create_pg 1.32
> pg 1.32 now creating, ok
> 
> In that case, I do see a dispatch:
> 2016-05-04 11:32:42.073625 mon.4 [INF] from='client.?
> 192.168.2.224:0/208882728 '
> entity='client.admin' cmd=[{"prefix": "pg force_create_pg", "pgid":
> "1.32"}]: dispatch
> 2016-05-04 11:32:42.075024 mon.0 [INF] from='client.17514719 :/0'
> entity='client.admin' cmd=[{"prefix": "pg force_create_pg", "pgid":
> "1.32"}]: dispatch
> 2016-05-04 11:32:42.183389 mon.0 [INF] from='client.17514719 :/0'
> entity='client.admin' cmd='[{"prefix": "pg force_create_pg", "pgid":
> "1.32"}]': finished
> 
> That puts the page in a new state for a while:
> # ceph health detail | grep 1.32
> pg 1.32 is stuck inactive since forever, current state creating, last
> acting []
> pg 1.32 is stuck unclean since forever, current state creating, last
> acting []
> 
> But af

Re: [ceph-users] Status of ceph-docker

2016-05-04 Thread Michael Shuey
I'm preparing to use it in production, and have been contributing
fixes for bugs I find.  It's getting fairly solid, but it does need to
be moved to Jewel before we really scale it out.

--
Mike Shuey


On Wed, May 4, 2016 at 8:50 AM, Daniel Gryniewicz  wrote:
> On 05/03/2016 04:17 PM, Vincenzo Pii wrote:
>>
>> https://github.com/ceph/ceph-docker
>>
>> Is someone using ceph-docker in production or the project is meant more
>> for development and experimentation?
>>
>> Vincenzo Pii| TERALYTICS
>> *DevOps Engineer
>> *
>
>
> I'm not aware of anyone currently using it in production, but it is being
> used as a base for a downstream RHCS containerized release, so there will be
> production containerized Ceph deployed.
>
> Daniel
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD - Slow Requests

2016-05-04 Thread Christian Balzer

Hello,

On Wed, 4 May 2016 21:08:02 + Garg, Pankaj wrote:

> Hi,
> 
> I am getting messages like the following from my Ceph systems. Normally
> this would indicate issues with Drives. But when I restart my system,
> different and randomly a couple OSDs again start spitting out the same
> message. SO definitely it's not the same drives every time.
> 
> Any ideas on how to debug this. I don't see any drive related issues in
> dmesg log either.
>

Drives having issues (as in being slow due to errors or firmware bugs)
is a possible reason, but it would not be at the top of my list.

You want to run atop, iostat or the likes and graph actual drive and
various Ceph performance counters to see what is going on and if a
particular drive is slower than the rest or if your whole system is just
reaching the limit of its performance.
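
For example (osd.84 is just the suspect from the log excerpt below; adjust
the names to your setup, and run the daemon command on the respective OSD host):

# iostat -x 5
# ceph osd perf
# ceph daemon osd.84 perf dump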

Looking at your ceph log output, the first thing that catches the eye is
that all slow objects are for benchmark runs (rados bench), so you seem to
be stress testing the cluster and have found its limits...

In addition to that, all the slow requests include osd.84, so you might
give that one a closer look.
But that could of course be a coincidence, given the limited log sample.
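
A minimal way to check both, assuming the OSD admin sockets are in their
default location and you have shell access to the node hosting osd.84,
would be something like:

iostat -x 5                   # per-device utilization and await, every 5s
ceph daemon osd.84 perf dump  # osd.84's internal counters; the op_latency /
                              # subop_latency values under the "osd" section
                              # are the interesting ones (names vary a bit
                              # between releases)

That is only a sketch; for anything ongoing you want those numbers graphed
over time (collectd/graphite/grafana or similar) so a consistently slow
drive stands out.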

Christian

> Thanks
> Pankaj
> 
> 
> 
> 2016-05-04 14:02:52.499115 osd.72 [WRN] slow request 30.429347 seconds old, received at 2016-05-04 14:02:22.069658: osd_op(client.2859198.0:9559 benchmark_data_x86Ceph3_54385_object9558 [write 0~131072] 309.17ee1e0e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
> 2016-05-04 14:02:54.499453 osd.72 [WRN] 24 slow requests, 1 included below; oldest blocked for > 52.866778 secs
> 2016-05-04 14:02:54.499467 osd.72 [WRN] slow request 30.660900 seconds old, received at 2016-05-04 14:02:23.838455: osd_op(client.2859198.0:9661 benchmark_data_x86Ceph3_54385_object9660 [write 0~131072] 309.4054960e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
> 2016-05-04 14:02:56.499822 osd.72 [WRN] 25 slow requests, 1 included below; oldest blocked for > 54.867154 secs
> 2016-05-04 14:02:56.499835 osd.72 [WRN] slow request 30.940457 seconds old, received at 2016-05-04 14:02:25.559273: osd_op(client.2859197.0:9796 benchmark_data_x86Ceph1_24943_object9795 [write 0~131072] 308.7e0944a ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,97
> 2016-05-04 14:02:59.140562 osd.84 [WRN] 33 slow requests, 1 included below; oldest blocked for > 58.267177 secs
> 
> 
> 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incorrect crush map

2016-05-04 Thread Ben Hines
Centos 7.2.

... and I think I just figured it out. One node had directories from former
OSDs in /var/lib/ceph/osd. When restarting the other OSDs on this host, Ceph
apparently added those to the crush map, too.

[root@sm-cld-mtl-013 osd]# ls -la /var/lib/ceph/osd/
total 128
drwxr-x--- 8 ceph ceph  90 Feb 24 14:44 .
drwxr-x--- 9 ceph ceph 106 Feb 24 14:44 ..
drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-42
drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-43
drwxr-xr-x 1 root root 278 May  4 22:21 ceph-44
drwxr-xr-x 1 root root 278 May  4 22:21 ceph-45
drwxr-xr-x 1 root root 278 May  4 22:25 ceph-67
drwxr-xr-x 1 root root 304 May  4 22:25 ceph-86


(42 and 43 are on a different host, yet when 'systemctl start ceph.target'
is used, the OSD preflight adds them to the crush map anyway:


May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.67 at :/0 osd_data
/var/lib/ceph/osd/ceph-67 /var/lib/ceph/osd/ceph-67/journal
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.45 at :/0 osd_data
/var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: WARNING: will not setuid/gid:
/var/lib/ceph/osd/ceph-42 owned by 0:0 and not requested 167:167
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.529176
7f00cca7c900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on
/var/lib/ceph/osd/ceph-43: (2) No such file or directory#033[0m
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.534657
7fb55c17e900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on
/var/lib/ceph/osd/ceph-42: (2) No such file or directory#033[0m
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service: main process
exited, code=exited, status=1/FAILURE
May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@43.service entered
failed state.
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service failed.
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service: main process
exited, code=exited, status=1/FAILURE
May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@42.service entered
failed state.
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service failed.
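
In case anyone else hits this: the relocation presumably comes from the
crush create-or-move that runs when an OSD starts. A possible cleanup,
assuming osd.42 and osd.43 really live on the other node and these
directories are just empty leftovers, would be something like:

systemctl disable ceph-osd@42 ceph-osd@43    # stop systemd starting the stale instances here
rm -rf /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-43   # remove the leftover directories

Restarting osd.42 and osd.43 on their real host should then put them back
in the right place in the crush map. Alternatively, "osd crush update on
start = false" in the [osd] section of ceph.conf stops OSDs from relocating
themselves at startup entirely, at the cost of maintaining crush locations
by hand. Untested on 9.2.1, so treat it as a sketch.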



-Ben

On Tue, May 3, 2016 at 7:16 PM, Wade Holler  wrote:

> Hi Ben,
>
> What OS+Version ?
>
> Best Regards,
> Wade
>
>
> On Tue, May 3, 2016 at 2:44 PM Ben Hines  wrote:
>
>> My crush map keeps putting some OSDs on the wrong node. Restarting them
>> fixes it temporarily, but they eventually hop back to the other node that
>> they aren't really on.
>>
>> Is there anything that can cause this to look for?
>>
>> Ceph 9.2.1
>>
>> -Ben
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW obj remove cls_xx_remove returned -2

2016-05-04 Thread Ben Hines
Ceph 9.2.1, Centos 7.2

I sometimes notice these errors when removing objects: the OSD returns a
'No such file or directory' when deleting things. Any ideas here? Is this
expected?

(I anonymized the full filename, but it's all the same file.)
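
For what it's worth, a quick way to check whether RGW itself still thinks
the object exists (bucket and object names below are placeholders) would
be something like:

radosgw-admin object stat --bucket=somebucket --object=someobject
radosgw-admin bucket list --bucket=somebucket | grep someobject   # still in the bucket index?

The -2 in the OSD log below is just ENOENT coming back from the
rgw.obj_remove object class call.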

RGW log:

2016-05-04 23:14:32.216324 7f92b7741700  1 -- 10.29.16.57:0/2874775405 <==
osd.11 10.30.1.42:6808/7454 45  osd_op_reply(476 default.42048218.15_
... fb66a4923b2029a6588adb1245fa3fe9 [call] v0'0 uv551321 ondisk = -2 ((2)
No such file or directory)) v6  349+0+0 (2101432025 0 0) 0x7f93b403aca0
con 0x7f946001b3d0
2016-05-04 23:14:32.216587 7f931b7b6700  1 -- 10.29.16.57:0/2874775405 -->
10.30.1.42:6808/7454 -- osd_op(client.45297956.0:477
.dir.default.42048218.15.12 [call rgw.bucket_complete_op] 11.74c941dd
ack+ondisk+write+known_if_redirected e104420) v6 -- ?+0 0x7f95100fcb40 con
0x7f946001b3d0
2016-05-04 23:14:32.216807 7f931b7b6700  2 req 4238:22.224049:s3:DELETE
 fb66a4923b2029a6588adb1245fa3fe9:delete_obj:http status=204
2016-05-04 23:14:32.216826 7f931b7b6700  1 == req done
req=0x7f9510091e50 http_status=204 ==
2016-05-04 23:14:32.216920 7f931b7b6700  1 civetweb: 0x7f9518c0:
10.29.16.57 - - [04/May/2016:23:14:09 -0700] "DELETE
fb66a4923b2029a6588adb1245fa3fe9 HTTP/1.1" 204 0 - Boto/2.38.0 Python/2.7.5
Linux/3.10.0-327.10.1.el7.x86_64




Log on the OSD with debug ms=10:


2016-05-04 23:14:31.716246 7fccbec2a700  0  cls/rgw/cls_rgw.cc:1959:
ERROR: rgw_obj_remove(): cls_cxx_remove returned -2
2016-05-04 23:14:31.716379 7fccbec2a700  1 -- 10.30.1.42:6808/7454 -->
10.29.16.57:0/939886467 -- osd_op_reply(525 default.42048218.15_ ...
fb66a4923b2029a6588adb1245fa3fe9 [call rgw.obj_remove] v0'0 uv551321 ondisk
= -2 ((2) No such file or directory)) v6 -- ?+0 0x7fcd05f0d600 con
0x7fcd01865fa0
2016-05-04 23:14:31.716563 7fcc59cb0700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/939886467 pipe(0x7fcd0e29f000 sd=527 :6808 s=2 pgs=16 cs=1
l=1 c=0x7fcd01865fa0).writer: state = open policy.server=1
2016-05-04 23:14:31.716646 7fcc59cb0700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/939886467 pipe(0x7fcd0e29f000 sd=527 :6808 s=2 pgs=16 cs=1
l=1 c=0x7fcd01865fa0).writer: state = open policy.server=1
2016-05-04 23:14:31.716983 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).reader wants 456 bytes from policy throttler
19523/524288000
2016-05-04 23:14:31.717006 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).reader wants 456 from dispatch throttler 0/104857600
2016-05-04 23:14:31.717029 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).aborted = 0
2016-05-04 23:14:31.717056 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).reader got message 111 0x7fcd120c42c0
osd_op(client.45297946.0:411 .dir.default.42048218.15.15 [call
rgw.bucket_prepare_op] 11.c01f555d ondisk+write+known_if_redirected
e104420) v6
2016-05-04 23:14:31.717077 7fcc76585700  1 -- 10.30.1.42:6808/7454 <==
client.45297946 10.29.16.57:0/3924513385 111 
osd_op(client.45297946.0:411 .dir.default.42048218.15.15 [call
rgw.bucket_prepare_op] 11.c01f555d ondisk+write+known_if_redirected
e104420) v6  213+0+243 (3423964475 0 1018669967) 0x7fcd120c42c0 con
0x7fcced99f860
2016-05-04 23:14:31.717081 7fcc74538700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).writer: state = open policy.server=1
2016-05-04 23:14:31.717100 7fcc74538700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).write_ack 111
--
2016-05-04 23:14:32.202608 7fccb49ff700 10 -- 10.30.1.42:6809/7454
dispatch_throttle_release 83 to dispatch throttler 83/104857600
2016-05-04 23:14:32.203922 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).reader got ack seq 1220 >= 1220 on
0x7fcd053a5e00 osd_repop(client.45297861.0:514 11.5d
11/c01f555d/.dir.default.42048218.15.15/head v 104420'1406810) v1
2016-05-04 23:14:32.204040 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).reader wants 83 from dispatch throttler
0/104857600
2016-05-04 23:14:32.204084 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).aborted = 0
2016-05-04 23:14:32.204103 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).reader got message 1236 0x7fcd05d5f440
osd_repop_reply(client.4529786