Re: [ceph-users] degraded objects after osd add

2016-11-17 Thread Burkhard Linke

Hi,


On 11/17/2016 08:07 AM, Steffen Weißgerber wrote:

Hello,

just for understanding:

When starting to fill OSDs with data by raising their weight from 0 to the
normal value, the ceph status displays degraded objects (> 0.05%).

I don't understand the reason for this, because no storage is revoked from
the cluster, only added. Therefore only the displayed object misplacement
would make sense to me.

If you just added a new OSD, a number of PGs will be backfilling or
waiting for backfill (the remapped ones). I/O to these PGs is not
blocked, so objects may still be modified. AFAIK these objects then show up
as degraded.


I'm not sure how ceph handles these objects, e.g. whether it writes them 
to the old OSDs assigned to the PG, or whether they are put on the new 
OSD already, even if the corresponding PG is waiting for backfilling.


Nonetheless the degraded objects will be cleaned up during backfilling.
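If you want to watch this while the backfill runs, a quick sketch with the
standard CLI (nothing cluster-specific assumed):

ceph -s                         # summary, including degraded / misplaced percentages
ceph health detail | head -30   # which PGs are backfilling / backfill_wait
ceph pg dump_stuck unclean      # PGs currently remapped or backfilling

The degraded count should drop back to zero once the listed PGs have finished
backfilling.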

Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-17 Thread Thomas Danan
Hi,

I have rechecked the pattern when slow requests are detected.

I have an example with the following OSDs (primary: 411, secondaries: 176, 594).
On the primary, slow requests were detected, waiting for subops from (176, 594)
for about 16 minutes:

2016-11-17 13:29:27.209754 7f001d414700 0 log_channel(cluster) log [WRN] : 7 
slow requests, 7 included below; oldest blocked for > 480.477315 secs
2016-11-17 13:29:27.209777 7f001d414700 0 log_channel(cluster) log [WRN] : slow 
request 480.477315 seconds old, received at 2016-11-17 13:21:26.732303: 
osd_op(client.2407558.1:206455044 rbd_data.66ea12ae8944a.001acbbc 
[set-alloc-hint object_size 4194304 write_size 4194304,write 1257472~368640] 
0.61fe279f snapc 3fd=[3fd,3de] ondisk+write e210553) currently waiting for 
subops from 176,594

So the primary OSD has been waiting for subops since 13:21 (13:29 minus 480 seconds).

2016-11-17 13:36:33.039691 7efffd8ee700 0 -- 192.168.228.23:6800/694486 >> 
192.168.228.7:6819/3611836 pipe(0x13ffd000 sd=33 :17791 s=2 pgs=131 cs=7 l=0 
c=0x13251de0).fault, initiating reconnect
2016-11-17 13:36:39.570692 7efff6784700 0 -- 192.168.228.23:6800/694486 >> 
192.168.228.13:6858/2033854 pipe(0x17009000 sd=60 :52188 s=2 pgs=147 cs=7 l=0 
c=0x141159c0).fault, initiating reconnect

After these log entries, the ops seem to be unblocked and I no longer see the
“currently waiting for subops from 176,594” messages.

So the primary OSD was blocked for about 15 minutes in total.
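To see exactly what those ops are stuck on while this happens, one could query
the OSD admin sockets (a minimal sketch; osd.411/176/594 are just the ids from
the example above, and each command has to be run on the node hosting that OSD):

ceph daemon osd.411 dump_ops_in_flight   # primary: shows the waiting-for-subops events per op
ceph daemon osd.176 dump_ops_in_flight   # secondary: is the replica op even arriving?
ceph daemon osd.411 dump_historic_ops    # per-op event timestamps once the op finally completes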


On the secondary OSD, I can see the following messages during the same period 
(but also after and before)

secondary:
2016-11-17 13:34:58.125076 7fbcc7517700 0 -- 192.168.228.7:6819/3611836 >> 
192.168.228.42:6832/2873873 pipe(0x12d2e000 sd=127 :6819 s=2 pgs=86 cs=5 l=0 
c=0xef18c00).fault with nothing to send, going to standby

In some other examples, with some DEBUG messages enabled, I was also able
to see many of the following messages on the secondary OSDs:
2016-11-15 03:53:04.298502 7ff9c434f700  1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7ff9bdb42700' had timed out after 15

Thomas

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: mercredi 16 novembre 2016 22:13
To: Thomas Danan; n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

Hi,

Yes, I can’t think of anything else at this stage. Could you maybe repost some
dump_historic_ops output now that you have turned off snapshots? I wonder if
it might reveal anything.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: 16 November 2016 17:38
To: n...@fisk.me.uk; 'Peter Maloney' <peter.malo...@brockmann-consult.de>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

Hi Nick,

We have deleted all Snapshots and observed the system for several hours.
From what I see, this did not help to reduce the blocked ops and the I/O freezes
on the Ceph client side.

We have also tried to increase the PG count a little (first by 8, then by 128),
because this is something we should do anyway and we wanted to see how the
cluster behaved.
During recovery, the number of blocked ops and their duration increased
significantly, and the number of impacted OSDs was much higher.
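For reference, a minimal sketch of that kind of stepwise PG increase (pool name
and numbers are hypothetical):

ceph osd pool get rbd pg_num          # say it returns 2048
ceph osd pool set rbd pg_num 2056     # previous value + 8
ceph osd pool set rbd pgp_num 2056    # then bump pgp_num and let recovery settle before the next step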

Don’t really know what to conclude from all of this …

Again, we have checked disks and network, and everything seems fine …

Thomas
From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: mercredi 16 novembre 2016 14:01
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

The snapshot works by using Copy On Write. If you dirty even a 4kb section of a 
4MB object in the primary RBD, that entire 4MB object then needs to be read and 
then written into the snapshot RBD.
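As a rough worked illustration (assuming the default 4 MB object size and an
object not touched since the snapshot was taken): a single 4 KB random write then
costs roughly a 4 MB read plus a 4 MB write for the COW copy, on top of the 4 KB
client write itself, i.e. on the order of 2000x I/O amplification the first time
each object is dirtied after the snapshot. Once an object has been copied,
further writes to it cost the normal amount again, so the overhead is
front-loaded rather than a flat 2x.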

From: Thomas Danan [mailto:thomas.da...@mycom-osi.com]
Sent: 16 November 2016 12:58
To: Thomas Danan <thomas.da...@mycom-osi.com>; n...@fisk.me.uk;
'Peter Maloney' <peter.malo...@brockmann-consult.de>
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

Hi Nick,

Actually I was wondering, is there any difference between a snapshotted and a
simple RBD image?
With a simple RBD image, when doing random I/O we are asking the Ceph cluster to
update one or several 4MB objects, no?
So snapshotting multiplies the load by 2 but not more, am I wrong?

Thomas

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: mercredi 16 novembre 2016 13:52
To: n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

Hi Nick,

Yes our application is doing small Random IO and I did not real

Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-17 Thread Thomas Danan
I actually forgot to say that the following issue describes very similar
symptoms:

http://tracker.ceph.com/issues/9844

Thomas


Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-17 Thread Nick Fisk
Hi Thomas,

 

Do you have the OSD logs from around the time of that slow request (13:12 to 
13:29 period)?

 

Do you also see anything about OSDs going down in the Mon ceph.log file around
that time?

 

480 seconds is probably far too long for a disk to be busy for. I’m wondering
if the OSD is either dying and respawning, or if you are running out of some
type of system resource, e.g. TCP connections or something like that, which
means the OSDs can’t communicate with each other.
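A couple of quick checks along those lines on an OSD node (standard tools,
nothing Ceph-specific assumed; pidof -s just picks one ceph-osd process):

ss -s                                                      # overall socket / TCP connection counts
cat /proc/$(pidof -s ceph-osd)/limits | grep 'open files'  # configured fd limit for that OSD
ls /proc/$(pidof -s ceph-osd)/fd | wc -l                   # fds actually in use by that OSD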

 


Re: [ceph-users] how to list deleted objects in snapshot

2016-11-17 Thread Jan Krcmar
hi,

it seems it could be the SnapContext problem.

I've tried the stat command; it works fine.

Shall I post a bug report?

thanks
fous

2016-11-16 21:55 GMT+01:00 Gregory Farnum :
> On Wed, Nov 16, 2016 at 5:13 AM, Jan Krcmar  wrote:
>> hi,
>>
>> I've found a problem/feature in pool snapshots
>>
>> when i delete some object from pool which was previously snapshotted,
>> i cannot list the object name in the snapshot anymore.
>>
>> steps to reproduce
>>
>> # ceph -v
>> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> # rados -p test ls
>> stats
>> # rados -p test mksnap now
>> # rados -p test -s now ls
>> selected snap 3 'now'
>> stats
>> # rados -p test rm stats
>> # rados -p test -s now ls
>> selected snap 3 'now'
>> # rados -p test -s now stat stats
>> selected snap 3 'now'
>> test/stats mtime 2016-11-16 14:07:14.00, size 329
>> # rados -p test stat stats
>>  error stat-ing test/stats: (2) No such file or directory
>>
>> is this rados feature or bug?
>
> The rados tool does not apply the pool snapshot "SnapContext" when
> doing listings. I *think* if it did, you would get the listing you
> desire, but I'm not certain and it might be much more complicated. (If
> it's just about using the correct SnapContext, it would be a pretty
> small patch!)
> It does apply the correct SnapContext on many other operations;
> did you try specifying "-s now" when doing the stat command?
> -Greg
>
>>
>> thanks
>> jan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Volume Issue

2016-11-17 Thread Alexey Sheplyakov
Hi,

please share some details about your cluster (especially the hardware)

- how many OSDs are there? How many disks per OSD machine?
- Do you use dedicated (SSD) OSD journals?
- RAM size, CPUs model, network card bandwidth/model
- Do you have a dedicated cluster network?
- How many VMs (in the whole cluster) are running?
- The total number (in the whole cluster) of attached rbd images
- The total number (in the whole cluster) of concurrent writers

Also,
- does the same problem occur when multiple threads/processes are writing
into a single rbd image?
- is there anything "interesting" in the qemu, kernel (on both hypervisors and
OSD nodes), or OSD logs?
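If it helps, a minimal sketch of commands for collecting most of these details
(the interface name and grep patterns are just examples, adjust to your environment):

ceph -s; ceph df; ceph osd tree     # cluster state, capacity, OSD count and layout
ceph osd dump | grep pool           # pools and replication sizes
lsblk; free -g; nproc               # disks, RAM and CPUs, run on an OSD node
ethtool eth0 | grep -i speed        # NIC speed; eth0 is a placeholder interface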

Best regards,
  Alexey


On Wed, Nov 16, 2016 at 9:10 AM,  wrote:

> Hi All,
>
>
>
> We have a Ceph Storage Cluster and it’s been integrated with our Openstack
> private cloud.
>
> We have created a Pool for Volume which allows our Openstack Private Cloud
> user to create a volume from image and boot from volume.
>
> Additionally our images(both Ubuntu1404 and CentOS 7) are in a raw format.
>
>
>
> One of our use cases is to attach multiple volumes other than “boot
> volume”.
>
> We have observed that when we attach multiple volumes and try simultaneous
> writes to these attached volumes, for example via the “dd
> command”, all these processes go into “D state (uninterruptible sleep)”.
>
> Also we can see in vmstat output that “bo” values trickling down to zero.
>
> We have checked the network utilization on the compute node which does not
> show any issues.
>
>
>
> Finally after a while system becomes unresponsive and only way to resolve
> is to reboot the VM.
>
>
>
> Some of our version details are as follows.
>
>
>
> Ceph version : 0.80.7
>
> Libvirt version : 1.2.2
>
> Openstack Version : Juno (Mirantis 6.0)
>
>
>
> Please do let me know if anyone has faced a similar issue or have any
> pointers.
>
>
>
> Any direction will be helpful.
>
>
>
> Thanks,
>
> Mehul
>
>
>
>
>
>
> "*Confidentiality Warning*: This message and any attachments are intended
> only for the use of the intended recipient(s), are confidential and may be
> privileged. If you are not the intended recipient, you are hereby notified
> that any review, re-transmission, conversion to hard copy, copying,
> circulation or other use of this message and any attachments is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately by return email and delete this message and any attachments
> from your system.
>
> *Virus Warning:* Although the company has taken reasonable precautions to
> ensure no viruses are present in this email. The company cannot accept
> responsibility for any loss or damage arising from the use of this email or
> attachment."
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-11-17 Thread Nick Fisk
Hi Sam,

I've updated the ticket with logs from the wip run.

Nick

> -Original Message-
> From: Samuel Just [mailto:sj...@redhat.com]
> Sent: 15 November 2016 18:30
> To: Nick Fisk 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == 
> i->first)
> 
> http://tracker.ceph.com/issues/17916
> 
> I just pushed a branch wip-17916-jewel based on v10.2.3 with some additional 
> debugging.  Once it builds, would you be able to start
> the afflicted osds with that version of ceph-osd and
> 
> debug osd = 20
> debug ms = 1
> debug filestore = 20
> 
> and get me the log?
> -Sam
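For reference, a minimal sketch of how those settings might be applied to just
the affected OSDs, via ceph.conf on their host before starting them (osd.12 is a
placeholder id):

[osd.12]
    debug osd = 20
    debug ms = 1
    debug filestore = 20

The same values can also be passed on the ceph-osd command line, e.g. --debug-osd 20.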
> 
> On Tue, Nov 15, 2016 at 2:06 AM, Nick Fisk  wrote:
> > Hi,
> >
> > I have two OSDs which are failing with an assert which looks related
> > to missing objects. This happened after a large RBD snapshot was
> > deleted, causing several OSDs to start flapping as they experienced
> > high load. The cluster is fully recovered and I don't need any help from a
> > recovery perspective. I'm happy to zap and recreate the OSDs,
> > which I will probably do in a couple of days' time. Or if anybody looks at the
> > error and sees an easy way to get the OSD to start up, then
> > bonus!!!
> >
> > However, I thought I would post in case there is any interest in
> > trying to diagnose why this happened. There were no power or networking
> > issues and no hard reboots, so this is purely contained
> > within the Ceph OSD process.
> >
> > The objects that it claims are missing are from the RBD that had the
> > snapshot deleted. I'm guessing that the last command before the OSD
> > died at some point was to delete those two objects, which did actually
> > happen, but for some reason the OSD had died before it got
> > confirmation??? And now it's trying to delete them, but they don't exist.
> >
> > I have the full debug 20 log, but pretty much all the lines above the
> > below snippet just have it deleting thousands of objects without any 
> > problems.
> >
> > Nick
> >
> >  -4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
> > -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for 
> > missing items over interval (0'0,1607344'260104]
> > -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing
> > 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.002555c5:head
> > -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing
> > 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.002555c5:6c
> >  0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In
> > function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
> > ghobject_t, const pg_info_t&, std::map&,
> > PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const
> > DoutPrefixProvider*, std::set >*)'
> > thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
> > osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
> >
> >  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x80) [0x5642d2734ea0]
> >  2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
> > pg_info_t const&, std::map > std::less, std::allocator > hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
> > std::__cxx11::basic_ostringstream,
> > std::allocator >&, DoutPrefixProvider const*,
> > std::set,
> > std::allocator >, std::less > std::char_traits, std::allocator > >,
> > std::allocator > std::char_traits, std::allocator > > >*)+0x719)
> > [0x5642d22e2fd9]
> >  3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6)
> > [0x5642d21172d6]
> >  4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
> >  5: (OSD::init()+0x2026) [0x5642d205e7a6]
> >  6: (main()+0x2ea5) [0x5642d1fd08f5]
> >  7: (__libc_start_main()+0xf0) [0x7f728c77c830]
> >  8: (_start()+0x29) [0x5642d2011f89]
> >  NOTE: a copy of the executable, or `objdump -rdS ` is needed 
> > to interpret this.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Register ceph daemons on initctl

2016-11-17 Thread Jaemyoun Lee
Dear all,

I am having trouble using Ceph.
When I built Ceph from source from the official repo at v10.2.3, I
couldn't create a Ceph cluster.

The state of the OSDs never becomes UP after they are activated.
I think the problem is Upstart, because "make install" didn't copy the conf
files to /etc/init/:

# ls /etc/init/ | grep ceph*
#

To solve it, I copied the files in "src/upstart/" to /etc/init, and also copied
some scripts to /etc/init.d/. However, I failed.

Could you tell me how to register the Ceph daemons with initctl?

Best regards,
Jae


---
위 전자우편에 포함된 정보는 지정된 수신인에게만 발송되는 것으로 보안을 유지해야 하는 정보와 법률상 및 기타 사유로 공개가 금지된 정보가 
포함돼 있을 수 있습니다. 
귀하가 이 전자우편의 지정 수신인이 아니라면 본 메일에 포함된 정보의 전부 또는 일부를 무단으로 보유, 사용하거나 제3자에게 공개, 복사, 
전송, 배포해서는 안 됩니다. 
본 메일이 잘못 전송되었다면, 전자우편 혹은 전화로 연락해주시고, 메일을 즉시 삭제해 주시기 바랍니다. 협조해 주셔서 감사합니다.

This e-mail is intended only for the named recipient. 
Dissemination, distribution, forwarding, or copying of this e-mail by anyone 
other than the intended recipient is prohibited. 
If you have received it in error, please notify the sender by e-mail and 
completely delete it. Thank you for your cooperation.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-11-17 Thread Samuel Just
Puzzling, added a question to the ticket.
-Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crush Adjustment

2016-11-17 Thread Pasha

Hi guys,

A fairly simple question for you I'm sure, but I've never had to do it myself,
so I thought I'd get your input.


I am running a 5-node cluster with regular spinners and SSD journals at
the moment. Recently I added a 1TB SSD per node and wanted to create a
pool that uses purely the new SSDs. I found a few guides on how to do this,
but none of them mention whether the required changes to the crush map
would affect my existing pool/data. If there is a "right" way of doing
this I would much appreciate it. This is a production cluster, hence I
cannot take any chances that my existing data would be compromised.


Thanks very much!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] index-sharding on existing bucket ?

2016-11-17 Thread Yoann Moulin
Hello,

is it possible to shard the index of existing buckets?

I have more than 100TB of data in a couple of buckets, and I'd like to avoid
re-uploading everything.

Thanks for your help,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush Adjustment

2016-11-17 Thread David Turner
Adding the pool of SSDs and changing their weights to balance them will not
affect your pool of spinning disks.  The PGs and OSD weights are isolated from
each other because the two pools live under different CRUSH roots.
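If it helps, a minimal sketch of the usual approach (bucket, host and OSD names
below are hypothetical; since the new SSD OSDs are empty, nothing in the existing
pools moves):

ceph osd crush add-bucket ssd root                       # new root, separate from 'default'
ceph osd crush add-bucket node1-ssd host
ceph osd crush move node1-ssd root=ssd
ceph osd crush set osd.50 0.90 root=ssd host=node1-ssd   # place/weight each new SSD OSD under it
ceph osd crush rule create-simple ssd_rule ssd host
ceph osd pool create ssd-pool 128 128 replicated ssd_rule

Repeat the host bucket and osd placement for each node. Depending on your
ceph.conf you may also want "osd crush update on start = false" or a crush
location hook so the SSD OSDs stay under the new root after a restart.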



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Volume Issue

2016-11-17 Thread Mehul1.Jani
Thanks everyone for your inputs.

Below is a small writeup which I wanted to share with everyone in Ceph User 
community.

Summary of the Ceph Issue with Volumes

Our Setup
As mentioned earlier in our setup we have Openstack MOS 6.0 integrated with 
Ceph Storage cluster.
The version details are as follows
Ceph version : 0.80.7
Libvirt version : 1.2.2
Openstack Version : Juno (Mirantis 6.0)


Statement of Problem
When we attached multiple volumes (more than 6) to a VM instance, similar to
adding multiple disks to a Hadoop bare-metal node, and tried to write to multiple
disks simultaneously, for example via the dd command "dd if=/dev/zero
of=/disk{1..6}/test bs=4K count=10485760":
Observing "vmstat 1" on the VM instance, we saw that over a period of time
the "bo" (block out) value started to trickle down to zero.
As soon as the "bo" value reached zero, the load on the VM instance spiked and
the system became unresponsive. We had to reboot the VM instance to recover.

Also, when we checked, all the "dd" processes were in the "D" (uninterruptible
sleep) state.

Our Investigation and Probable Resolution
In /var/log/syslog on the compute node on which the VM instance was running,
we found the error message "Too many open files".
Example below
ABCD = PID of qemu instance

<8>Nov 18 04:56:49 node-XXX qemu-system-x86_64: 2016-11-18 04:56:49.939702 
7fe9b569d700 -1 -- :0/70 >> :6830/14356 
pipe(0x7fede65dcbf0 sd=
-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7fede37fc8c0).connect couldn't created socket (24) 
Too many open files

When we checked the limit for number of open files from proc we found below
XX@node-:~# cat /proc//limits
...
Max open files            1024                 4096                 files
...

On this basis, we increased the open file descriptor limit for the libvirt-bin
process from 1024 to 65536.
We had to put the below ulimit commands in /etc/default/libvirt-bin:
ulimit -Hn 65536
ulimit -Sn 65536

We had to reboot the qemu instances via nova stop and nova start for the new 
limits to take effect.
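To verify the new limits actually took effect on a given instance (a minimal
sketch; <qemu_pid> is the qemu-system-x86_64 PID of the affected VM):

grep 'Max open files' /proc/<qemu_pid>/limits   # should now show 65536 / 65536
ls /proc/<qemu_pid>/fd | wc -l                  # file descriptors currently in use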

This workaround has solved our issue for now and the above mentioned test cases 
are now successful.

We also checked the different points below, which were indeed helpful in
narrowing down the issue:

* Was the issue limited to a specific type of Linux OS (Ubuntu or
CentOS)?

* Was the issue limited to a specific kernel? We upgraded the kernel but
the issue still persisted.

* Was the issue due to any limiting resource (CPU, RAM, network,
disk I/O) on either the VM instance or the compute node?

* We also tried to tune kernel parameters such as dirty_ratio,
dirty_background_ratio etc., but no improvement was observed.

* Also, we observed that the issue was NOT tied to the number of volumes attached
but to the total amount of I/O performed.

As per our understanding this is a good resolution for now but it may need 
monitoring and appropriate tuning.

Please do let me know if there are any questions/concerns or pointers :)

Thanks once again.

Thanks,
Mehul


From: Mehul1 Jani
Sent: 16 November 2016 11:40
To: 'ceph-users@lists.ceph.com'
Cc: Sanjeev Jaiswal; Harshit T Shah; Hardikv Desai
Subject: Ceph Volume Issue

Hi All,

We have a Ceph Storage Cluster and it's been integrated with our Openstack 
private cloud.
We have created a Pool for Volume which allows our Openstack Private Cloud user 
to create a volume from image and boot from volume.
Additionally our images(both Ubuntu1404 and CentOS 7) are in a raw format.

One of our use cases is to attach multiple volumes other than "boot volume".
We have observed that when we attach multiple volumes, and try to simultaneous 
writes to these attached volumes for example via the "dd command" , all these 
processes go into "D state (uninterruptible sleep)".
Also we can see in vmstat output that "bo" values trickling down to zero.
We have checked the network utilization on the compute node which does not show 
any issues.

Finally after a while system becomes unresponsive and only way to resolve is to 
reboot the VM.

Some of our version details are as follows.

Ceph version : 0.80.7
Libvirt version : 1.2.2
Openstack Version : Juno (Mirantis 6.0)

Please do let me know if anyone has faced a similar issue or have any pointers.

Any direction will be helpful.

Thanks,
Mehul


"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept respon

Re: [ceph-users] Register ceph daemons on initctl

2016-11-17 Thread 钟佳佳
If you built from the git repo tag v10.2.3,
refer to the links below from ceph.com:
http://docs.ceph.com/docs/emperor/install/build-packages/
http://docs.ceph.com/docs/jewel/rados/operations/operating/#running-ceph-with-upstart

If you built from ceph-10.2.3.tar.gz, it seems there is no debian packaging for
dpkg-buildpackage;
you will have to run ./configure and make yourself.

I think it's right to copy the upstart *.conf files to /etc/init, then run:
initctl reload-configuration
initctl list | grep ceph
start ceph-osd id=XXX
ps aux | grep ceph-osd   # check that the osd.X process is running

Worst case:
check the log, or
just run "ceph-osd --cluster=ceph -i X -f -d" (-f for foreground, -d for debug)
to make sure whether it has anything to do with the system init.

Good Luck

the footer sucks

 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com