Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-28 Thread Christian Balzer

Hello,

On Tue, 28 Jun 2016 08:34:26 +0200 Stefan Priebe - Profihost AG wrote:

> Am 27.06.2016 um 02:14 schrieb Christian Balzer:
> > On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
> > 
> >> Hi,
> >>
> >> is there any option or chance to have auto repair of pgs in hammer?
> >>
> > Short answer: 
> > No, in any version of Ceph.
> > 
> > Long answer:
> > There are currently no checksums generated by Ceph and present to
> > facilitate this.
> 
> Yes but if you have a replication count of 3 ceph pg repair was always
> working for me since bobtail. I've never seen corrupted data.
>
That's good and lucky for you.

Not seeing corrupted data also doesn't mean there wasn't any corruption;
it could simply mean that the data in question wasn't read again, or was
overwritten before being read.

In the handful of scrub errors I ever encountered there was one case where
blindly doing a repair from the primary PG would have been the wrong thing
to do.
 
> > If you'd run BTRFS or ZFS with filestore you'd be closer to an
> > automatic state of affairs, as these filesystems do strong checksums
> > and check them on reads and would create an immediate I/O error if
> > something got corrupted, thus making it clear which OSD is in need of
> > the hammer of healing.
> 
> Yes but at least BTRFS is still not working for ceph due to
> fragmentation. I've even tested a 4.6 kernel a few weeks ago. But it
> doubles it's I/O after a few days.
> 
Nobody (well, certainly not me) suggested to use BTRFS, especially with
Bluestore "around the corner".

Just pointing out that it has the necessary checksumming features.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-28 Thread Stefan Priebe - Profihost AG

On 28.06.2016 09:06, Christian Balzer wrote:
> 
> Hello,
> 
> On Tue, 28 Jun 2016 08:34:26 +0200 Stefan Priebe - Profihost AG wrote:
> 
>> Am 27.06.2016 um 02:14 schrieb Christian Balzer:
>>> On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
>>>
 Hi,

 is there any option or chance to have auto repair of pgs in hammer?

>>> Short answer: 
>>> No, in any version of Ceph.
>>>
>>> Long answer:
>>> There are currently no checksums generated by Ceph and present to
>>> facilitate this.
>>
>> Yes but if you have a replication count of 3 ceph pg repair was always
>> working for me since bobtail. I've never seen corrupted data.
>>
> That's good and lucky for you.
> 
> Not seeing corrupted data also doesn't mean there wasn't any corruption,
> it could simply mean that the data in question wasn't used or overwritten
> before being read again.

Sure, that's correct ;-) It just has happened so often that I thought
this could not always be the case. We had a lot of kernel crashes last
month related to XFS and bcache.

> In the handful of scrub errors I ever encountered there was one case where
> blindly doing a repair from the primary PG would have been the wrong thing
> to do.

Are you sure it really just uses the primary PG? I always thought it
compares the sizes and dates of the objects with a replication factor of 3.


>>> If you'd run BTRFS or ZFS with filestore you'd be closer to an
>>> automatic state of affairs, as these filesystems do strong checksums
>>> and check them on reads and would create an immediate I/O error if
>>> something got corrupted, thus making it clear which OSD is in need of
>>> the hammer of healing.
>>
>> Yes but at least BTRFS is still not working for ceph due to
>> fragmentation. I've even tested a 4.6 kernel a few weeks ago. But it
>> doubles it's I/O after a few days.
>>
> Nobody (well, certainly not me) suggested to use BTRFS, especially with
> Bluestore "around the corner".
> 
> Just pointing out that it has the necessary checksumming features.

Sure, sorry.

Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-28 Thread Christian Balzer
On Tue, 28 Jun 2016 09:15:50 +0200 Stefan Priebe - Profihost AG wrote:

> 
> Am 28.06.2016 um 09:06 schrieb Christian Balzer:
> > 
> > Hello,
> > 
> > On Tue, 28 Jun 2016 08:34:26 +0200 Stefan Priebe - Profihost AG wrote:
> > 
> >> Am 27.06.2016 um 02:14 schrieb Christian Balzer:
> >>> On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
> >>>
>  Hi,
> 
>  is there any option or chance to have auto repair of pgs in hammer?
> 
> >>> Short answer: 
> >>> No, in any version of Ceph.
> >>>
> >>> Long answer:
> >>> There are currently no checksums generated by Ceph and present to
> >>> facilitate this.
> >>
> >> Yes but if you have a replication count of 3 ceph pg repair was always
> >> working for me since bobtail. I've never seen corrupted data.
> >>
> > That's good and lucky for you.
> > 
> > Not seeing corrupted data also doesn't mean there wasn't any
> > corruption, it could simply mean that the data in question wasn't used
> > or overwritten before being read again.
> 
> Sure that's correct ;-) It just has happened so often that i thought
> this could not always be the case. We had a lot of kernel crashes the
> last month regarding XFS and bcache.
> 
That's something slightly different from silent data corruption, but I
can't really comment on it.

> > In the handful of scrub errors I ever encountered there was one case
> > where blindly doing a repair from the primary PG would have been the
> > wrong thing to do.
> 
> Are your sure it really simply uses the primary pg? i always thought it
> compares the sizes of the object and date with a replication factor 3.
> 
Yes, I'm sure:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10182.html

Sage also replied in a similar vein to you about this 4 years ago:
http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11575
"In general we don't repair automatically lest we inadvertantly propagate
bad data or paper over a bug."

And finally the Bluestore tech talk from last week from 35:40.
https://www.youtube.com/watch?v=kuacS4jw5pM
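
For reference, before repairing I'd do a manual check along these lines (the
PG id, OSD number and paths below are purely illustrative):

  ceph health detail | grep inconsistent    # find the affected PG(s)
  ceph pg map 2.37                          # shows the acting OSD set for that PG
  grep ERR /var/log/ceph/ceph-osd.5.log     # scrub errors name the bad object/shard
  # compare the replica files on the OSDs' filestores (e.g. with md5sum)
  # to decide which copy is good, and only then:
  ceph pg repair 2.37                       # repair copies from the primary

Since repair copies from the primary, you want to be sure the primary's copy
is the good one first.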

Christian
> 
> >>> If you'd run BTRFS or ZFS with filestore you'd be closer to an
> >>> automatic state of affairs, as these filesystems do strong checksums
> >>> and check them on reads and would create an immediate I/O error if
> >>> something got corrupted, thus making it clear which OSD is in need of
> >>> the hammer of healing.
> >>
> >> Yes but at least BTRFS is still not working for ceph due to
> >> fragmentation. I've even tested a 4.6 kernel a few weeks ago. But it
> >> doubles it's I/O after a few days.
> >>
> > Nobody (well, certainly not me) suggested to use BTRFS, especially with
> > Bluestore "around the corner".
> > 
> > Just pointing out that it has the necessary checksumming features.
> 
> Sure. sorry.
> 
> Stefan
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-28 Thread Lionel Bouton
Hi,

On 28/06/2016 08:34, Stefan Priebe - Profihost AG wrote:
> [...]
> Yes but at least BTRFS is still not working for ceph due to
> fragmentation. I've even tested a 4.6 kernel a few weeks ago. But it
> doubles it's I/O after a few days.

BTRFS autodefrag does not work over the long term. That said, BTRFS
itself is working far better than XFS on our cluster (noticeably better
latencies). As going without checksums wasn't an option, we wrote and are
using this:

https://github.com/jtek/ceph-utils/blob/master/btrfs-defrag-scheduler.rb

This actually saved us from 2 faulty disk controllers which were
infrequently corrupting data in our cluster.

Also mandatory for performance:
filestore btrfs snap = false

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Should I use different pool?

2016-06-28 Thread Brian ::
+1 for 18TB and all SSD - if you need any decent IOPS with a cluster
this size then all SSDs are the way to go.


On Mon, Jun 27, 2016 at 11:47 AM, David  wrote:
> Yes you should definitely create different pools for different HDD types.
> Another decision you need to make is whether you want dedicated nodes for
> SSD or want to mix them in the same node. You need to ensure you have
> sufficient CPU and fat enough network links to get the most out of your
> SSD's.
>
> You can add multiple data pools to Cephfs so if you can identify the hot and
> cold data in your dataset you could do "manual" tiering as an alternative to
> using a cache tier.
>
> 18TB is a relatively small capacity, have you considered an all-SSD cluster?
>
> On Sun, Jun 26, 2016 at 10:18 AM, EM - SC 
> wrote:
>>
>> Hi,
>>
>> I'm new to ceph and in the mailing list, so hello all!
>>
>> I'm testing ceph and the plan is to migrate our current 18TB storage
>> (zfs/nfs) to ceph. This will be using CephFS and mounted in our backend
>> application.
>> We are also planning on using virtualisation (opennebula) with rbd for
>> images and, if it makes sense, use rbd for our oracle server.
>>
>> My question is about pools.
>> For what I read, I should create different pools for different HD speed
>> (SAS, SSD, etc).
>> - What else should I consider for creating pools?
>> - should I create different pools for rbd, cephfs, etc?
>>
>> thanks in advanced,
>> em
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Should I use different pool?

2016-06-28 Thread EM - SC
Thanks for the answers.

SSD could be an option, but the idea is to grow (if business goes well)
beyond those 18TB.
I am also starting to think, after reading some negative comments, that
CephFS doesn't perform very well with very large directories containing
many subdirectories (which is our case).

The big picture here is that we are moving to a new datacenter.
Currently our NAS is on ZFS and holds the 18TB of content for our application.
We would like to move away from NFS and use the Ceph object gateway, but
this will require dev time, which we will only have after the DC migration.

So the idea was to go to CephFS just to migrate off our current ZFS
NAS, and then eventually migrate that data to the object gateway. But
I'm starting to believe that it is better to have a ZFS NAS in the new
DC and migrate directly from ZFS to the object gateway once we are in
the new DC.






Brian :: wrote:
> +1 for 18TB and all SSD - If you need any decent IOPS with a cluster
> this size then I all SSDs are the way to go.
>
>
> On Mon, Jun 27, 2016 at 11:47 AM, David  wrote:
>> Yes you should definitely create different pools for different HDD types.
>> Another decision you need to make is whether you want dedicated nodes for
>> SSD or want to mix them in the same node. You need to ensure you have
>> sufficient CPU and fat enough network links to get the most out of your
>> SSD's.
>>
>> You can add multiple data pools to Cephfs so if you can identify the hot and
>> cold data in your dataset you could do "manual" tiering as an alternative to
>> using a cache tier.
>>
>> 18TB is a relatively small capacity, have you considered an all-SSD cluster?
>>
>> On Sun, Jun 26, 2016 at 10:18 AM, EM - SC 
>> wrote:
>>> Hi,
>>>
>>> I'm new to ceph and in the mailing list, so hello all!
>>>
>>> I'm testing ceph and the plan is to migrate our current 18TB storage
>>> (zfs/nfs) to ceph. This will be using CephFS and mounted in our backend
>>> application.
>>> We are also planning on using virtualisation (opennebula) with rbd for
>>> images and, if it makes sense, use rbd for our oracle server.
>>>
>>> My question is about pools.
>>> For what I read, I should create different pools for different HD speed
>>> (SAS, SSD, etc).
>>> - What else should I consider for creating pools?
>>> - should I create different pools for rbd, cephfs, etc?
>>>
>>> thanks in advanced,
>>> em
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] VM shutdown because of PG increase

2016-06-28 Thread 한승진
Hi, Cephers.

Our ceph version is Hammer(0.94.7).

I implemented ceph with OpenStack, all instances use block storage as a
local volume.

After increasing the PG number from 256 to 768, many VMs were shut down.

That was a very strange case for me.

Below is the libvirt error log from one of the VMs.

osd/osd_types.cc: In function 'bool pg_t::is_split(unsigned int, unsigned
int, std::set*) const' thread 7fc4c01b9700 time 2016-06-28
14:17:35.004480
osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num)
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (()+0x15374b) [0x7fc4d1ca674b]
 2: (()+0x222f01) [0x7fc4d1d75f01]
 3: (()+0x222fdd) [0x7fc4d1d75fdd]
 4: (()+0xc5339) [0x7fc4d1c18339]
 5: (()+0xdc3e5) [0x7fc4d1c2f3e5]
 6: (()+0xdcc4a) [0x7fc4d1c2fc4a]
 7: (()+0xde1b2) [0x7fc4d1c311b2]
 8: (()+0xe3fbf) [0x7fc4d1c36fbf]
 9: (()+0x2c3b99) [0x7fc4d1e16b99]
 10: (()+0x2f160d) [0x7fc4d1e4460d]
 11: (()+0x80a5) [0x7fc4cd7aa0a5]
 12: (clone()+0x6d) [0x7fc4cd4d7cfd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
2016-06-28 05:17:36.557+: shutting down


Could anybody explain this?

Thank you.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-28 Thread Stefan Priebe - Profihost AG
Am 28.06.2016 um 09:42 schrieb Christian Balzer:
> On Tue, 28 Jun 2016 09:15:50 +0200 Stefan Priebe - Profihost AG wrote:
> 
>>
>> Am 28.06.2016 um 09:06 schrieb Christian Balzer:
>>>
>>> Hello,
>>>
>>> On Tue, 28 Jun 2016 08:34:26 +0200 Stefan Priebe - Profihost AG wrote:
>>>
 Am 27.06.2016 um 02:14 schrieb Christian Balzer:
> On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
>
>> Hi,
>>
>> is there any option or chance to have auto repair of pgs in hammer?
>>
> Short answer: 
> No, in any version of Ceph.
>
> Long answer:
> There are currently no checksums generated by Ceph and present to
> facilitate this.

 Yes but if you have a replication count of 3 ceph pg repair was always
 working for me since bobtail. I've never seen corrupted data.

>>> That's good and lucky for you.
>>>
>>> Not seeing corrupted data also doesn't mean there wasn't any
>>> corruption, it could simply mean that the data in question wasn't used
>>> or overwritten before being read again.
>>
>> Sure that's correct ;-) It just has happened so often that i thought
>> this could not always be the case. We had a lot of kernel crashes the
>> last month regarding XFS and bcache.
>>
> That's something slightly different than silent data corruption, but I
> can't really comment on this.

ah OK - no i was not talking about silent data corruption.

> 
>>> In the handful of scrub errors I ever encountered there was one case
>>> where blindly doing a repair from the primary PG would have been the
>>> wrong thing to do.
>>
>> Are your sure it really simply uses the primary pg? i always thought it
>> compares the sizes of the object and date with a replication factor 3.
>>
> Yes, I'm sure:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10182.html
> 
> Sage also replied in a similar vein to you about this 4 years ago:
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11575
> "In general we don't repair automatically lest we inadvertantly propagate
> bad data or paper over a bug."
> 
> And finally the Bluestore tech talk from last week from 35:40.
> https://www.youtube.com/watch?v=kuacS4jw5pM

Thanks!

> 
> Christian
>>
> If you'd run BTRFS or ZFS with filestore you'd be closer to an
> automatic state of affairs, as these filesystems do strong checksums
> and check them on reads and would create an immediate I/O error if
> something got corrupted, thus making it clear which OSD is in need of
> the hammer of healing.

 Yes but at least BTRFS is still not working for ceph due to
 fragmentation. I've even tested a 4.6 kernel a few weeks ago. But it
 doubles it's I/O after a few days.

>>> Nobody (well, certainly not me) suggested to use BTRFS, especially with
>>> Bluestore "around the corner".
>>>
>>> Just pointing out that it has the necessary checksumming features.
>>
>> Sure. sorry.
>>
>> Stefan
>>
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Cache

2016-06-28 Thread Mohd Zainal Abidin Rabani
Hi,

 

We are using OSDs in production, with SSDs as journals. We have tested I/O and it
shows good results. We now plan to use an OSD cache to get better IOPS. Has anyone
here successfully deployed an OSD cache? Please share your experience or advice.

 

Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How many nodes/OSD can fail

2016-06-28 Thread willi.feh...@t-online.de
Hello,

I'm still very new to Ceph. I've created a small test cluster.
 
ceph-node1
osd0
osd1
osd2
ceph-node2
osd3
osd4
osd5
ceph-node3
osd6
osd7
osd8
 
My pool for CephFS has a replication count of 3. I powered off 2 nodes (6
OSDs went down), my cluster status became critical and my Ceph
clients (CephFS) ran into a timeout. My data (I had only one file in my pool)
was still on one of the active OSDs. Is this the expected behaviour, that
the cluster status becomes critical and my clients run into a timeout?
 
Many thanks for your feedback.
 
Regards - Willi
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM shutdown because of PG increase

2016-06-28 Thread Torsten Urbas
Hello,

are you sure about your Ceph version? The output below states "0.94.1".

We ran into a similar issue with Ceph 0.94.3 and can confirm that we
no longer see it with Ceph 0.94.5.

If you upgraded during operation, did you migrate all of your VMs at
least once to make sure they are using the most recent librbd?

Cheers,
Torsten

-- 
Torsten Urbas
Mobile: +49 (170) 77 38 251

On 28 June 2016 at 11:00:21, 한승진 (yongi...@gmail.com) wrote:

Hi, Cephers.

Our ceph version is Hammer(0.94.7).

I implemented ceph with OpenStack, all instances use block storage as a
local volume.

After increasing the PG number from 256 to 768, many vms are shutdown.

That was very strange case for me.

Below vm's is libvirt error log.

osd/osd_types.cc: In function 'bool pg_t::is_split(unsigned int, unsigned
int, std::set*) const' thread 7fc4c01b9700 time 2016-06-28
14:17:35.004480
osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num)
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (()+0x15374b) [0x7fc4d1ca674b]
 2: (()+0x222f01) [0x7fc4d1d75f01]
 3: (()+0x222fdd) [0x7fc4d1d75fdd]
 4: (()+0xc5339) [0x7fc4d1c18339]
 5: (()+0xdc3e5) [0x7fc4d1c2f3e5]
 6: (()+0xdcc4a) [0x7fc4d1c2fc4a]
 7: (()+0xde1b2) [0x7fc4d1c311b2]
 8: (()+0xe3fbf) [0x7fc4d1c36fbf]
 9: (()+0x2c3b99) [0x7fc4d1e16b99]
 10: (()+0x2f160d) [0x7fc4d1e4460d]
 11: (()+0x80a5) [0x7fc4cd7aa0a5]
 12: (clone()+0x6d) [0x7fc4cd4d7cfd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
2016-06-28 05:17:36.557+: shutting down


Could you anybody explain this?

Thank you.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Cache

2016-06-28 Thread David
Hi,

Please clarify what you mean by "osd cache". Raid controller cache or
Ceph's cache tiering feature?
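
If you mean the cache tiering feature, the rough shape of the setup is along
these lines (pool names and the size target are only illustrative, and cache
tiering needs careful sizing and testing before production use):

  ceph osd tier add base-pool cache-pool
  ceph osd tier cache-mode cache-pool writeback
  ceph osd tier set-overlay base-pool cache-pool
  ceph osd pool set cache-pool hit_set_type bloom
  ceph osd pool set cache-pool target_max_bytes 1000000000000

where cache-pool is a pool whose CRUSH rule places it on the SSD OSDs.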

On Tue, Jun 28, 2016 at 10:21 AM, Mohd Zainal Abidin Rabani <
zai...@nocser.net> wrote:

> Hi,
>
>
>
> We have using osd on production. SSD as journal. We have test io and show
> good result. We plan to use osd cache to get better iops. Have anyone here
> success deploy osd cache? Please share or advice here.
>
>
>
> Thanks.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM shutdown because of PG increase

2016-06-28 Thread Brad Hubbard
On Tue, Jun 28, 2016 at 7:39 PM, Torsten Urbas  wrote:
> Hello,
>
> are you sure about your Ceph version? Below’s output states "0.94.1“.

I suspect it's quite likely that the cluster was upgraded but not the
clients, or, if the clients were upgraded, that the VMs were not restarted,
so they still have the old binary images in memory and thus still report 0.94.1.

A restart on any of the remaining VMs that have not been restarted would be a
good idea.

You can identify these VMs as they should show librbd/librados as "deleted" in
/proc/[PID]/maps output (this will need to be the PID of the qemu-kvm instance).
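
Something along these lines should find them (the process name may differ on
your distro, e.g. qemu-system-x86_64 instead of qemu-kvm):

  for pid in $(pgrep -x qemu-kvm); do
      if grep -E 'librbd|librados' /proc/"$pid"/maps | grep -q deleted; then
          echo "PID $pid still maps a deleted librbd/librados - restart/migrate it"
      fi
  done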

HTH,
Brad

>
> We have ran into a similar issue with Ceph 0.94.3 and can confirm that we no
> longer see that with Ceph 0.94.5.
>
> If you upgraded during operation, did you at least migrate all of your VMs
> at least once to make sure they are using the most recent librbd?
>
> Cheers,
> Torsten
>
> --
> Torsten Urbas
> Mobile: +49 (170) 77 38 251
>
> Am 28. Juni 2016 um 11:00:21, 한승진 (yongi...@gmail.com) schrieb:
>
> Hi, Cephers.
>
> Our ceph version is Hammer(0.94.7).
>
> I implemented ceph with OpenStack, all instances use block storage as a
> local volume.
>
> After increasing the PG number from 256 to 768, many vms are shutdown.
>
> That was very strange case for me.
>
> Below vm's is libvirt error log.
>
> osd/osd_types.cc: In function 'bool pg_t::is_split(unsigned int, unsigned
> int, std::set*) const' thread 7fc4c01b9700 time 2016-06-28
> 14:17:35.004480
> osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num)
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>  1: (()+0x15374b) [0x7fc4d1ca674b]
>  2: (()+0x222f01) [0x7fc4d1d75f01]
>  3: (()+0x222fdd) [0x7fc4d1d75fdd]
>  4: (()+0xc5339) [0x7fc4d1c18339]
>  5: (()+0xdc3e5) [0x7fc4d1c2f3e5]
>  6: (()+0xdcc4a) [0x7fc4d1c2fc4a]
>  7: (()+0xde1b2) [0x7fc4d1c311b2]
>  8: (()+0xe3fbf) [0x7fc4d1c36fbf]
>  9: (()+0x2c3b99) [0x7fc4d1e16b99]
>  10: (()+0x2f160d) [0x7fc4d1e4460d]
>  11: (()+0x80a5) [0x7fc4cd7aa0a5]
>  12: (clone()+0x6d) [0x7fc4cd4d7cfd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> 2016-06-28 05:17:36.557+: shutting down
>
>
> Could you anybody explain this?
>
> Thank you.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph not replicating to all osds

2016-06-28 Thread Brad Hubbard
On Tue, Jun 28, 2016 at 4:17 PM, Ishmael Tsoaela  wrote:
> Hi,
>
> I am new to Ceph and most of the concepts are new.
>
> image mounted on nodeA, FS is XFS
>
> sudo mkfs.xfs  /dev/rbd/data/data_01
>
> sudo mount /dev/rbd/data/data_01 /mnt
>
> cluster_master@nodeB:~$ mount|grep rbd
> /dev/rbd0 on /mnt type xfs (rw)

XFS is not a network filesystem. It cannot be mounted on more than one
system at any given time without corrupting it. Even if one mountpoint
does no writes, the log will still be replayed during the mount, and
that should be enough for at least one system to detect that the
filesystem is corrupted.

Cheers,
Brad

>
>
> Basically I need a way to write on nodeA, mount the same image on nodeB and
> be able to write on either of the nodes, Data should be repilcated to both
> but I see on the logs for both osd, data is only stored on one.
>
>
> I am busy looking at CEPHFS
>
>
> thanks for the assistance.
>
>
>
>
>
>
>
>
>
>
> On Tue, Jun 28, 2016 at 1:09 AM, Christian Balzer  wrote:
>>
>>
>> Hello,
>>
>> On Mon, 27 Jun 2016 17:00:42 +0200 Ishmael Tsoaela wrote:
>>
>> > Hi ALL,
>> >
>> > Anyone can help with this issue would be much appreciated.
>> >
>> Your subject line has nothing to do with your "problem".
>>
>> You're alluding to OSD replication problems, obviously assuming that one
>> client would write to OSD A and the other client reading from OSD B.
>> Which is not how Ceph works, but again, that's not your problem.
>>
>> > I have created an  image on one client and mounted it on both 2 client I
>> > have setup.
>> >
>> Details missing, but it's pretty obvious that you created a plain FS like
>> Ext4 on that image.
>>
>> > When I write data on one client, I cannot access the data on another
>> > client, what could be causing this issue?
>> >
>> This has cropped up here frequently, you're confusing replicated BLOCK
>> storage like RBD or DRBD with shared file systems like NFS of CephFS.
>>
>> EXT4 and other normal FS can't do that and you just corrupted your FS on
>> that image.
>>
>> So either use CephFS or run OCFS2/GFS2 on your shared image and clients.
>>
>> Christian
>> --
>> Christian BalzerNetwork/Systems Engineer
>> ch...@gol.com   Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How many nodes/OSD can fail

2016-06-28 Thread David
Hi,

This is probably the min_size on your CephFS data and/or metadata pool. I
believe the default is 2; if you have fewer than 2 replicas available, I/O
will stop. See:
http://docs.ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
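
You can check and, if you accept the risk, lower it like this (the pool names
are illustrative; min_size 1 lets I/O continue on a single remaining copy,
which has its own data-safety implications):

  ceph osd pool get cephfs_data min_size
  ceph osd pool get cephfs_metadata min_size
  ceph osd pool set cephfs_data min_size 1
  ceph osd pool set cephfs_metadata min_size 1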

On Tue, Jun 28, 2016 at 10:23 AM, willi.feh...@t-online.de <
willi.feh...@t-online.de> wrote:

> Hello,
>
> I'm still very new to Ceph. I've created a small test Cluster.
>
>
>
> ceph-node1
>
> osd0
>
> osd1
>
> osd2
>
> ceph-node2
>
> osd3
>
> osd4
>
> osd5
>
> ceph-node3
>
> osd6
>
> osd7
>
> osd8
>
>
>
> My pool for CephFS has a replication count of 3. I've powered of 2 nodes(6
> OSDs went down) and my cluster status became critical and my ceph
> clients(cephfs) run into a timeout. My data(I had only one file on my pool)
> was still on one of the active OSDs. Is this the expected behaviour that
> the Cluster status became critical and my Clients run into a timeout?
>
>
>
> Many thanks for your feedback.
>
>
>
> Regards - Willi
>
>
> 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph not replicating to all osds

2016-06-28 Thread Ishmael Tsoaela
Thanks Brad,

I have looked through OCFS2 and it does exactly what I wanted.

On Tue, Jun 28, 2016 at 1:04 PM, Brad Hubbard  wrote:

> On Tue, Jun 28, 2016 at 4:17 PM, Ishmael Tsoaela 
> wrote:
> > Hi,
> >
> > I am new to Ceph and most of the concepts are new.
> >
> > image mounted on nodeA, FS is XFS
> >
> > sudo mkfs.xfs  /dev/rbd/data/data_01
> >
> > sudo mount /dev/rbd/data/data_01 /mnt
> >
> > cluster_master@nodeB:~$ mount|grep rbd
> > /dev/rbd0 on /mnt type xfs (rw)
>
> XFS is not a network filesystem. It can not be mounted on more than
> one system at any given
> time without corrupting it, even if one mountpoint does no writes, teh
> log will still be replayed
> during the mount and that should be enough for at least one system to
> detect the filesystem
> is corrupted.
>
> Cheers,
> Brad
>
> >
> >
> > Basically I need a way to write on nodeA, mount the same image on nodeB
> and
> > be able to write on either of the nodes, Data should be repilcated to
> both
> > but I see on the logs for both osd, data is only stored on one.
> >
> >
> > I am busy looking at CEPHFS
> >
> >
> > thanks for the assistance.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Jun 28, 2016 at 1:09 AM, Christian Balzer  wrote:
> >>
> >>
> >> Hello,
> >>
> >> On Mon, 27 Jun 2016 17:00:42 +0200 Ishmael Tsoaela wrote:
> >>
> >> > Hi ALL,
> >> >
> >> > Anyone can help with this issue would be much appreciated.
> >> >
> >> Your subject line has nothing to do with your "problem".
> >>
> >> You're alluding to OSD replication problems, obviously assuming that one
> >> client would write to OSD A and the other client reading from OSD B.
> >> Which is not how Ceph works, but again, that's not your problem.
> >>
> >> > I have created an  image on one client and mounted it on both 2
> client I
> >> > have setup.
> >> >
> >> Details missing, but it's pretty obvious that you created a plain FS
> like
> >> Ext4 on that image.
> >>
> >> > When I write data on one client, I cannot access the data on another
> >> > client, what could be causing this issue?
> >> >
> >> This has cropped up here frequently, you're confusing replicated BLOCK
> >> storage like RBD or DRBD with shared file systems like NFS of CephFS.
> >>
> >> EXT4 and other normal FS can't do that and you just corrupted your FS on
> >> that image.
> >>
> >> So either use CephFS or run OCFS2/GFS2 on your shared image and clients.
> >>
> >> Christian
> >> --
> >> Christian BalzerNetwork/Systems Engineer
> >> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> >> http://www.gol.com/
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Cheers,
> Brad
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Alex Gorbachev
After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
these issues where an OSD would fail with the stack below.  I logged a
bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
a similar description at https://lkml.org/lkml/2016/6/22/102, but the
odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
just the noop scheduler.

Does the ceph kernel code somehow use the fair scheduler code block?

Thanks
--
Alex Gorbachev
Storcium

Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
03/04/2015
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
0010:[]  []
task_numa_find_cpu+0x22e/0x6f0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
0018:880f79fbb818  EFLAGS: 00010206
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
 RBX: 880f79fbb8b8 RCX: 
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
 RSI:  RDI: 8810352d4800
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
0009 R11: 0006 R12: 8807c3adc4c0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
0006 R14: 033e R15: fec7
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
7f30e46b8700() GS:88105f58()
knlGS:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
 ES:  CR0: 80050033
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
1321a000 CR3: 000853598000 CR4: 000406e0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
813d050f 000d 0045 880f79df8000
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
033f  00016b00 033f
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
880f79df8000 880f79fbb8b8 01f4 0054
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
[] ? cpumask_next_and+0x2f/0x40
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
[] task_numa_migrate+0x43e/0x9b0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
[] ? update_cfs_shares+0xbc/0x100
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
[] numa_migrate_preferred+0x79/0x80
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
[] task_numa_fault+0x7f4/0xd40
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
[] ? timerqueue_del+0x24/0x70
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
[] ? should_numa_migrate_memory+0x55/0x130
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
[] handle_mm_fault+0xbc0/0x1820
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
[] ? __hrtimer_init+0x90/0x90
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
[] ? remove_wait_queue+0x4d/0x60
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
[] ? poll_freewait+0x4a/0xa0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
[] __do_page_fault+0x197/0x400
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
[] do_page_fault+0x22/0x30
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
[] page_fault+0x28/0x30
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
[] ? copy_page_to_iter_iovec+0x5f/0x300
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
[] ? select_task_rq_fair+0x625/0x700
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
[] copy_page_to_iter+0x16/0xa0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
[] skb_copy_datagram_iter+0x14d/0x280
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
[] tcp_recvmsg+0x613/0xbe0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
[] inet_recvmsg+0x7e/0xb0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
[] sock_recvmsg+0x3b/0x50
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
[] SYSC_recvfrom+0xe1/0x160
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
[] ? ktime_get_ts64+0x45/0xf0
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
[] SyS_recvfrom+0xe/0x10
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686259]
[] entry_SYSCALL_64_fastpath+0x16/0x71
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686287] Code: 55 b0 4c
89 f7 e8 53 cd ff ff 48 8b 55 b0 49 8b 4e 78 48 8b 82 d8 01 00 00 48
83 c1 01 31 d2 49 0f af 86 b0 00 00 00 4c 8b 73 78 <48> f7 f1 48 8b 4b
20 49 89 c0 48 29 c1 48 8b 45 d0 4c 03 43 48
Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686512] RIP
[] task_numa_f

Re: [ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Stefan Priebe - Profihost AG
Yes, you need those LKML patches. I added them to our custom 4.4 kernel as well
to prevent this.

Stefan

Excuse my typo sent from my mobile phone.

> Am 28.06.2016 um 17:05 schrieb Alex Gorbachev :
> 
> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
> these issues where an OSD would fail with the stack below.  I logged a
> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
> just the noop scheduler.
> 
> Does the ceph kernel code somehow use the fair scheduler code block?
> 
> Thanks
> --
> Alex Gorbachev
> Storcium
> 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
> 03/04/2015
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
> 0010:[]  []
> task_numa_find_cpu+0x22e/0x6f0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
> 0018:880f79fbb818  EFLAGS: 00010206
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>  RBX: 880f79fbb8b8 RCX: 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>  RSI:  RDI: 8810352d4800
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
> 0009 R11: 0006 R12: 8807c3adc4c0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
> 0006 R14: 033e R15: fec7
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
> 7f30e46b8700() GS:88105f58()
> knlGS:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>  ES:  CR0: 80050033
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
> 1321a000 CR3: 000853598000 CR4: 000406e0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
> 813d050f 000d 0045 880f79df8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
> 033f  00016b00 033f
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
> 880f79df8000 880f79fbb8b8 01f4 0054
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
> [] ? cpumask_next_and+0x2f/0x40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
> [] task_numa_migrate+0x43e/0x9b0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
> [] ? update_cfs_shares+0xbc/0x100
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
> [] numa_migrate_preferred+0x79/0x80
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
> [] task_numa_fault+0x7f4/0xd40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
> [] ? timerqueue_del+0x24/0x70
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
> [] ? should_numa_migrate_memory+0x55/0x130
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
> [] handle_mm_fault+0xbc0/0x1820
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
> [] ? __hrtimer_init+0x90/0x90
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
> [] ? remove_wait_queue+0x4d/0x60
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
> [] ? poll_freewait+0x4a/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
> [] __do_page_fault+0x197/0x400
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
> [] do_page_fault+0x22/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
> [] page_fault+0x28/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
> [] ? copy_page_to_iter_iovec+0x5f/0x300
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
> [] ? select_task_rq_fair+0x625/0x700
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
> [] copy_page_to_iter+0x16/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
> [] skb_copy_datagram_iter+0x14d/0x280
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
> [] tcp_recvmsg+0x613/0xbe0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
> [] inet_recvmsg+0x7e/0xb0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
> [] sock_recvmsg+0x3b/0x50
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
> [] SYSC_recvfrom+0xe1/0x160
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
> [] ? ktime_get_ts64+0x45/0xf0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
> [] SyS_recvfrom+0xe/0x10
> Jun 28 09:46:41 roc04

[ceph-users] CPU use for OSD daemon

2016-06-28 Thread George Shuklin

Hello.

I'm testing different configurations for Ceph. I have found that OSDs are
REALLY hungry for CPU.


I've created a tiny pool with size 1, with a single OSD on a fast Intel
SSD (2500 series), in an old Dell server (R210), Xeon E3-1230 V2 @ 3.30GHz.


When I benchmark it I see some horribly low performance and a clear
bottleneck at the ceph-osd process: it consumes about 110% of a CPU and gives
me the following results: 127 IOPS in a fio benchmark (4k randwrite) on an RBD
device; a rados benchmark gives me ~21 IOPS and 76 MB/s (write).


Is this a normal CPU utilization for an OSD daemon for such tiny performance?

Relevant part of the crush map:

rule rule_fast {
ruleset 1
type replicated
min_size 1
max_size 10
step take fast
step chooseleaf firstn 0 type osd
step emit
}
root fast2500 {
id -17
alg straw
hash 0  # rjenkins1
item pp7 weight 1.0
}

host pp7 {
id -11
alg straw
hash 0  # rjenkins1
item osd.5 weight 1.0
}


host pp7 {
id -11
alg straw
hash 0  # rjenkins1
item osd.5 weight 1.0
}

device 5 osd.5
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CPU use for OSD daemon

2016-06-28 Thread Christian Balzer

Hello,

On Tue, 28 Jun 2016 18:23:02 +0300 George Shuklin wrote:

> Hello.
> 
> I'm testing different configuration for Ceph. 

What version...

> I found that osd are 
> REALLY hungry for cpu.
> 
They can be, but unlikely in your case.

> I've created a tiny pool with size 1 with single OSD made of fast intel 
> SSD (2500-series), on old dell server (R210),  Xeon E3-1230 V2 @ 3.30GHz.
> 
At a replication size of 1, a totally unrealistic test scenario.

Ignoring that, an Intel SSD PRO 2500 is a consumer SSD and as such with
near certainty ill suited for usage with Ceph, especially when it comes to
journals. 
Check/google the countless threads about what constitutes SSDs suitable for
Ceph usage. 

> And when I benchmark it 
How? 
Fio, we can gather, but whether against a RBD image, with user or kernel
client, with the fio RBD engine...
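
For comparison, a typical run against the userspace librbd path would look
something like this - assuming fio was built with rbd support, with the
pool/image names here being illustrative:

  fio --name=rbd-randwrite --ioengine=rbd --clientname=admin --pool=rbd \
      --rbdname=testimg --rw=randwrite --bs=4k --iodepth=32 --direct=1 \
      --runtime=60 --time_based

Which of these paths you used matters a lot for interpreting the numbers.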

>I see some horribly-low performance and clear 
> bottleneck at ceph-osd process: it consumes about 110% of CPU and giving 
110% actual CPU usage?
I'd wager a significant amount of that is IOWAIT...

> me following results: 127 iops in fio benchmark (4k randwrite) for rbd 
> device, rados benchmark gives me ~21 IOPS and 76Mb/s (write).
> 
Pretty clear indication that the SSD isn't handling sync writes well,
lacking further info.
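
A quick way to check that is a single-job sync write test (the device path is
illustrative, and note this writes to and destroys data on the target device):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 \
      --iodepth=1 --runtime=60 --time_based --name=journal-test

Journal-worthy SSDs sustain thousands of these sync write IOPS, while consumer
models often manage only a few hundred.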
 
> It this a normal CPU utilization for osd daemon for such tiny
> performance?
> 
> Relevant part of the crush map:
> 
Irrelevant in this context really.


Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Tim Bishop
Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
useful information to add other than it's not just you.

Tim.

On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
> these issues where an OSD would fail with the stack below.  I logged a
> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
> just the noop scheduler.
> 
> Does the ceph kernel code somehow use the fair scheduler code block?
> 
> Thanks
> --
> Alex Gorbachev
> Storcium
> 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
> 03/04/2015
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
> 0010:[]  []
> task_numa_find_cpu+0x22e/0x6f0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
> 0018:880f79fbb818  EFLAGS: 00010206
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>  RBX: 880f79fbb8b8 RCX: 
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>  RSI:  RDI: 8810352d4800
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
> 0009 R11: 0006 R12: 8807c3adc4c0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
> 0006 R14: 033e R15: fec7
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
> 7f30e46b8700() GS:88105f58()
> knlGS:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>  ES:  CR0: 80050033
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
> 1321a000 CR3: 000853598000 CR4: 000406e0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
> 813d050f 000d 0045 880f79df8000
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
> 033f  00016b00 033f
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
> 880f79df8000 880f79fbb8b8 01f4 0054
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
> [] ? cpumask_next_and+0x2f/0x40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
> [] task_numa_migrate+0x43e/0x9b0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
> [] ? update_cfs_shares+0xbc/0x100
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
> [] numa_migrate_preferred+0x79/0x80
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
> [] task_numa_fault+0x7f4/0xd40
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
> [] ? timerqueue_del+0x24/0x70
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
> [] ? should_numa_migrate_memory+0x55/0x130
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
> [] handle_mm_fault+0xbc0/0x1820
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
> [] ? __hrtimer_init+0x90/0x90
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
> [] ? remove_wait_queue+0x4d/0x60
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
> [] ? poll_freewait+0x4a/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
> [] __do_page_fault+0x197/0x400
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
> [] do_page_fault+0x22/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
> [] page_fault+0x28/0x30
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
> [] ? copy_page_to_iter_iovec+0x5f/0x300
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
> [] ? select_task_rq_fair+0x625/0x700
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
> [] copy_page_to_iter+0x16/0xa0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
> [] skb_copy_datagram_iter+0x14d/0x280
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
> [] tcp_recvmsg+0x613/0xbe0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
> [] inet_recvmsg+0x7e/0xb0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
> [] sock_recvmsg+0x3b/0x50
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
> [] SYSC_recvfrom+0xe1/0x160
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
> [] ? ktime_get_ts64+0x45/0xf0
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
> [] SyS_recvfrom+0xe/0x10
> Jun 28 09:46:41 roc04

[ceph-users] Another cluster completely hang

2016-06-28 Thread Mario Giammarco
Hello,
this is the second time this has happened to me, I hope that someone can
explain what I can do.
Proxmox Ceph cluster with 8 servers, 11 HDDs. Min_size=1, size=2.

One HDD went down due to bad sectors.
Ceph recovered, but it ended with:

cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
 health HEALTH_WARN
3 pgs down
19 pgs incomplete
19 pgs stuck inactive
19 pgs stuck unclean
7 requests are blocked > 32 sec
 monmap e11: 7 mons at
{0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
election epoch 722, quorum 
0,1,2,3,4,5,6 1,4,2,0,3,5,6
 osdmap e10182: 10 osds: 10 up, 10 in
  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
9136 GB used, 5710 GB / 14846 GB avail
1005 active+clean
  16 incomplete
   3 down+incomplete

Unfortunately "7 requests blocked" means no virtual machine can boot 
because ceph has stopped i/o.

I can accept to lose some data, but not ALL data!
Can you help me please?
Thanks,
Mario

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mounting Ceph RBD under xenserver

2016-06-28 Thread Mike Jacobacci
Hi all,

Is there anyone using RBD for XenServer VM storage?  I have XenServer 7 and the 
latest Ceph, and I am looking for the best way to mount the RBD volume under 
XenServer.  There is not much recent info out there that I have found, except for 
this:
http://www.mad-hacking.net/documentation/linux/ha-cluster/storage-area-network/ceph-xen-domu.xml
 


and this plugin (which looks nice):
https://github.com/mstarikov/rbdsr 

I am looking for a way that doesn’t involve too much command line so other 
admins that don’t know Ceph or XenServer very well can work with it.  I am just 
curious what others are doing… Any help is greatly appreciated!

Cheers,
Mike




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-28 Thread Oliver Dzombic
Hi Mario,

please give us some more details, i.e. the output of:

ceph osd pool ls detail
ceph osd df
ceph --version

ceph -w for 10 seconds ( use http://pastebin.com/ please )

ceph osd crush dump ( also pastebin pls )
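
For the incomplete PGs specifically, a pg query usually shows why they are
stuck (the PG id here is just an example, take one from ceph health detail):

  ceph health detail | grep incomplete
  ceph pg 2.3f query   # look at "recovery_state", "blocked_by", "down_osds_we_would_probe"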

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 28.06.2016 18:59, Mario Giammarco wrote:
> Hello,
> this is the second time that happens to me, I hope that someone can 
> explain what I can do.
> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
> 
> One hdd goes down due to bad sectors. 
> Ceph recovers but it ends with:
> 
> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>  health HEALTH_WARN
> 3 pgs down
> 19 pgs incomplete
> 19 pgs stuck inactive
> 19 pgs stuck unclean
> 7 requests are blocked > 32 sec
>  monmap e11: 7 mons at
> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
> election epoch 722, quorum 
> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>  osdmap e10182: 10 osds: 10 up, 10 in
>   pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> 9136 GB used, 5710 GB / 14846 GB avail
> 1005 active+clean
>   16 incomplete
>3 down+incomplete
> 
> Unfortunately "7 requests blocked" means no virtual machine can boot 
> because ceph has stopped i/o.
> 
> I can accept to lose some data, but not ALL data!
> Can you help me please?
> Thanks,
> Mario
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-28 Thread Stefan Priebe - Profihost AG
And ceph health detail

Stefan

Excuse my typo sent from my mobile phone.

> Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
> 
> Hi Mario,
> 
> please give some more details:
> 
> Please the output of:
> 
> ceph osd pool ls detail
> ceph osd df
> ceph --version
> 
> ceph -w for 10 seconds ( use http://pastebin.com/ please )
> 
> ceph osd crush dump ( also pastebin pls )
> 
> -- 
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
>> Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
>> Hello,
>> this is the second time that happens to me, I hope that someone can 
>> explain what I can do.
>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>> 
>> One hdd goes down due to bad sectors. 
>> Ceph recovers but it ends with:
>> 
>> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>> health HEALTH_WARN
>>3 pgs down
>>19 pgs incomplete
>>19 pgs stuck inactive
>>19 pgs stuck unclean
>>7 requests are blocked > 32 sec
>> monmap e11: 7 mons at
>> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
>> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
>> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
>>election epoch 722, quorum 
>> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>> osdmap e10182: 10 osds: 10 up, 10 in
>>  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>>9136 GB used, 5710 GB / 14846 GB avail
>>1005 active+clean
>>  16 incomplete
>>   3 down+incomplete
>> 
>> Unfortunately "7 requests blocked" means no virtual machine can boot 
>> because ceph has stopped i/o.
>> 
>> I can accept to lose some data, but not ALL data!
>> Can you help me please?
>> Thanks,
>> Mario
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CPU use for OSD daemon

2016-06-28 Thread Alexandre DERUMIER
>>And when I benchmark it I see some horribly-low performance and clear
>>bottleneck at ceph-osd process: it consumes about 110% of CPU and giving
>>me following results: 127 iops in fio benchmark (4k randwrite) for rbd
>>device, rados benchmark gives me ~21 IOPS and 76Mb/s (write).

On 2x Xeon 3.1GHz, 10 cores each (20 cores total), I can reach around 40 iops
4k read (80% total CPU), or 7 iops 4k write (1x replication) at 100% total CPU.

This is with jemalloc; debug and cephx are disabled.
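
For reference, disabling those typically means ceph.conf settings along these
lines (a sketch only; turning off cephx has security implications and must be
done consistently on the whole cluster and all clients):

  [global]
  auth cluster required = none
  auth service required = none
  auth client required = none
  debug ms = 0/0
  debug osd = 0/0
  debug filestore = 0/0
  debug auth = 0/0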


- Mail original -
De: "George Shuklin" 
À: "ceph-users" 
Envoyé: Mardi 28 Juin 2016 17:23:02
Objet: [ceph-users] CPU use for OSD daemon

Hello. 

I'm testing different configuration for Ceph. I found that osd are 
REALLY hungry for cpu. 

I've created a tiny pool with size 1 with single OSD made of fast intel 
SSD (2500-series), on old dell server (R210), Xeon E3-1230 V2 @ 3.30GHz. 

And when I benchmark it I see some horribly-low performance and clear 
bottleneck at ceph-osd process: it consumes about 110% of CPU and giving 
me following results: 127 iops in fio benchmark (4k randwrite) for rbd 
device, rados benchmark gives me ~21 IOPS and 76Mb/s (write). 

It this a normal CPU utilization for osd daemon for such tiny performance? 

Relevant part of the crush map: 

rule rule_fast { 
ruleset 1 
type replicated 
min_size 1 
max_size 10 
step take fast 
step chooseleaf firstn 0 type osd 
step emit 
} 
root fast2500 { 
id -17 
alg straw 
hash 0 # rjenkins1 
item pp7 weight 1.0 
} 

host pp7 { 
id -11 
alg straw 
hash 0 # rjenkins1 
item osd.5 weight 1.0 
} 


host pp7 { 
id -11 
alg straw 
hash 0 # rjenkins1 
item osd.5 weight 1.0 
} 

device 5 osd.5 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Stefan Priebe - Profihost AG
Please be aware that you may need even more patches. Overall this needs 3
patches: the first two try to fix a bug, and the 3rd one fixes the fixes plus
even more bugs related to the scheduler. I've no idea which patch level
Ubuntu is at.

Stefan

Excuse my typo sent from my mobile phone.

> Am 28.06.2016 um 17:59 schrieb Tim Bishop :
> 
> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
> useful information to add other than it's not just you.
> 
> Tim.
> 
>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>> these issues where an OSD would fail with the stack below.  I logged a
>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>> just the noop scheduler.
>> 
>> Does the ceph kernel code somehow use the fair scheduler code block?
>> 
>> Thanks
>> --
>> Alex Gorbachev
>> Storcium
>> 
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>> 03/04/2015
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>> 0010:[]  []
>> task_numa_find_cpu+0x22e/0x6f0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>> 0018:880f79fbb818  EFLAGS: 00010206
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>  RBX: 880f79fbb8b8 RCX: 
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>>  RSI:  RDI: 8810352d4800
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>> 0009 R11: 0006 R12: 8807c3adc4c0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>> 0006 R14: 033e R15: fec7
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>> 7f30e46b8700() GS:88105f58()
>> knlGS:
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>>  ES:  CR0: 80050033
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>> 1321a000 CR3: 000853598000 CR4: 000406e0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>> 813d050f 000d 0045 880f79df8000
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>> 033f  00016b00 033f
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>> 880f79df8000 880f79fbb8b8 01f4 0054
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
>> [] ? cpumask_next_and+0x2f/0x40
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
>> [] task_numa_migrate+0x43e/0x9b0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
>> [] ? update_cfs_shares+0xbc/0x100
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
>> [] numa_migrate_preferred+0x79/0x80
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
>> [] task_numa_fault+0x7f4/0xd40
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
>> [] ? timerqueue_del+0x24/0x70
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
>> [] ? should_numa_migrate_memory+0x55/0x130
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
>> [] handle_mm_fault+0xbc0/0x1820
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
>> [] ? __hrtimer_init+0x90/0x90
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
>> [] ? remove_wait_queue+0x4d/0x60
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
>> [] ? poll_freewait+0x4a/0xa0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
>> [] __do_page_fault+0x197/0x400
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
>> [] do_page_fault+0x22/0x30
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
>> [] page_fault+0x28/0x30
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
>> [] ? copy_page_to_iter_iovec+0x5f/0x300
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
>> [] ? select_task_rq_fair+0x625/0x700
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
>> [] copy_page_to_iter+0x16/0xa0
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
>> [] skb_copy_datagram_iter+0x14d/0x280
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
>> [] tcp_recvmsg+0x613/0xbe0
>>

[ceph-users] Rebalancing cluster and client access

2016-06-28 Thread Sergey Osherov
 Hi everybody!

We have a cluster with 12 storage nodes and replication 2.
When one node was destroyed, our clients could not access the ceph cluster.
I read that with two copies in a replicated pool, ceph interrupts write operations 
during the degraded state.

But it is not clear why clients cannot read while the cluster is rebalancing?
Can clients get write access to the cluster (in a degraded state) if there are 3 replicas?

With best regards, Serg Osherov.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalancing cluster and client access

2016-06-28 Thread Oliver Dzombic
Hi Sergey,

If you have size = 2 and min_size = 1,

then with 2 replicas all should be fine and accessible, even when 1
node goes down.
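
You can check (and change) this per pool, e.g. for a pool named rbd:

ceph osd pool get rbd size
ceph osd pool get rbd min_size
ceph osd pool set rbd min_size 1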

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 28.06.2016 at 20:32, Sergey Osherov wrote:
> Hi everybody!
> 
> We have cluster with 12 storage nodes and replication 2.
> When one node was destroyed our clients can not access to ceph cluster.
> I read that with two copies in replication pool, ceph interrupt write
> operation during degraded state.
> 
> But it is not clear why clients can not read when the cluster is
> rebalancing?
> Can clients write access to cluster (in degraded state) if will be 3
> replicas?
> 
> With best regards, Serg Osherov.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can not change access for containers

2016-06-28 Thread Yehuda Sadeh-Weinraub
On Tue, Jun 28, 2016 at 4:12 AM, John Mathew 
wrote:

> I am using radosgw as object storage in OpenStack Liberty. I am using ceph
> jewel. Currently I can create public and private containers, but I cannot
> change the access of containers, i.e. cannot change a public container to
> private and vice versa. There is a pop-up, "Success: Successfully updated
> container access to public.", but the access is not changing. I couldn't find any
> errors in the logs. I tried with ceph-infernalis, but couldn't recreate this
> with infernalis. Everything worked with infernalis. Could this be a bug
> with ceph jewel? Also, does jewel support a multitenant namespace for
> containers?
>

Jewel does have support for separate container namespaces (tenants).


>
> Thanks in advance
>
>
> COMMAND
>
> curl -X POST -i -H  "X-Auth-Token:x" -H "X-Container-Read: *" -L  "
> http://xxx:7480/swift/v1/pub5";
>

Can you try this instead?

curl -X POST -i -H  "X-Auth-Token:x" -H "X-Container-Read: .r:*"
-L  "http://xxx:7480/swift/v1/pub5";
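
To verify what actually got stored, a HEAD on the container should echo the ACL back as an X-Container-Read header, e.g.:

curl -I -H "X-Auth-Token: x" "http://xxx:7480/swift/v1/pub5"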

Yehuda


>
> 2016-06-23 03:17:11.822539 7f0ae2ffd700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2016-06-23 03:17:33.822711 7f0ae2ffd700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2016-06-23 03:17:48.028376 7f09077fe700 20 RGWEnv::set(): HTTP_USER_AGENT:
> curl/7.35.0
> 2016-06-23 03:17:48.028397 7f09077fe700 20 RGWEnv::set(): HTTP_HOST:
> 10.10.20.9:7480
> 2016-06-23 03:17:48.028400 7f09077fe700 20 RGWEnv::set(): HTTP_ACCEPT: */*
> 2016-06-23 03:17:48.028403 7f09077fe700 20 RGWEnv::set():
> HTTP_X_AUTH_TOKEN: 5b83a5faf86e4df3baa087049e8a0b9a
> 2016-06-23 03:17:48.028410 7f09077fe700 20 RGWEnv::set():
> HTTP_X_CONTAINER_READ: *
> 2016-06-23 03:17:48.028412 7f09077fe700 20 RGWEnv::set(): REQUEST_METHOD:
> POST
> 2016-06-23 03:17:48.028414 7f09077fe700 20 RGWEnv::set(): REQUEST_URI:
> /swift/v1/pub5
> 2016-06-23 03:17:48.028416 7f09077fe700 20 RGWEnv::set(): QUERY_STRING:
> 2016-06-23 03:17:48.028422 7f09077fe700 20 RGWEnv::set(): REMOTE_USER:
> 2016-06-23 03:17:48.028424 7f09077fe700 20 RGWEnv::set(): SCRIPT_URI:
> /swift/v1/pub5
> 2016-06-23 03:17:48.028427 7f09077fe700 20 RGWEnv::set(): SERVER_PORT: 7480
> 2016-06-23 03:17:48.028429 7f09077fe700 20 HTTP_ACCEPT=*/*
> 2016-06-23 03:17:48.028430 7f09077fe700 20 HTTP_HOST=10.10.20.9:7480
> 2016-06-23 03:17:48.028431 7f09077fe700 20 HTTP_USER_AGENT=curl/7.35.0
> 2016-06-23 03:17:48.028432 7f09077fe700 20
> HTTP_X_AUTH_TOKEN=5b83a5faf86e4df3baa087049e8a0b9a
> 2016-06-23 03:17:48.028434 7f09077fe700 20 HTTP_X_CONTAINER_READ=*
> 2016-06-23 03:17:48.028435 7f09077fe700 20 QUERY_STRING=
> 2016-06-23 03:17:48.028436 7f09077fe700 20 REMOTE_USER=
> 2016-06-23 03:17:48.028437 7f09077fe700 20 REQUEST_METHOD=POST
> 2016-06-23 03:17:48.028438 7f09077fe700 20 REQUEST_URI=/swift/v1/pub5
> 2016-06-23 03:17:48.028439 7f09077fe700 20 SCRIPT_URI=/swift/v1/pub5
> 2016-06-23 03:17:48.028439 7f09077fe700 20 SERVER_PORT=7480
> 2016-06-23 03:17:48.028442 7f09077fe700  1 == starting new request
> req=0x7f09077f87d0 =
> 2016-06-23 03:17:48.028470 7f09077fe700  2 req 63:0.29::POST
> /swift/v1/pub5::initializing for trans_id =
> tx0003f-00576b8d1c-16d30b-default
> 2016-06-23 03:17:48.028478 7f09077fe700 10 host=10.10.20.9
> 2016-06-23 03:17:48.028482 7f09077fe700 20 subdomain= domain=
> in_hosted_domain=0 in_hosted_domain_s3website=0
> 2016-06-23 03:17:48.028494 7f09077fe700 10 meta>> HTTP_X_CONTAINER_READ
> 2016-06-23 03:17:48.028501 7f09077fe700 10 x>> x-amz-read:*
> 2016-06-23 03:17:48.028520 7f09077fe700 10 ver=v1 first=pub5 req=
> 2016-06-23 03:17:48.028527 7f09077fe700 10
> handler=28RGWHandler_REST_Bucket_SWIFT
> 2016-06-23 03:17:48.028530 7f09077fe700  2 req 63:0.89:swift:POST
> /swift/v1/pub5::getting op 4
> 2016-06-23 03:17:48.028535 7f09077fe700 10
> op=35RGWPutMetadataBucket_ObjStore_SWIFT
> 2016-06-23 03:17:48.028537 7f09077fe700  2 req 63:0.95:swift:POST
> /swift/v1/pub5:put_bucket_metadata:authorizing
> 2016-06-23 03:17:48.028544 7f09077fe700 20
> token_id=5b83a5faf86e4df3baa087049e8a0b9a
> 2016-06-23 03:17:48.028553 7f09077fe700 20 cached token.project.id
> =1c1ae7b02eaa4610bd46d04ddc0f3c00
> 2016-06-23 03:17:48.028559 7f09077fe700 20 updating
> user=1c1ae7b02eaa4610bd46d04ddc0f3c00
> 2016-06-23 03:17:48.028577 7f09077fe700 20 get_system_obj_state:
> rctx=0x7f09077f71d0
> obj=default.rgw.users.uid:1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> state=0x7f08f800c318 s->prefetch_data=0
> 2016-06-23 03:17:48.028589 7f09077fe700 10 cache get:
> name=default.rgw.users.uid+1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> : type miss (requested=6, cached=0)
> 2016-06-23 03:17:48.029626 7f09077fe700 10 cache put:
> name=default.rgw.users.uid+1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> info.flags=0
> 2016-06-23 03:17:48.029638 7f09077fe700 10 moving
> default.rgw.users.uid+1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> t

[ceph-users] CephFS mds cache pressure

2016-06-28 Thread João Castro
Hello guys,
From time to time I get the MDS cache pressure error (client failing to respond 
to cache pressure).
If I try to increase mds_cache_size to raise the number of inodes, two 
things happen:

1) inodes keep growing until I hit the limit again
2) with more inodes, the mds runs out of memory, gets killed by the OS, and 
fails over to the standby mds.

I have two mds, each with 12GB of RAM, running on a VM. 
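
(For reference, this is how I bump the limit at runtime; the value is just an example:

ceph daemon mds.mds01 config set mds_cache_size 2000000

plus the matching "mds cache size" entry under [mds] in ceph.conf to make it persistent.)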

root@mds01:~# ceph daemon mds.mds01 perf dump | head -n 30
{
"mds": {
"request": 44328985,
"reply": 44328983,
"reply_latency": {
"avgcount": 44328983,
"sum": 155060.386784153
},
"forward": 0,
"dir_fetch": 52149,
"dir_commit": 7306,
"dir_split": 0,
"inode_max": 300,
"inodes": 2884302,
"inodes_top": 1191759,
"inodes_bottom": 485982,
"inodes_pin_tail": 1206561,
"inodes_pinned": 2398701,
"inodes_expired": 187542007,
"inodes_with_caps": 2395109,
"caps": 2395110,
"subtrees": 2,
"traverse": 44376013,
"traverse_hit": 44305134,
"traverse_forward": 0,
"traverse_discover": 0,
"traverse_dir_fetch": 22494,
"traverse_remote_ino": 0,
"traverse_lock": 0,
"load_cent": 4434414393,


Any idea how I can overcome this problem?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mds cache pressure

2016-06-28 Thread John Spray
On Tue, Jun 28, 2016 at 10:25 PM, João Castro  wrote:
> Hello guys,
> From time to time I have MDS cache pressure error (Client failing to respond
> to cache pressure).
>
> If I try to increase the mds_cache_size to increase the number of inodes two
> things happen:
>
> 1) inodes keep growing until I get to the limit again
> 2) the more inodes, mds runs out of memory and gets killed by the OS, swaps
> to the failover mds.

The thing to do is to investigate why the client seems to be
misbehaving.  Is it a kernel client or ceph-fuse?  And what version?

John

> I have two mds, each with 12gb of ram. Running on a VM.
>
> root@mds01:~# ceph daemon mds.mds01 perf dump | head -n 30
> {
> "mds": {
> "request": 44328985,
> "reply": 44328983,
> "reply_latency": {
> "avgcount": 44328983,
> "sum": 155060.386784153
> },
> "forward": 0,
> "dir_fetch": 52149,
> "dir_commit": 7306,
> "dir_split": 0,
> "inode_max": 300,
> "inodes": 2884302,
> "inodes_top": 1191759,
> "inodes_bottom": 485982,
> "inodes_pin_tail": 1206561,
> "inodes_pinned": 2398701,
> "inodes_expired": 187542007,
> "inodes_with_caps": 2395109,
> "caps": 2395110,
> "subtrees": 2,
> "traverse": 44376013,
> "traverse_hit": 44305134,
> "traverse_forward": 0,
> "traverse_discover": 0,
> "traverse_dir_fetch": 22494,
> "traverse_remote_ino": 0,
> "traverse_lock": 0,
> "load_cent": 4434414393,
>
>
> Any idea how can I overcome this problem?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mds cache pressure

2016-06-28 Thread João Castro
Hey John,

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
4.2.0-36-generic

Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mds cache pressure

2016-06-28 Thread João Castro
Sorry, forgot.
Kernel!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Brendan Moloney
The Ubuntu bug report is here: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568729

> Please be aware that you may need even more patches. Overall this needs 3 
> patches. Where the first two try to fix a bug and the 3rd one fixes the fixes 
> + even more bugs related to the scheduler. I've no idea on which patch level 
> Ubuntu is.
> 
> Stefan
> 
> Excuse my typo sent from my mobile phone.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CPU use for OSD daemon

2016-06-28 Thread Christian Balzer

Hello,

re-adding list.

On Tue, 28 Jun 2016 20:52:51 +0300 George Shuklin wrote:

> On 06/28/2016 06:46 PM, Christian Balzer wrote:
> > Hello,
> >
> > On Tue, 28 Jun 2016 18:23:02 +0300 George Shuklin wrote:
> >
> >> Hello.
> >>
> >> I'm testing different configuration for Ceph.
> > What version...
> jewel.
>
That should be pretty fast by itself, and on top of that there are the
optimizations Alexandre mentioned. 
 
> >
> >> I found that osd are
> >> REALLY hungry for cpu.
> >>
> > They can be, but unlikely in your case.
> >
> >> I've created a tiny pool with size 1 with single OSD made of fast
> >> intel SSD (2500-series), on old dell server (R210),  Xeon E3-1230 V2
> >> @ 3.30GHz.
> >>
> > At a replication size of 1, a totally unrealistic test scenario.
> >
> > Ignoring that, an Intel SSD PRO 2500 is a consumer SSD and as such with
> > near certainty ill suited for usage with Ceph, especially when it
> > comes to journals.
> > Check/google the countless threads about what constitutes SSDs
> > suitable for Ceph usage.
> 
> I understand that, but the point is that it was stuck at cpu, not IO on 
> SSD (disk utilization was < 5% according to atop).
> 
That makes little to no sense.

> >> I see some horribly-low performance and clear
> >> bottleneck at ceph-osd process: it consumes about 110% of CPU and
> >> giving
> > 110% actual CPU usage?
> > I'd wager a significant amount of that is IOWAIT...
> No, it was clear computation, not IO.
> 
> It was somehow badly created OSD. I've recreated it
Any details on that?
So people in the future searching for a problem like this can avoid it.

>, and now I'm hitting 
> limits of SSD performance with ~900 IOPS (with 99% utilization of SSD 
> and 23% utilization of CPU by ceph-osd).
> 
That ratio and performance sounds more like it, given your SSD model.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mds cache pressure

2016-06-28 Thread xiaoxi chen
Hmm, I asked in the ML some days ago :) Likely you hit the kernel bug which is 
fixed by commit 5e804ac482 "ceph: don't invalidate page cache when inode is no 
longer used". This fix is in 4.4 but not in 4.2. I haven't had a chance to 
play with 4.4; it would be great if you could give it a try.
For the MDS OOM issue, we did an MDS RSS vs #inodes scaling test; the result showed 
around 4MB per 1000 inodes, so your MDS can likely hold up to 2~3 million 
inodes. But yes, even with the fix, if the client misbehaves (opens and holds a 
lot of inodes, doesn't respond to the cache pressure message), the MDS can go over the 
throttling and then get killed by the OOM killer.
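(Rough arithmetic behind that estimate: 12 GB of RAM / 4 MB per 1000 inodes = 3000 x 1000, i.e. about 3,000,000 inodes before memory runs out.)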

> To: ceph-users@lists.ceph.com
> From: castrofj...@gmail.com
> Date: Tue, 28 Jun 2016 21:34:03 +
> Subject: Re: [ceph-users] CephFS mds cache pressure
> 
> Hey John,
> 
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 4.2.0-36-generic
> 
> Thanks!
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FIO Performance test

2016-06-28 Thread Mohd Zainal Abidin Rabani
Hi,

 

We have managed to deploy ceph with CloudStack. We are now running 3 monitors and 5
OSDs. We are sharing some output, and we are very proud to have got ceph working. We will move ceph
to production in a short period. We have also managed to build VSM (a GUI) to monitor
ceph.

 

Result:

 

This test uses one VM only. The result is good. 
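
(The fio jobs were driven with something like the following; the exact invocation isn't in the output, so these flags are an assumption reconstructed from the job headers: 4k blocks, libaio, iodepth 64, a 512MB test file, and randrw / randread / randwrite for the three runs.)

fio --name=test --size=512M --direct=1 --ioengine=libaio --iodepth=64 --bs=4k --rw=randrw
fio --name=test --size=512M --direct=1 --ioengine=libaio --iodepth=64 --bs=4k --rw=randread
fio --name=test --size=512M --direct=1 --ioengine=libaio --iodepth=64 --bs=4k --rw=randwrite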

 

ceph-1

 

test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64

fio-2.0.13

Starting 1 process

Jobs: 1 (f=1): [m] [100.0% done] [9078K/9398K/0K /s] [2269 /2349 /0  iops]
[eta 00m:00s]

test: (groupid=0, jobs=1): err= 0: pid=1167: Tue Jun 28 21:26:28 2016

  read : io=262184KB, bw=10323KB/s, iops=2580 , runt= 25399msec

  write: io=262104KB, bw=10319KB/s, iops=2579 , runt= 25399msec

  cpu  : usr=4.30%, sys=23.89%, ctx=69266, majf=0, minf=20

  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%

 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%

 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%

 issued: total=r=65546/w=65526/d=0, short=r=0/w=0/d=0

 

Run status group 0 (all jobs):

   READ: io=262184KB, aggrb=10322KB/s, minb=10322KB/s, maxb=10322KB/s,
mint=25399msec, maxt=25399msec

  WRITE: io=262104KB, aggrb=10319KB/s, minb=10319KB/s, maxb=10319KB/s,
mint=25399msec, maxt=25399msec

 

Disk stats (read/write):

dm-0: ios=65365/65345, merge=0/0, ticks=501897/1094751,
in_queue=1598532, util=99.75%, aggrios=65546/65542, aggrmerge=0/1,
aggrticks=508542/1102418, aggrin_queue=1610856, aggrutil=99.70%

  vda: ios=65546/65542, merge=0/1, ticks=508542/1102418, in_queue=1610856,
util=99.70%

 

test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64

fio-2.0.13

Starting 1 process

test: Laying out IO file(s) (1 file(s) / 512MB)

Jobs: 1 (f=1): [r] [100.0% done] [58279K/0K/0K /s] [14.6K/0 /0  iops] [eta
00m:00s]

test: (groupid=0, jobs=1): err= 0: pid=1174: Tue Jun 28 21:31:25 2016

  read : io=524288KB, bw=60992KB/s, iops=15248 , runt=  8596msec

  cpu  : usr=9.59%, sys=49.33%, ctx=88437, majf=0, minf=83

  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%

 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%

 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%

 issued: total=r=131072/w=0/d=0, short=r=0/w=0/d=0

 

Run status group 0 (all jobs):

   READ: io=524288KB, aggrb=60992KB/s, minb=60992KB/s, maxb=60992KB/s,
mint=8596msec, maxt=8596msec

 

Disk stats (read/write):

dm-0: ios=128588/3, merge=0/0, ticks=530897/81, in_queue=531587,
util=98.88%, aggrios=131072/4, aggrmerge=0/0, aggrticks=542615/81,
aggrin_queue=542605, aggrutil=98.64%

  vda: ios=131072/4, merge=0/0, ticks=542615/81, in_queue=542605,
util=98.64%

 

test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64

fio-2.0.13

Starting 1 process

test: Laying out IO file(s) (1 file(s) / 512MB)

Jobs: 1 (f=1): [w] [100.0% done] [0K/2801K/0K /s] [0 /700 /0  iops] [eta
00m:00s]

test: (groupid=0, jobs=1): err= 0: pid=1178: Tue Jun 28 21:36:43 2016

  write: io=524288KB, bw=7749.4KB/s, iops=1937 , runt= 67656msec

  cpu  : usr=2.20%, sys=14.58%, ctx=51767, majf=0, minf=19

  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%

 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%

 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%

 issued: total=r=0/w=131072/d=0, short=r=0/w=0/d=0

 

Run status group 0 (all jobs):

  WRITE: io=524288KB, aggrb=7749KB/s, minb=7749KB/s, maxb=7749KB/s,
mint=67656msec, maxt=67656msec

 

Disk stats (read/write):

dm-0: ios=0/134525, merge=0/0, ticks=0/4563062, in_queue=4575253,
util=100.00%, aggrios=0/131235, aggrmerge=0/3303, aggrticks=0/4276064,
aggrin_queue=4275879, aggrutil=99.99%

  vda: ios=0/131235, merge=0/3303, ticks=0/4276064, in_queue=4275879,
util=99.99%

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Alex Gorbachev
Hi Stefan,

On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
 wrote:
> Please be aware that you may need even more patches. Overall this needs 3
> patches. Where the first two try to fix a bug and the 3rd one fixes the
> fixes + even more bugs related to the scheduler. I've no idea on which patch
> level Ubuntu is.

Stefan, would you be able to please point to the other two patches
beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?

Thank you,
Alex

>
> Stefan
>
> Excuse my typo sent from my mobile phone.
>
> On 28.06.2016 at 17:59, Tim Bishop wrote:
>
> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
> useful information to add other than it's not just you.
>
> Tim.
>
> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>
> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>
> these issues where an OSD would fail with the stack below.  I logged a
>
> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>
> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>
> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>
> just the noop scheduler.
>
>
> Does the ceph kernel code somehow use the fair scheduler code block?
>
>
> Thanks
>
> --
>
> Alex Gorbachev
>
> Storcium
>
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>
> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>
> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>
> 03/04/2015
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>
> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>
> 0010:[]  []
>
> task_numa_find_cpu+0x22e/0x6f0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>
> 0018:880f79fbb818  EFLAGS: 00010206
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>
>  RBX: 880f79fbb8b8 RCX: 
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>
>  RSI:  RDI: 8810352d4800
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>
> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>
> 0009 R11: 0006 R12: 8807c3adc4c0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>
> 0006 R14: 033e R15: fec7
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>
> 7f30e46b8700() GS:88105f58()
>
> knlGS:
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>
>  ES:  CR0: 80050033
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>
> 1321a000 CR3: 000853598000 CR4: 000406e0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>
> 813d050f 000d 0045 880f79df8000
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>
> 033f  00016b00 033f
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>
> 880f79df8000 880f79fbb8b8 01f4 0054
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
>
> [] ? cpumask_next_and+0x2f/0x40
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
>
> [] task_numa_migrate+0x43e/0x9b0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
>
> [] ? update_cfs_shares+0xbc/0x100
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
>
> [] numa_migrate_preferred+0x79/0x80
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
>
> [] task_numa_fault+0x7f4/0xd40
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
>
> [] ? timerqueue_del+0x24/0x70
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
>
> [] ? should_numa_migrate_memory+0x55/0x130
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
>
> [] handle_mm_fault+0xbc0/0x1820
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
>
> [] ? __hrtimer_init+0x90/0x90
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
>
> [] ? remove_wait_queue+0x4d/0x60
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
>
> [] ? poll_freewait+0x4a/0xa0
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
>
> [] __do_page_fault+0x197/0x400
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
>
> [] do_page_fault+0x22/0x30
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
>
> [] page_fault+0x28/0x30
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
>
> [] ? copy_page_to_iter_iovec+0x5f/0x300
>
> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
>
> [] ? select_t

[ceph-users] Ceph-deploy new OSD addition issue

2016-06-28 Thread Pisal, Ranjit Dnyaneshwar
Hi,

I am stuck at one point while adding a new OSD host to an existing ceph cluster. I tried 
multiple combinations for creating OSDs on the new host, but every time it fails 
during disk activation and no OSD partition (/var/lib/ceph/osd/ceph-xxx) is 
getting created; instead a temp partition (/var/lib/ceph/tmp/bhbjnk.mnt) is 
created. The host I have is a combination of SSD and SAS disks. The SSDs are partitioned 
for journaling purposes. The sequence I tried to add the new host is as 
follows -

1. Ceph rpms installed on the new host
2. From the INIT node - checked ceph-disk list for the new host
3. Prepared the disk - ceph-deploy --overwrite-conf osd create --fs-type xfs {OSD 
node}:{raw device} - The result showed that the host is ready for OSD use; however it 
didn't show up in the OSD tree (because crush was not updated (?)), nor did the 
/var/lib/OSD.xx mount get created.
4. Although it showed the host ready for OSD use, before that it threw a warning that it was 
disconnecting after 300 seconds as no data was received from the new host.
5. I tried to activate the disk manually - a. sudo ceph-disk activate /dev/sde1 -
This command failed to execute with the following error:
ceph-disk: Cannot discover filesystem type: device /dev/sda: Line is truncated
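
(For comparison, the split prepare/activate form with the journal partition spelled out explicitly would look like the lines below; host and device names are only placeholders.)

ceph-deploy osd prepare newHost:/dev/sdl:/dev/sde1
ceph-deploy osd activate newHost:/dev/sdl1:/dev/sde1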

After this I also tried to install ceph-deploy and prepare the new host using the commands 
below, and repeated the above steps, but it still failed at the same point of disk 
activation.

ceph-deploy install new Host
ceph-deploy new newHost

Attached logs for reference.

Please assist with any known workaround/resolution.

Thanks
Ranjit
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mds cache pressure

2016-06-28 Thread Mykola Dvornik
I have the same issues with a variety of kernel clients running 4.6.3
and 4.4.12, and fuse clients from 10.2.2.

-Mykola

-Original Message-
From: xiaoxi chen 
To: João Castro , ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] CephFS mds cache pressure
Date: Wed, 29 Jun 2016 01:00:40 +

Hmm, I asked in the ML some days before,:) likely you hit the kernel
bug which fixed by commit 5e804ac482 "ceph: don't invalidate page cache
when inode is no longer used”.  This fix is in 4.4 but not in 4.2. I
haven't got a chance to play with 4.4 , it would be great if you can
have a try.

For MDS OOM issue, we did a MDS RSS vs #Inodes scaling test, the result
showing around 4MB per 1000 Inodes, so your MDS likely can hold up to
2~3 Million inodes. But yes, even with the fix if the client
misbehavior (open and hold a lot of inodes, doesn't respond to cache
pressure message), MDS can go over the throttling and then killed by
OOM


> To: ceph-users@lists.ceph.com
> From: castrofj...@gmail.com
> Date: Tue, 28 Jun 2016 21:34:03 +
> Subject: Re: [ceph-users] CephFS mds cache pressure
> 
> Hey John,
> 
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 4.2.0-36-generic
> 
> Thanks!
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy new OSD addition issue

2016-06-28 Thread Pisal, Ranjit Dnyaneshwar
This is another error I get while trying to activate the disk -

[ceph@MYOPTPDN16 ~]$ sudo ceph-disk activate /dev/sdl1
2016-06-29 11:25:17.436256 7f8ed85ef700  0 -- :/1032777 >> 10.115.1.156:6789/0 
pipe(0x7f8ed4021610 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ed40218a0).fault
2016-06-29 11:25:20.436362 7f8ed84ee700  0 -- :/1032777 >> 10.115.1.156:6789/0 
pipe(0x7f8ec4000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ec4000e90).fault
^Z
[2]+  Stopped sudo ceph-disk activate /dev/sdl1

Best Regards,
Ranjit
+91-9823240750


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Pisal, 
Ranjit Dnyaneshwar
Sent: Wednesday, June 29, 2016 10:59 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph-deploy new OSD addition issue


Hi,

I am stuck at one point to new OSD Host to existing ceph cluster. I tried a 
multiple combinations for creating OSDs on new host but every time its failing 
while disk activation and no partition for OSD (/var/lib/ceph/osd/ceoh-xxx) is 
getting created instead (/var/lib/ceph/tmp/bhbjnk.mnt) temp partition is 
created. The host I have is combination of SSD and SAS disks. SSDs are parted 
to use for Journaling purpose. The sequence I tried to add the new host as 
follows -

1. Ceph-rpms installed on new Host
2. from INIT node - ceph-disk list for new host checked
3. Prepared disk - ceph-deploy --overwrite-conf osd create --fs-type xfs {OSD 
node}:{raw device}, - Result showed that Host is ready for OSD use; however it 
didn't reflect in OSD tree (Because crush was not updated (?) ) neither 
/var/lib/OSD.xx mount got created.
4. Although it showed Host ready for OSD use; before it threw a warning that 
disconnecting after 300 seconds as no data received from new Host
5.I tried to activate the disk manually - a. sudo ceph-disk activate /dev/sde1 -
This command failed to execute with erroneous values
ceph-disk: Cannot discover filesystem type: device /dev/sda: Line is truncated

After this I also tried to install ceph-deploy and prepare new host using below 
commands and repeated above steps but it still failed at same point of disk 
activation.

ceph-deploy install new Host
ceph-deploy new newHost

Attached logs for reference.

Please assist with any known workaround/resolution.

Thanks
Ranjit
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-28 Thread Mario Giammarco
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
stripe_width 0
   removed_snaps [1~3]
pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
stripe_width 0
   removed_snaps [1~3]
pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
stripe_width 0
   removed_snaps [1~3]


ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
5 1.81000  1.0  1857G  984G  872G 53.00 0.86
6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
3 1.35999  1.0  1391G  906G  485G 65.12 1.06
4 0.8  1.0   926G  702G  223G 75.88 1.23
7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
9 0.8  1.0   926G  573G  352G 61.91 1.01
0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
13 0.45000  1.0   460G  307G  153G 66.74 1.08
 TOTAL 14846G 9136G 5710G 61.54
MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47



ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)

http://pastebin.com/SvGfcSHb
http://pastebin.com/gYFatsNS
http://pastebin.com/VZD7j2vN

I do not understand why I/O on the ENTIRE cluster is blocked when only a few pgs
are incomplete.
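
The per-pg detail for the stuck ones can be pulled with something like the following (the pg id is just an example taken from health detail):

ceph health detail | grep incomplete
ceph pg 2.1a query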

Many thanks,
Mario


On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> And ceph health detail
>
> Stefan
>
> Excuse my typo sent from my mobile phone.
>
> On 28.06.2016 at 19:28, Oliver Dzombic wrote:
>
> Hi Mario,
>
> please give some more details:
>
> Please the output of:
>
> ceph osd pool ls detail
> ceph osd df
> ceph --version
>
> ceph -w for 10 seconds ( use http://pastebin.com/ please )
>
> ceph osd crush dump ( also pastebin pls )
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de 
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> On 28.06.2016 at 18:59, Mario Giammarco wrote:
>
> Hello,
>
> this is the second time that happens to me, I hope that someone can
>
> explain what I can do.
>
> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>
>
> One hdd goes down due to bad sectors.
>
> Ceph recovers but it ends with:
>
>
> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>
> health HEALTH_WARN
>
>3 pgs down
>
>19 pgs incomplete
>
>19 pgs stuck inactive
>
>19 pgs stuck unclean
>
>7 requests are blocked > 32 sec
>
> monmap e11: 7 mons at
>
> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
>
> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
>
> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
>
>election epoch 722, quorum
>
> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>
> osdmap e10182: 10 osds: 10 up, 10 in
>
>  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>
>9136 GB used, 5710 GB / 14846 GB avail
>
>1005 active+clean
>
>  16 incomplete
>
>   3 down+incomplete
>
>
> Unfortunately "7 requests blocked" means no virtual machine can boot
>
> because ceph has stopped i/o.
>
>
> I can accept to lose some data, but not ALL data!
>
> Can you help me please?
>
> Thanks,
>
> Mario
>
>
> ___
>
> ceph-users mailing list
>
> ceph-users@lists.ceph.com
>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing issues with task_numa_find_cpu?

2016-06-28 Thread Stefan Priebe - Profihost AG

On 29.06.2016 at 04:30, Alex Gorbachev wrote:
> Hi Stefan,
> 
> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Please be aware that you may need even more patches. Overall this needs 3
>> patches. Where the first two try to fix a bug and the 3rd one fixes the
>> fixes + even more bugs related to the scheduler. I've no idea on which patch
>> level Ubuntu is.
> 
> Stefan, would you be able to please point to the other two patches
> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?

Sorry sure yes:

1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
bounded value")

2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
post_init_entity_util_avg() serialization")

3.) the one listed at lkml.

Stefan

> 
> Thank you,
> Alex
> 
>>
>> Stefan
>>
>> Excuse my typo sent from my mobile phone.
>>
>> On 28.06.2016 at 17:59, Tim Bishop wrote:
>>
>> Yes - I noticed this today on Ubuntu 16.04 with the default kernel. No
>> useful information to add other than it's not just you.
>>
>> Tim.
>>
>> On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>>
>> After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>>
>> these issues where an OSD would fail with the stack below.  I logged a
>>
>> bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there is
>>
>> a similar description at https://lkml.org/lkml/2016/6/22/102, but the
>>
>> odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>>
>> just the noop scheduler.
>>
>>
>> Does the ceph kernel code somehow use the fair scheduler code block?
>>
>>
>> Thanks
>>
>> --
>>
>> Alex Gorbachev
>>
>> Storcium
>>
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>>
>> 10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic #201606072354
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>>
>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>>
>> 03/04/2015
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685009] task:
>>
>> 880f79df8000 ti: 880f79fb8000 task.ti: 880f79fb8000
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685024] RIP:
>>
>> 0010:[]  []
>>
>> task_numa_find_cpu+0x22e/0x6f0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685051] RSP:
>>
>> 0018:880f79fbb818  EFLAGS: 00010206
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685063] RAX:
>>
>>  RBX: 880f79fbb8b8 RCX: 
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685076] RDX:
>>
>>  RSI:  RDI: 8810352d4800
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685107] RBP:
>>
>> 880f79fbb880 R08: 0001020cf87c R09: 00ff00ff
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685150] R10:
>>
>> 0009 R11: 0006 R12: 8807c3adc4c0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685194] R13:
>>
>> 0006 R14: 033e R15: fec7
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685238] FS:
>>
>> 7f30e46b8700() GS:88105f58()
>>
>> knlGS:
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685283] CS:  0010 DS:
>>
>>  ES:  CR0: 80050033
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685310] CR2:
>>
>> 1321a000 CR3: 000853598000 CR4: 000406e0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685354] Stack:
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685374]
>>
>> 813d050f 000d 0045 880f79df8000
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685426]
>>
>> 033f  00016b00 033f
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685477]
>>
>> 880f79df8000 880f79fbb8b8 01f4 0054
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685528] Call Trace:
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.68]
>>
>> [] ? cpumask_next_and+0x2f/0x40
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685584]
>>
>> [] task_numa_migrate+0x43e/0x9b0
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685613]
>>
>> [] ? update_cfs_shares+0xbc/0x100
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685642]
>>
>> [] numa_migrate_preferred+0x79/0x80
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685672]
>>
>> [] task_numa_fault+0x7f4/0xd40
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685700]
>>
>> [] ? timerqueue_del+0x24/0x70
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685729]
>>
>> [] ? should_numa_migrate_memory+0x55/0x130
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685762]
>>
>> [] handle_mm_fault+0xbc0/0x1820
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685793]
>>
>> [] ? __hrtimer_init+0x90/0x90
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685822]
>>
>> [] ? remove_wait_queue+0x4d/0x60
>>
>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
>>
>