Dear List,
We have a cluster running Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed
of 12 nodes; each node has 10 OSDs with journal on disk.
We have one RBD partition and a RadosGW with 2 data pools, one replicated, one
EC (8+2).
In attachment, a few details on our cluster.
Currently, our clu
Hi Yoann,
On Wed, Oct 19, 2016 at 9:44 AM, Yoann Moulin wrote:
> Dear List,
>
> We have a cluster running Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed
> of 12 nodes; each node has 10 OSDs with journal on disk.
>
> We have one RBD partition and a RadosGW with 2 data pools, one replicated,
Hello,
No specific ideas, but this sounds somewhat familiar.
One thing first: you already stopped client traffic, but to make sure your
cluster really becomes quiescent, stop all scrubs as well.
That's always a good idea in any recovery or overload situation.
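For example (a minimal sketch, plain Ceph CLI; re-enable once the cluster has settled):

  # stop new scrubs cluster-wide (in-flight scrubs will finish)
  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # later, once things are healthy again
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub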
Have you verified CPU load (are those
Hi,
Just an additional comment:
You can disable backfilling and recovery temporarily by setting the
'nobackfill' and 'norecover' flags (see the commands below). This will reduce the
backfill traffic and may help the cluster and its OSDs to recover. Afterwards you
should set the backfill traffic settings to the minimu
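A minimal sketch of the flags mentioned above (plain Ceph CLI):

  # pause backfill and recovery while the OSDs settle
  ceph osd set nobackfill
  ceph osd set norecover
  # clear the flags again once the OSDs are stable
  ceph osd unset nobackfill
  ceph osd unset norecover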
On 06 Oct 2016 13:41, Ronny Aasen wrote:
Hello,
I have a few OSDs in my cluster that are regularly crashing.
[snip]
Of course, having 3 OSDs dying regularly is not good for my health, so I
have set noout to avoid heavy recoveries.
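For reference, that flag is set and cleared like this (plain Ceph CLI):

  # keep down OSDs from being marked out and triggering heavy recovery
  ceph osd set noout
  # clear it once the crashing OSDs are dealt with
  ceph osd unset noout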
Googling this error message gives exactly 1 hit:
https:
I have set up a new Linux cluster to allow migration from our old SAN-based
cluster to a new cluster with Ceph.
All systems are running CentOS 7.2 with the 3.10.0-327.36.1 kernel.
I am basically running stock Ceph settings, with just turning the write cache
off via hdparm on the drives, and temporaril
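For reference, a hedged sketch of the hdparm step (the device name is just an example; repeat per data drive):

  # disable the drive's volatile write cache
  hdparm -W 0 /dev/sdb
  # verify the current setting
  hdparm -W /dev/sdb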
Hello,
>> We have a cluster running Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed
>> of 12 nodes; each node has 10 OSDs with journal on disk.
>>
>> We have one RBD partition and a RadosGW with 2 data pools, one replicated,
>> one EC (8+2).
>>
>> In attachment, a few details on our cluster.
>>
>
On Wed, Oct 19, 2016 at 3:22 PM, Yoann Moulin wrote:
> Hello,
>
>>> We have a cluster running Jewel 10.2.2 under Ubuntu 16.04. The cluster is
>>> composed of 12 nodes; each node has 10 OSDs with journal on disk.
>>>
>>> We have one RBD partition and a RadosGW with 2 data pools, one replicated,
>>> one
Hello,
>>> We have a cluster running Jewel 10.2.2 under Ubuntu 16.04. The cluster is
>>> composed of 12 nodes; each node has 10 OSDs with journal on disk.
>>>
>>> We have one RBD partition and a RadosGW with 2 data pools, one replicated,
>>> one EC (8+2).
>>>
>>> In attachment, a few details on our cluste
On Wed, Oct 19, 2016 at 1:28 PM, Jim Kilborn wrote:
> I have set up a new Linux cluster to allow migration from our old SAN-based
> cluster to a new cluster with Ceph.
> All systems are running CentOS 7.2 with the 3.10.0-327.36.1 kernel.
> I am basically running stock Ceph settings, with just turning
This is a cool project, keep up the good work!
_
Tyler Bishop
Founder
O: 513-299-7108 x10
M: 513-646-5809
http://BeyondHosting.net
I would take the analogy of a RAID scenario. Basically, a standby is
considered like a spare drive. If that spare drive goes down, it is good to
know about the event, but it in no way indicates a degraded system;
everything keeps running at top speed.
If you had multiple active MDSes and one goes do
Hi,
I would be interested in the case where an MDS in standby-replay fails.
Thanks
On Wed, Oct 19, 2016 at 4:06 PM, Scottix wrote:
> I would take the analogy of a RAID scenario. Basically, a standby is
> considered like a spare drive. If that spare drive goes down, it is good to
> know about the
John,
Thanks for the tips….
Unfortunately, I was looking at this page
http://docs.ceph.com/docs/jewel/start/os-recommendations/
I’ll consider either upgrading the kernels or using the fuse client, but will
likely go the kernel 4.4 route
As for moving to just a replicated pool, I take it t
Hello
From the documentation I understand that clients that use librados must
perform striping for themselves, but I do not understand how this can be
if we have striping options in Ceph. I mean, I can create RBD images that
have configuration for striping, count and unit size.
So my question
On Wed, Oct 19, 2016 at 5:17 PM, Jim Kilborn wrote:
> John,
>
>
>
> Thanks for the tips….
>
> Unfortunately, I was looking at this page
> http://docs.ceph.com/docs/jewel/start/os-recommendations/
OK, thanks - I've pushed an update to clarify that
(https://github.com/ceph/ceph/pull/11564).
> I’l
Not sure if related, but I see the same issue on very different
hardware/configuration. In particular, on large data transfers OSDs become slow
and block. iostat await on the spinners can go up to 6(!) s (the journal is on the
SSD). Looking closer at those spinners with blktrace suggests that most
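For anyone who wants to compare, that await figure comes from sysstat's extended device stats (the 1-second interval is arbitrary):

  # per-device extended stats every second; watch the await column on the spinners
  iostat -x 1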
John,
Updating to the latest mainline kernel from ELRepo (4.8.2-1) on all 4 Ceph
servers, and on the Ceph client that I am testing with, still didn’t fix the
issues.
Still getting “failing to respond to cache pressure”, and blocked ops currently
hovering between 100-300 requests > 32 sec.
This
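If it helps to narrow this down, the blocked requests and the client sessions known to the MDS can be inspected with the stock Ceph tooling (the MDS name below is a placeholder):

  # summary of slow/blocked requests and MDS health warnings
  ceph health detail
  # on the MDS host: list the client sessions known to the MDS
  ceph daemon mds.<name> session ls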
Hello cephers,
This is the blog post on the Ceph cluster outage we experienced some
weeks ago and on how we managed to revive the cluster and our
clients' data.
I hope it will prove useful for anyone who finds himself/herself
in a similar position. Thanks to everyone on the ceph-users a
Hi all,
We set up RBD mirroring between 2 clusters, but have issues when we want
to delete one image. The detailed info follows.
It reports that some other instance is still using it, which kind of makes
sense because we set up the mirror between 2 clusters.
What's the best practice to rem
librbd (used by QEMU to provide RBD-backed disks) uses librados and
provides the necessary handling for striping across multiple backing
objects. When you don't specify "fancy" striping options via
"--stripe-count" and "--stripe-unit", it essentially defaults to
stripe count of 1 and stripe unit of
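For context, this is how the two cases look with the rbd CLI (a hedged sketch; pool/image names are placeholders, --size is in MB by default and --stripe-unit is in bytes):

  # default striping: stripe count 1, stripe unit equal to the object size
  rbd create --size 10240 rbd/plain-image
  # "fancy" striping: 64 KiB stripe unit, striped across 8 objects at a time
  rbd create --size 10240 --stripe-unit 65536 --stripe-count 8 rbd/striped-image
  # check what the image ended up with
  rbd info rbd/striped-image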
On Wed, Oct 19, 2016 at 6:52 PM, yan cui wrote:
> 2016-10-19 15:46:44.843053 7f35c9925d80 -1 librbd: cannot obtain exclusive
> lock - not removing
Are you attempting to delete the primary or non-primary image? I would
expect any attempts to delete the non-primary image to fail since the
non-prima
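If that is what is happening, my understanding is that the removal should be driven from the side holding the primary image, and rbd-mirror will propagate it to the peer (a sketch; pool/image names are placeholders):

  # on the cluster where the image is primary
  rbd info rbd/image1      # should report the image as primary
  rbd rm rbd/image1        # the deletion should be replayed on the peer cluster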
Hello,
On Wed, 19 Oct 2016 12:28:28 + Jim Kilborn wrote:
> I have set up a new Linux cluster to allow migration from our old SAN-based
> cluster to a new cluster with Ceph.
> All systems are running CentOS 7.2 with the 3.10.0-327.36.1 kernel.
As others mentioned, not a good choice, but also not
Hi all,
When I try to mount an RBD through KRBD, it fails because of mismatched
features.
The client's OS is Ubuntu 16.04 and the kernel is 4.4.0-38.
My original CRUSH tunables are below.
root@Fx2x1ctrlserv01:~# ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries"
It works fine with kernel 4.6 for me.
From the doc:
http://docs.ceph.com/docs/master/rados/operations/crush-map/#crush-tunables
it should work with kernel 4.5 too.
I don't know if there is any plan to backport the latest krbd module version to
kernel 4.4?
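For what it's worth, if upgrading the client kernel is not an option, another route is to pin the cluster to an older tunables profile that the 4.4 krbd already understands (a hedged suggestion; note that changing tunables can trigger data movement):

  # show the current profile (as in the original post)
  ceph osd crush show-tunables
  # switch to an older, widely supported profile
  ceph osd crush tunables hammer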
- Original Message -
From: "한승진"
To: "cep
Hi Kostis...
That is a tale from the dark side. Glad you recovered it and that you were
willing to doc it all up and share it. Thank you for that.
Can I also ask which tool you used to recover the leveldb?
Cheers
Goncalo
From: ceph-users [ceph-users-boun.
Does this also mean that stripe count can be thought of as the number of
parallel writes to different objects on different OSDs?
Thank you
On Thursday, 20 October 2016, Jason Dillaman wrote:
> librbd (used by QEMU to provide RBD-backed disks) uses librados and
> provides the necessary handling
ceph_10.2.3.orig.tar.gz Source package
Compile completed:
/root/neunn_gitlab/ceph-Jewel10.2.3/src/radosgw
The following issue occurs when the script executes:
2016-10-20 11:36:30.102266 7f8b4b93f900 -1 auth: unable to find a keyring on
/var/lib/ceph/radosgw/-admin/keyring: (2) No such file or dir
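That error means radosgw cannot find a keyring at the path it was configured with. A hedged sketch of creating one (the client name and output path are examples; match them to the keyring/rgw settings in your ceph.conf):

  # create a key for the gateway user and store it where radosgw will look
  ceph auth get-or-create client.radosgw.gateway \
      mon 'allow rwx' osd 'allow rwx' \
      -o /etc/ceph/ceph.client.radosgw.keyring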
We pulled leveldb from upstream and fired leveldb.RepairDB against the
OSD omap directory using a simple python script. Ultimately, that
didn't move things forward. We resorted to checking every object's
timestamp/md5sum/attributes on the crashed OSD against the replicas in
the cluster and at last too
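For reference, the RepairDB step amounts to roughly a one-liner (a sketch assuming the py-leveldb bindings; the OSD id and paths are examples, and the OSD should be stopped and the omap directory backed up before trying it):

  # OSD id and paths are examples
  systemctl stop ceph-osd@0
  cp -a /var/lib/ceph/osd/ceph-0/current/omap /var/lib/ceph/osd/ceph-0/current/omap.bak
  python -c "import leveldb; leveldb.RepairDB('/var/lib/ceph/osd/ceph-0/current/omap')"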