[ceph-users] Pipe "deadlock" in Hammer, 0.94.5

2017-01-12 Thread 许雪寒
Hi, everyone. Recently, we did some experiments to test the stability of the ceph cluster. We used the Hammer version, which is the most widely used version in our online clusters. One of the scenarios we simulated is poor network connectivity, in which we used iptables to drop TCP/IP packets under some pr
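A sketch of how such probabilistic packet drops are often simulated with iptables; the OSD port range and the 20% probability below are assumptions for illustration, not values taken from this thread:

    # drop roughly 20% of inbound TCP packets on the default Ceph OSD port range
    iptables -A INPUT -p tcp --dport 6800:7300 \
             -m statistic --mode random --probability 0.2 -j DROP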

[ceph-users] Using hammer version, is radosgw supporting fastcgi long connection?

2017-01-12 Thread yaozongyou
Hi everybody, I am using the hammer version. In this version, does radosgw support fastcgi long (keep-alive) connections? The fastcgi long-connection configuration for nginx is fastcgi_keep_conn on. Best wishes, yaozongyou 2017/1/12
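For reference, a minimal nginx fragment using that directive could look like the following; the upstream name and socket path are assumptions for illustration, and fastcgi_keep_conn also requires keepalive connections to the upstream:

    upstream radosgw {
        server unix:/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock;
        keepalive 16;
    }
    server {
        location / {
            include           fastcgi_params;
            fastcgi_pass      radosgw;
            fastcgi_keep_conn on;   # reuse the FastCGI connection instead of closing it per request
        }
    }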

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Piotr Dałek
On 01/11/2017 07:01 PM, Sage Weil wrote: On Wed, 11 Jan 2017, Jason Dillaman wrote: On Wed, Jan 11, 2017 at 11:44 AM, Piotr Dałek wrote: As the subject says - are there any users/consumers of the librados C API? I'm asking because we're researching if this PR: https://github.com/ceph/ceph/pull/1221

Re: [ceph-users] CephFS Path Restriction, can still read all files

2017-01-12 Thread Boris Mattijssen
John, do you know which kernel version I need? It doesn't seem to work with 4.8.15 on CoreOS (4.8.15-coreos) (I also tested 4.7.3). I can confirm that it works using the ceph-fuse client, but I need the kernel client to work since I want to mount using Kubernetes ;) Btw, this is the error I

[ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread ulembke
Hi all, I had just rebooted all 3 nodes (one after another) of a small Proxmox-VE ceph cluster. All nodes are mons and have two OSDs. During the reboot of one node, ceph was stuck longer than normal and I looked in the "ceph -w" output to find the reason. This is not the reason, but I'm wondering why "osd mar

Re: [ceph-users] Pipe "deadlock" in Hammer, 0.94.5

2017-01-12 Thread jiajia zhong
If errno is EAGAIN for recv, Pipe::do_recv just acts as if blocked. so 2017-01-12 16:34 GMT+08:00 许雪寒 : > Hi, everyone. > > Recently, we did some experiment to test the stability of the ceph > cluster. We used Hammer version which is the mostly used version of online > cluster. One of the scenari

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Shinobu Kinjo
Sorry, I don't get your question. Generally speaking, the MON maintains maps of the cluster state: * Monitor map * OSD map * PG map * CRUSH map Regards, On Thu, Jan 12, 2017 at 7:03 PM, wrote: > Hi all, > I had just reboot all 3 nodes (one after one) of an small Proxmox-VE > ceph-cluster
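As an aside, each of those maps can be inspected with the standard CLI (these commands are not from the original mail, but they are the usual entry points):

    ceph mon dump         # monitor map
    ceph osd dump         # OSD map
    ceph pg dump          # PG map (can be large)
    ceph osd crush dump   # CRUSH map, decoded as JSON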

Re: [ceph-users] slow requests break performance

2017-01-12 Thread Eugen Block
Hi, Looking at the output of dump_historic_ops and dump_ops_in_flight I waited for new slow request messages and dumped the historic_ops into a file. The reporting OSD shows lots of "waiting for rw locks" messages and a duration of more than 30 secs: "age": 366.044746,

Re: [ceph-users] CephFS Path Restriction, can still read all files

2017-01-12 Thread John Spray
On Thu, Jan 12, 2017 at 9:27 AM, Boris Mattijssen wrote: > John, > > Do you know which kernel version I need? It seems to be not working with > 4.8.15 on coreos (4.8.15-coreos) (I also tested on 4.7.3). > I can confirm that it works using the ceph-fuse client, but I need the > kernel client to wor

Re: [ceph-users] bluestore activation error on Ubuntu Xenial/Ceph Jewel

2017-01-12 Thread Peter Maloney
Hey there... resurrecting an old, apparently unanswered question. I had issues with this, nobody online had any answers, and I accidentally ran into the solution. So I hope this helps someone. > Hello, > > I have been trying to deploy bluestore OSDs in a test cluster of 2x OSDs > and 3x mon (xe

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread ulembke
Hi, Am 2017-01-12 11:38, schrieb Shinobu Kinjo: Sorry, I don't get your question. Generally speaking, the MON maintains maps of the cluster state: * Monitor map * OSD map * PG map * CRUSH map yes - and if an osd says "osd.5 marked itself down" the mon can immediately update the OSD map (an

[ceph-users] PGs of EC pool stuck in peering state

2017-01-12 Thread george.vasilakakos
Hi Ceph folks, I’ve just posted a bug report http://tracker.ceph.com/issues/18508 I have a cluster (Jewel 10.2.3, SL7) that has trouble creating PGs in EC pools. Essentially, I’ll get a lot of CRUSH_ITEM_NONE (2147483647) in there and PGs will stay in peering states. This sometimes affects oth

[ceph-users] Ceph Network question

2017-01-12 Thread Sivaram Kannan
Hi, first-time CEPH user here. I am trying to set up a ceph cluster. The documentation (http://ceph.com/planet/bootstrap-your-ceph-cluster-in-docker/) recommends a separate network for the control and data planes. 1. Can I configure both planes in the same network? 2. Is it so bad to configure th

Re: [ceph-users] Write back cache removal

2017-01-12 Thread Wido den Hollander
> Op 10 januari 2017 om 22:05 schreef Nick Fisk : > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Stuart Harland > Sent: 10 January 2017 11:58 > To: Wido den Hollander > Cc: ceph new ; n...@fisk.me.uk > Subject: Re: [ceph-users] Write back cache removal > >

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Sage Weil
On Thu, 12 Jan 2017, Piotr Dałek wrote: > On 01/11/2017 07:01 PM, Sage Weil wrote: > > On Wed, 11 Jan 2017, Jason Dillaman wrote: > > > On Wed, Jan 11, 2017 at 11:44 AM, Piotr Dałek > > > wrote: > > > > As the subject says - are here any users/consumers of librados C API? > > > > I'm > > > > askin

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Jason Dillaman
There is option (3) which is to have a new (or modified) "buffer::create_static" take an optional callback to invoke when the buffer::raw object is destructed. The raw pointer would be destructed when the last buffer::ptr / buffer::list containing it is destructed, so you know it's no longer being

Re: [ceph-users] PGs of EC pool stuck in peering state

2017-01-12 Thread Wido den Hollander
> Op 12 januari 2017 om 12:37 schreef george.vasilaka...@stfc.ac.uk: > > > Hi Ceph folks, > > I’ve just posted a bug report http://tracker.ceph.com/issues/18508 > So I debugged this a bit with George and after switching from async messenger back to simple messenger the problems are gone. S
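For anyone wanting to try the same workaround, the messenger implementation is selected in ceph.conf; a minimal sketch only, assuming the affected daemons are restarted afterwards:

    [global]
        ms_type = simple    # fall back from the async messenger to SimpleMessenger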

[ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-12 Thread Nick Fisk
Hi, I had been testing some higher values with the osd_snap_trim_sleep variable to try and reduce the impact of removing RBD snapshots on our cluster and I have come across what I believe to be a possible unintended consequence. The value of the sleep seems to keep the lock on the PG open so tha
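For context, this is the knob being tuned; the 0.1 second value below is purely illustrative, not a recommendation from the thread:

    # inject at runtime on all OSDs
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'

    # or persist it in ceph.conf under [osd]
    osd snap trim sleep = 0.1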

Re: [ceph-users] Ceph Network question

2017-01-12 Thread John Petrini
If you want to follow the recommended method and separate public and cluster traffic, then they need to be two separate subnets, as that is how they are defined in the config file. For example: public_network = 192.168.1.0/24 cluster_network = 10.1.9.0/24 I do recommend separating your public and c
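Spelled out as a ceph.conf fragment, using the example subnets from the mail (substitute your own):

    [global]
        public_network  = 192.168.1.0/24
        cluster_network = 10.1.9.0/24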

Re: [ceph-users] Ceph Network question

2017-01-12 Thread Oliver Humpage
> I do recommend separating your public and cluster networks but there's not a > whole lot of benefit to it unless they are using physically separate links > with dedicated bandwidth. I thought a large part of it was security, in that it’s possible to DOS the cluster by disrupting intra-OSD tr

[ceph-users] HEALTH_OK when one server crashed?

2017-01-12 Thread Matthew Vernon
Hi, One of our ceph servers froze this morning (no idea why, alas). Ceph noticed, moved things around, and when I ran ceph -s, said: root@sto-1-1:~# ceph -s cluster 049fc780-8998-45a8-be12-d3b8b6f30e69 health HEALTH_OK monmap e2: 3 mons at {sto-1-1=172.27.6.11:6789/0,sto-2-1=172.27.

Re: [ceph-users] HEALTH_OK when one server crashed?

2017-01-12 Thread Wido den Hollander
> Op 12 januari 2017 om 15:35 schreef Matthew Vernon : > > > Hi, > > One of our ceph servers froze this morning (no idea why, alas). Ceph > noticed, moved things around, and when I ran ceph -s, said: > > root@sto-1-1:~# ceph -s > cluster 049fc780-8998-45a8-be12-d3b8b6f30e69 > health H

Re: [ceph-users] Ceph Network question

2017-01-12 Thread Sivaram Kannan
Hi, thanks for the reply. The public network I am talking about is an isolated network with no access to the internet, but a lot of compute traffic though. If it is more about security, I would try setting up both in the same network. My worry is more about performance issues (due to re-balancing

[ceph-users] RBD key permission to unprotect a rbd snapshot

2017-01-12 Thread Martin Palma
Hi all, what permissions do I need to unprotect a protected rbd snapshot? Currently the key interacting with the pool containing the rbd image has the following permissions: mon 'allow r' osd 'allow rwx pool=vms' When I try to unprotect a snapshot with the following command "rbd snap unprotect

Re: [ceph-users] RBD key permission to unprotect a rbd snapshot

2017-01-12 Thread Jason Dillaman
The "rbd snap unprotect" action needs to scan the "rbd_children" object of all pools to ensure that the image doesn't have any children attached. Therefore, you need to ensure that the user that will perform the "snap unprotect" has the "allow class-read object_prefix rbd_children" on all pools [1]

Re: [ceph-users] RBD v1 image format ...

2017-01-12 Thread Shinobu Kinjo
It would be appreciated if QA provided users with evaluation results for the migration and recovery tools, to avoid any disaster in production environments, and to get agreement with them, e.g.: #1 Scenarios we test #2 Image specs we use and some Does it make sense, or is it too much? Regards, On Th

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Gregory Farnum
On Thu, Jan 12, 2017 at 2:03 AM, wrote: > Hi all, > I had just reboot all 3 nodes (one after one) of an small Proxmox-VE > ceph-cluster. All nodes are mons and have two OSDs. > During reboot of one node, ceph stucks longer than normaly and I look in the > "ceph -w" output to find the reason. > >

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Samuel Just
How long did it take for the cluster to recover? -Sam On Thu, Jan 12, 2017 at 10:54 AM, Gregory Farnum wrote: > On Thu, Jan 12, 2017 at 2:03 AM, wrote: >> Hi all, >> I had just reboot all 3 nodes (one after one) of an small Proxmox-VE >> ceph-cluster. All nodes are mons and have two OSDs. >> Du

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Gregory Farnum
On Thu, Jan 12, 2017 at 5:54 AM, Jason Dillaman wrote: > There is option (3) which is to have a new (or modified) > "buffer::create_static" take an optional callback to invoke when the > buffer::raw object is destructed. The raw pointer would be destructed > when the last buffer::ptr / buffer::lis

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Sage Weil
On Thu, 12 Jan 2017, Gregory Farnum wrote: > On Thu, Jan 12, 2017 at 5:54 AM, Jason Dillaman wrote: > > There is option (3) which is to have a new (or modified) > > "buffer::create_static" take an optional callback to invoke when the > > buffer::raw object is destructed. The raw pointer would be d

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Sage Weil
On Thu, 12 Jan 2017, Yehuda Sadeh-Weinraub wrote: > On Thu, Jan 12, 2017 at 12:08 PM, Sage Weil wrote: > > On Thu, 12 Jan 2017, Gregory Farnum wrote: > >> On Thu, Jan 12, 2017 at 5:54 AM, Jason Dillaman > >> wrote: > >> > There is option (3) which is to have a new (or modified) > >> > "buffer::c

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Udo Lembke
Hi Sam, the web frontend of an external ceph-dash was interrupted until the node was up again. The reboot took approx. 5 min. But the ceph -w output showed some IO much faster. I will look at the output again tomorrow and create a ticket. Thanks Udo On 12.01.2017 20:02, Samuel Just wrote: > How

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Samuel Just
Oh, this is basically working as intended. What happened is that the mon died before the pending map was actually committed. The OSD has a timeout (5s) after which it stops trying to mark itself down and just dies (so that OSDs don't hang when killed). It took a bit longer than 5s for the remain

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Gregory Farnum
On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just wrote: > Oh, this is basically working as intended. What happened is that the > mon died before the pending map was actually committed. The OSD has a > timeout (5s) after which it stops trying to mark itself down and just > dies (so that OSDs don't ha

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Samuel Just
That would work. -Sam On Thu, Jan 12, 2017 at 1:40 PM, Gregory Farnum wrote: > On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just wrote: >> Oh, this is basically working as intended. What happened is that the >> mon died before the pending map was actually committed. The OSD has a >> timeout (5s) af

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Shinobu Kinjo
Now I'm totally clear. Regards, On Fri, Jan 13, 2017 at 6:59 AM, Samuel Just wrote: > That would work. > -Sam > > On Thu, Jan 12, 2017 at 1:40 PM, Gregory Farnum wrote: >> On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just wrote: >>> Oh, this is basically working as intended. What happened is that

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Matt Benjamin
Hi, - Original Message - > From: "Yehuda Sadeh-Weinraub" > To: "Sage Weil" > Cc: "Gregory Farnum" , "Jason Dillaman" > , "Piotr Dałek" > , "ceph-devel" , > "ceph-users" > Sent: Thursday, January 12, 2017 3:22:06 PM > Subject: Re: [ceph-users] Any librados C API users out there? > > O

Re: [ceph-users] slow requests break performance

2017-01-12 Thread Brad Hubbard
Check the latency figures in a "perf dump". High numbers in a particular area may help you nail it. I suspect though, that it may come down to enabling debug logging and tracking a slow request through the logs. On Thu, Jan 12, 2017 at 8:41 PM, Eugen Block wrote: > Hi, > >> Looking at the output
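For reference, both dumps and the latency counters come from the OSD admin socket, run on the host of the reporting OSD (osd.0 below is just a placeholder):

    ceph daemon osd.0 perf dump             # per-OSD latency counters
    ceph daemon osd.0 dump_historic_ops     # recently completed slow ops
    ceph daemon osd.0 dump_ops_in_flight    # ops currently in flight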

Re: [ceph-users] cephfs ata1.00: status: { DRDY }

2017-01-12 Thread Oliver Dzombic
Hi, so I extended IO capability by adding spinning disks (+10%) and I stopped scrubbing completely. But the problem keeps coming back: 2017-01-12 21:19:18.275826 7f5d93e58700 0 log_channel(cluster) log [WRN] : 19 slow requests, 5 included below; oldest blocked for > 202.408648 secs 2017-01-12 21

[ceph-users] cephfs-data-scan scan_links cross version from master on jewel ?

2017-01-12 Thread Kjetil Jørgensen
Hi, I want/need cephfs-data-scan scan_links, it's in master, although we're currently on jewel (10.2.5). Am I better off cherry-picking the relevant commit onto the jewel branch rather than just using master ? Cheers, -- Kjetil Joergensen SRE, Medallia Inc Phone: +1 (650) 739-6580 _

Re: [ceph-users] cephfs-data-scan scan_links cross version from master on jewel ?

2017-01-12 Thread Gregory Farnum
On Thu, Jan 12, 2017 at 4:10 PM, Kjetil Jørgensen wrote: > Hi, > > I want/need cephfs-data-scan scan_links, it's in master, although we're > currently on jewel (10.2.5). Am I better off cherry-picking the relevant > commit onto the jewel branch rather than just using master ? Almost certainly. I
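A rough sketch of that approach; the commit reference below is a placeholder, not the actual scan_links commit, and the build step depends on how you normally build Ceph:

    git clone https://github.com/ceph/ceph.git && cd ceph
    git checkout v10.2.5                       # the jewel release in use
    git cherry-pick <scan_links-commit-sha>    # placeholder for the master commit adding scan_links
    # rebuild, then use only the resulting cephfs-data-scan binary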

Re: [ceph-users] HEALTH_OK when one server crashed?

2017-01-12 Thread Christian Balzer
Hello, On Thu, 12 Jan 2017 14:35:32 + Matthew Vernon wrote: > Hi, > > One of our ceph servers froze this morning (no idea why, alas). Ceph > noticed, moved things around, and when I ran ceph -s, said: > > root@sto-1-1:~# ceph -s > cluster 049fc780-8998-45a8-be12-d3b8b6f30e69 > hea

Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Christian Balzer
On Thu, 12 Jan 2017 13:59:12 -0800 Samuel Just wrote: > That would work. > -Sam > Having seen similar behavior in the past I made it a habit to manually shut down services before a reboot. This is not limited to Ceph, and these race conditions have definitely gotten worse with systemd in general. Christ
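As a hedged illustration of that habit, assuming a systemd-managed node; the OSD unit names are placeholders for whatever OSDs live on the node being rebooted:

    ceph osd set noout                      # avoid rebalancing during the planned reboot
    systemctl stop ceph-osd@5 ceph-osd@6
    systemctl stop ceph-mon@$(hostname -s)
    reboot
    # once the node and its daemons are back:
    ceph osd unset noout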

Re: [ceph-users] HEALTH_OK when one server crashed?

2017-01-12 Thread John Spray
On Fri, Jan 13, 2017 at 12:21 AM, Christian Balzer wrote: > > Hello, > > On Thu, 12 Jan 2017 14:35:32 + Matthew Vernon wrote: > >> Hi, >> >> One of our ceph servers froze this morning (no idea why, alas). Ceph >> noticed, moved things around, and when I ran ceph -s, said: >> >> root@sto-1-1:~#

[ceph-users] Re: Pipe "deadlock" in Hammer, 0.94.5

2017-01-12 Thread 许雪寒
Thanks for your reply☺ Indeed, Pipe::do_recv would act just as if blocked when errno is EAGAIN; however, in the Pipe::read_message method, it first checks whether there is a pending msg on the socket via "Pipe::tcp_read_wait". So, I think, when Pipe::do_recv is called, it shouldn't get an EAGAIN, which means

Re: [ceph-users] Re: Pipe "deadlock" in Hammer, 0.94.5

2017-01-12 Thread jiajia zhong
Yes, but that depends. That might have changed on the master branch. 2017-01-13 10:47 GMT+08:00 许雪寒 : > Thanks for your reply☺ > > Indeed, Pipe::do_recv would act just as blocked when errno is EAGAIN, > however, in Pipe::read_message method, it first checks if there is pending > msg on the socket by

[ceph-users] Re: Re: Pipe "deadlock" in Hammer, 0.94.5

2017-01-12 Thread 许雪寒
Thank you for your continuous help☺. We are using the hammer 0.94.5 version, and that is the version of the source code I read. However, on the other hand, if Pipe::do_recv does act as if blocked, is it reasonable for the Pipe::reader_thread to block threads calling SimpleMessenger::submit_message by ho

[ceph-users] Calamari or Alternative

2017-01-12 Thread Tu Holmes
Hey Cephers. Question for you: do you use Calamari or an alternative? If so, why has the installation of Calamari not really gotten much better recently? Are you still building the vagrant installers and building packages? Just wondering what you are all doing. Thanks. //Tu

Re: [ceph-users] Calamari or Alternative

2017-01-12 Thread John Petrini
I used Calamari before making the move to Ubuntu 16.04 and upgrading to Jewel. At the time I tried to install it on 16.04 but couldn't get it working. I'm now using ceph-dash along with the nagios plugin check_ceph_dash

Re: [ceph-users] Calamari or Alternative

2017-01-12 Thread Tu Holmes
I'll give ceph-dash a look. Thanks! On Thu, Jan 12, 2017 at 9:19 PM John Petrini wrote: > I used Calamari before making the move to Ubuntu 16.04 and upgrading to > Jewel. At the time I tried to install it on 16.04 but couldn't get it > working. > > I'm now using ceph-dash