[ceph-users] librados API never kills threads

2016-09-13 Thread Stuart Byma
Hi, Can anyone tell me why librados creates multiple threads per object, and never kills them, even when the ioctx is deleted? I am using the C++ API with a single connection and a single IO context. More threads and memory are used for each new object accessed. Is there a way to prevent this b

Re: [ceph-users] ceph-osd fail to be started

2016-09-13 Thread Ronny Aasen
On 13. sep. 2016 07:10, strony zhang wrote: Hi, My ceph cluster includes 5 OSDs: 3 OSDs are installed in the host 'strony-tc' and 2 in the host 'strony-pc'. Recently, both hosts were rebooted due to power cycles. After all the disks were mounted again, the ceph-osds are in the 'down' status.

Re: [ceph-users] problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9

2016-09-13 Thread Ronny Aasen
I suspect this must be a difficult question, since there have been no replies on IRC or the mailing list. Assuming it's impossible to get these OSDs running again, is there a way to recover objects from the disks? They are mounted and the data is readable. I have PGs down since they want to probe th
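A rough sketch of pulling PGs off such a disk with ceph-objectstore-tool, which ships with Hammer 0.94.x; osd.12, osd.20 and PG 3.1a below are placeholders, and the OSD daemons involved must be stopped:

    # list the PGs still present on the dead OSD's data partition
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal --op list-pgs

    # export one PG to a file ...
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 3.1a --op export --file /tmp/pg3.1a.export

    # ... and import it into a healthy (stopped) OSD, which can serve it after restart
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
        --journal-path /var/lib/ceph/osd/ceph-20/journal \
        --op import --file /tmp/pg3.1a.export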

[ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
Hello everyone, I have hit a ceph-fuse crash when I add an OSD to the OSD pool. I am writing data through ceph-fuse, then I add one OSD to the pool; after less than 30 s the ceph-fuse process crashes. The ceph-fuse client is 10.2.2, and the ceph OSD is 0.94.3, details below: [root@localhost ~]# rp

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread John Spray
On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang wrote: > Hello everyone, > > I have met a ceph-fuse crash when i add osd to osd pool. > > I am writing data through ceph-fuse,then i add one osd to osd pool, after > less than 30 s, the ceph-fuse process crash. It looks like this could be an ObjectCac

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
This problem is reproducible. I remove one OSD from the OSD tree and after one minute I add the same OSD back to the pool, and then the fuse client crashes. ceph-fuse is writing data through smallfile too, and the script is "python smallfile_cli.py --top /mnt/test --threads 8 --files 20 --file-size

Re: [ceph-users] librados API never kills threads

2016-09-13 Thread Josh Durgin
On 09/13/2016 01:13 PM, Stuart Byma wrote: Hi, Can anyone tell me why librados creates multiple threads per object, and never kills them, even when the ioctx is deleted? I am using the C++ API with a single connection and a single IO context. More threads and memory are used for each new obje
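In general, librados' messenger keeps a couple of threads per OSD connection (not per object) and holds them until the cluster handle is shut down, so deleting the ioctx alone does not release them. A quick way to watch the thread count from outside; my_rados_app is a placeholder process name:

    # number of threads in the client process
    ps -o nlwp= -p "$(pidof my_rados_app)"
    # watch it grow as the client touches objects mapped to OSDs it has not talked to yet
    watch -n1 'ps -o nlwp= -p "$(pidof my_rados_app)"'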

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
I have submitted the issue at "http://tracker.ceph.com/issues/17270". At 2016-09-13 17:01:09, "John Spray" wrote: >On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang wrote: >> Hello everyone, >> >> I have met a ceph-fuse crash when i add osd to osd pool. >> >> I am writing data through ceph-fuse,t

Re: [ceph-users] swiftclient call radosgw, it always response 401 Unauthorized

2016-09-13 Thread Brian Chang-Chien
Hi naga.b, I use Ceph jewel 10.2.2. My ceph.conf is as follows:

    [global]
    fsid = d056c174-2e3a-4c36-a067-cb774d176ce2
    mon_initial_members = brianceph
    mon_host = 10.62.9.140
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    osd_crush_chooseleaf_type = 0
    osd_pool_def
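For a baseline check that Swift auth is wired up at all, a sketch using a freshly created subuser; the user name, endpoint host and port 7480 (default civetweb frontend) are assumptions, not taken from the thread:

    radosgw-admin user create --uid=testuser --display-name="Test User"
    radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
    radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret
    # use the secret printed under "swift_keys" with v1 auth
    swift -A http://10.62.9.140:7480/auth/1.0 -U testuser:swift -K '<swift_secret>' stat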

[ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov
Hello list, I have the following cluster: ceph status cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0 health HEALTH_OK monmap e2: 5 mons at {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0} election epoch 196, quorum 0,1,2

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread John Spray
On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang wrote: > Hello everyone, > > I have met a ceph-fuse crash when i add osd to osd pool. > > I am writing data through ceph-fuse,then i add one osd to osd pool, after > less than 30 s, the ceph-fuse process crash. > > The ceph-fuse client is 10.2.2, and t

[ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Daznis
Hello, I have encountered a strange I/O freeze while rebooting one OSD node for maintenance purposes. It was one of the 3 nodes in the entire cluster. Before this, rebooting or shutting down an entire node just slowed Ceph down, but never completely froze it.

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread Dennis Kramer (DBS)
I also have this problem. Is it perhaps possible to block clients entirely if they are not using a specific version of Ceph? BTW, I often stumble upon the cephfs problem: "client failing to respond to capability release", which results in blocked requests as well. But I'm not entirely sure if you run C

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread David
What froze? Kernel RBD? Librbd? CephFS? Ceph version? On Tue, Sep 13, 2016 at 11:24 AM, Daznis wrote: > Hello, > > > I have encountered a strange I/O freeze while rebooting one OSD node > for maintenance purpose. It was one of the 3 Nodes in the entire > cluster. Before this rebooting or shutti

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov wrote: > Hello list, > > > I have the following cluster: > > ceph status > cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0 > health HEALTH_OK > monmap e2: 5 mons at > {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=xxx

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread M Ranga Swami Reddy
Please check if any osd is nearfull ERR. Can you please share the ceph -s o/p? Thanks Swami On Tue, Sep 13, 2016 at 3:54 PM, Daznis wrote: > Hello, > > > I have encountered a strange I/O freeze while rebooting one OSD node > for maintenance purpose. It was one of the 3 Nodes in the entire > clu

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Daznis
No, no errors about that. I have set noout before it happened, but it still started recovery. I have added nobackfill,norebalance,norecover,noscrub,nodeep-scrub once i noticed it started doing crazy stuff. So recovery I/O stopped but the cluster can't read any info. Only writes to cache layer.
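For reference, the usual flag sequence for planned single-node maintenance looks roughly like this; note that flags cannot help PGs left with no live replica, which block I/O until an OSD holding them returns:

    ceph osd set noout          # keep the down OSDs from being marked out
    ceph osd set norebalance    # optional: no data movement while the node is down
    ceph osd set nobackfill
    ceph osd set norecover
    # ... reboot the node, wait for its OSDs to come back up ...
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset norebalance
    ceph osd unset noout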

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Sean Redmond
Hi, The host that is taken down has 12 disks in it? Have a look at the down PGs ('18 pgs down') - I suspect this is what is causing the I/O freeze. Is your crush map set up correctly to split data over different hosts? Thanks On Tue, Sep 13, 2016 at 11:45 AM, Daznis wrote: > No, no errors

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Daznis
Yes, that one has +2 OSDs on it.

    root default {
        id -1           # do not change unnecessarily
        # weight 116.480
        alg straw
        hash 0          # rjenkins1
        item OSD-1 weight 36.400
        item OSD-2 weight 36.400
        item OSD-3 weight 43.680
    }
    rule replicated_ruleset
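To confirm whether the rule really splits replicas across hosts, a quick check against a decompiled map; the rule name is taken from the excerpt above, the file paths are placeholders:

    ceph osd tree                                 # bucket hierarchy the rule walks
    ceph osd crush rule dump replicated_ruleset   # the chooseleaf step's 'type' is typically 'host'
    ceph osd getcrushmap -o /tmp/crushmap && crushtool -d /tmp/crushmap -o /tmp/crushmap.txt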

Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Goncalo Borges
Hi Daznis... Something is not quite right. You have pools with 2 replicas (right?). The fact that you have 18 down PGs says that both of the OSDs acting on those PGs have problems. You should try to understand which PGs are down and which OSDs are acting on them ('ceph pg dump_stuck' or 'ceph
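A concrete way to see which PGs are down and which OSDs they are waiting for; PG id 7.2b is a placeholder:

    ceph health detail | grep -i down     # lists the down PGs
    ceph pg dump_stuck inactive           # stuck PGs with their up/acting OSD sets
    ceph pg 7.2b query                    # 'recovery_state' shows the OSDs it wants to probe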

[ceph-users] Network testing tool.

2016-09-13 Thread Owen Synge
Dear all, Often issues arise with badly configured network switches, VLANs, and such like. Knowing whether each node routes to every other node is a major deployment concern and can be difficult to diagnose. The brief looks like this: Description: * Diagnose network issues quickly for ceph. * Identify network issues b
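Not the tool proposed here, but a minimal all-to-all bandwidth check along the same lines using iperf3; HOSTS is a placeholder list and passwordless ssh between nodes is assumed:

    #!/bin/bash
    HOSTS="node1 node2 node3"
    for src in $HOSTS; do
      for dst in $HOSTS; do
        [ "$src" = "$dst" ] && continue
        echo "== $src -> $dst =="
        ssh "$dst" "iperf3 -s -D -1"       # one-shot server on the target, runs in the background
        ssh "$src" "iperf3 -c $dst -t 5"   # 5-second bandwidth test from the source
      done
    done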

Re: [ceph-users] help on keystone v3 ceph.conf in Jewel

2016-09-13 Thread Robert Duncan
Thanks Jean-Charles, it was the ceph client packages on the cinder node as you suspected; I now have a working rbd driver with cinder. I am left with only one other problem since the upgrade, which has me stumped: the rados gateway. Apache can't seem to proxy to the service ServerName node-10

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov
On 09/13/2016 01:33 PM, Ilya Dryomov wrote: > On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov wrote: >> Hello list, >> >> >> I have the following cluster: >> >> ceph status >> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0 >> health HEALTH_OK >> monmap e2: 5 mons at >> {alxc10=xxx

Re: [ceph-users] Network testing tool.

2016-09-13 Thread Mark Nelson
On 09/13/2016 06:46 AM, Owen Synge wrote: Dear all, Often issues arise with badly configured network switches, vlans, and such like. Knowing each node routes to is a major deployment fail and can be difficult to diagnose. The brief looks like this: Description: * Diagnose network issues qui

[ceph-users] RadosGW performance degradation on the 18 millions objects stored.

2016-09-13 Thread Stas Starikevich
Hi All, Asking for your assistance with RadosGW performance degradation once roughly 18M objects are stored (http://pasteboard.co/g781YI3J.png). The rate drops from 620 uploads/s to 180-190 uploads/s. I made a list of tests and see that upload performance degrades by 3-4 times wh
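One well-known cause at this object count is a single unsharded bucket index; a sketch of what to check and the Jewel-era knob. The bucket name and the RGW section name are placeholders, and the option only affects newly created buckets:

    # how many objects the bucket index is carrying
    radosgw-admin bucket stats --bucket=mybucket | grep num_objects

    # ceph.conf on the RGW nodes, then restart radosgw
    [client.rgw.gateway]
    rgw_override_bucket_index_max_shards = 64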

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 1:59 PM, Nikolay Borisov wrote: > > > On 09/13/2016 01:33 PM, Ilya Dryomov wrote: >> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov wrote: >>> Hello list, >>> >>> >>> I have the following cluster: >>> >>> ceph status >>> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread WRIGHT, JON R (JON R)
Yes, I do have old clients running. The clients are all vms. Is it typical that vm clients have to be rebuilt after a ceph upgrade? Thanks, Jon On 9/12/2016 4:05 PM, Wido den Hollander wrote: Op 12 september 2016 om 18:47 schreef "WRIGHT, JON R (JON R)" : Since upgrading to Jewel from H

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread WRIGHT, JON R (JON R)
Yes, vms and volumes existed across the ceph releases. But the vms were rebooted and the volumes reattached following the upgrade. The vms were all Ubuntu 14.04 before and after the upgrade. Thanks, Jon On 9/12/2016 8:28 PM, shiva rkreddy wrote: By saying "old clients" did you mean, (a) Cl

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov
On 09/13/2016 04:30 PM, Ilya Dryomov wrote: [SNIP] > > Hmm, it could be about whether it is able to do journal replay on > mount. When you mount a snapshot, you get a read-only block device; > when you mount a clone image, you get a read-write block device. > > Let's try this again, suppose im
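One common way to get a snapshot that mounts cleanly, without depending on journal replay, is to quiesce the filesystem before creating it and then map the snapshot read-only; pool, image, snapshot and mount-point names below are placeholders:

    fsfreeze -f /mnt/data                          # flush and freeze the XFS filesystem
    rbd snap create rbdpool/image1@backup1
    fsfreeze -u /mnt/data                          # thaw as soon as the snap exists

    DEV=$(rbd map rbdpool/image1@backup1 --read-only)
    mount -o ro,nouuid,norecovery "$DEV" /mnt/backup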

Re: [ceph-users] RadosGW performance degradation on the 18 millions objects stored.

2016-09-13 Thread Mark Nelson
On 09/13/2016 08:17 AM, Stas Starikevich wrote: Hi All, Asking your assistance with the RadosGW performance degradation on the 18M objects placed (http://pasteboard.co/g781YI3J.png). Drops from 620 uploads/s to 180-190 uploads/s. I made list of tests and see that upload performance degrades i

Re: [ceph-users] problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9

2016-09-13 Thread Henrik Korkuc
On 16-09-13 11:13, Ronny Aasen wrote: I suspect this must be a difficult question, since there have been no replies on IRC or the mailing list. Assuming it's impossible to get these OSDs running again, is there a way to recover objects from the disks? They are mounted and the data is readable. I hav

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread Wido den Hollander
> Op 13 september 2016 om 15:58 schreef "WRIGHT, JON R (JON R)" > : > > > Yes, I do have old clients running. The clients are all vms. Is it > typical that vm clients have to be rebuilt after a ceph upgrade? > No, not always, but it is just that I saw this happening recently after a Jewel

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 4:11 PM, Nikolay Borisov wrote: > > > On 09/13/2016 04:30 PM, Ilya Dryomov wrote: > [SNIP] >> >> Hmm, it could be about whether it is able to do journal replay on >> mount. When you mount a snapshot, you get a read-only block device; >> when you mount a clone image, you ge

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread WRIGHT, JON R (JON R)
VM Client OS: ubuntu 14.04 Openstack: kilo libvirt: 1.2.12 nova-compute-kvm: 1:2015.1.4-0ubuntu2 Jon On 9/13/2016 11:17 AM, Wido den Hollander wrote: Op 13 september 2016 om 15:58 schreef "WRIGHT, JON R (JON R)" : Yes, I do have old clients running. The clients are all vms. Is it typic
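A quick way to check what the guests are actually linked against on the hypervisor; the package names are the Debian/Ubuntu ones:

    dpkg -l | grep -E 'librbd1|librados2'
    # note: a running qemu process keeps the old librbd loaded; the instance has to be
    # stopped and started (or live-migrated) before an upgraded library is picked up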

Re: [ceph-users] Lots of "wrongly marked me down" messages

2016-09-13 Thread Oliver Francke
Hi, I can only second this, revert all, but especially: net.core.netdev_max_backlog = 5; this definitely leads to bad behaviour, so go back to 1000, or max 2500, and re-check. Regards, Oliver. > Am 12.09.2016 um 22:06 schrieb Wido den Hollander: > >> net.core.netdev_max_backlog = 5
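To apply and persist the suggested value; the sysctl.d file name is arbitrary:

    sysctl -w net.core.netdev_max_backlog=1000
    echo 'net.core.netdev_max_backlog = 1000' > /etc/sysctl.d/90-netdev-backlog.conf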

Re: [ceph-users] ceph-osd fail to be started

2016-09-13 Thread strony zhang
Hi Ronny, After the disks were activated, the OSDs recovered. Thanks for your info. Thanks, Strony On Tuesday, September 13, 2016 1:00 AM, Ronny Aasen wrote: On 13. sep. 2016 07:10, strony zhang wrote: > Hi, > > My ceph cluster include 5 OSDs. 3 osds are installed in the host > 'stron
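For reference, a minimal sketch of what "activating the disks" looks like on a ceph-disk/udev deployment; /dev/sdb1 is a placeholder:

    ceph-disk activate-all          # re-activate every prepared ceph data partition
    ceph-disk activate /dev/sdb1    # or activate a single data partition
    ceph osd tree                   # confirm the OSDs are back up and in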

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
I have tried all the Jewel packages and everything runs correctly, so I think the problem is in osdc in ceph 0.94.3. There must be some previous commits which solved the problem. At 2016-09-13 18:08:19, "John Spray" wrote: >On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang wrote: >> Hello everyone, >> >> I

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Adrian Saul
I found I could ignore the XFS issues and just mount it with the appropriate options (below from my backup scripts):

    #
    # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
    #
    if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then