[ceph-users] Stuck creating pg
Hi,

I have a Ceph cluster with 26 OSDs across 4 hosts, used only for RBD for an OpenStack cluster (it started at 0.48, I think), currently running 0.94.2 on Ubuntu 14.04. A few days ago one of the OSDs was at 85% disk usage while only 30% of the raw disk space in the cluster was used. I ran reweight-by-utilization with 150 as the cutoff level, which reshuffled the data. I also noticed that the number of PGs was still at the level from when the cluster had fewer disks (1300). Based on the current guidelines I increased pg_num to 2048. All of the new placement groups were created except for the last one. To try to force the creation of that PG I marked the OSDs assigned to it out (ceph osd out), but that made no difference. Currently all OSDs are back in, and two other PGs are also stuck in an unclean state.

ceph health detail:

HEALTH_WARN 2 pgs degraded; 2 pgs stale; 2 pgs stuck degraded; 1 pgs stuck inactive; 2 pgs stuck stale; 3 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 59 requests are blocked > 32 sec; 3 osds have slow requests; recovery 221/549658 objects degraded (0.040%); recovery 221/549658 objects misplaced (0.040%); pool volumes pg_num 2048 > pgp_num 1400
pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25]
pg 5.6c7 is stuck unclean since forever, current state creating, last acting [19,25]
pg 5.2c7 is stuck unclean for 313513.609864, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck unclean for 313513.610368, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck undersized for 308381.750768, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck undersized for 308381.751913, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck degraded for 308381.750876, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck degraded for 308381.752021, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck stale for 281750.295301, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck stale for 281750.295293, current state stale+active+undersized+degraded+remapped, last acting [9]
16 ops are blocked > 268435 sec
10 ops are blocked > 134218 sec
10 ops are blocked > 1048.58 sec
23 ops are blocked > 524.288 sec
16 ops are blocked > 268435 sec on osd.1
8 ops are blocked > 134218 sec on osd.17
2 ops are blocked > 134218 sec on osd.19
10 ops are blocked > 1048.58 sec on osd.19
23 ops are blocked > 524.288 sec on osd.19
3 osds have slow requests
recovery 221/549658 objects degraded (0.040%)
recovery 221/549658 objects misplaced (0.040%)
pool volumes pg_num 2048 > pgp_num 1400

OSD 9 was the primary when the PG creation process got stuck. That OSD has been removed and added again (not only marked out, but also removed from the crush map and re-added). The bad data distribution was probably caused by the low number of PGs and, mainly, by bad weighting of the OSDs.
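For reference, the reweight and pg_num change mentioned above were done roughly like this (the pool name 'volumes' is the one flagged in the health output; 150 is the cutoff level):

# reweight OSDs whose utilization is above 150% of the cluster average
ceph osd reweight-by-utilization 150
# raise the PG count of the 'volumes' pool
ceph osd pool set volumes pg_num 2048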
I changed the crush map to give the same weight to each of the OSDs, but that did not change these problems either.

ceph osd tree:

ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 6.5     pool default
-6 2.0         host droplet4
16 0.25000         osd.16      up      1.0      1.0
20 0.25000         osd.20      up      1.0      1.0
21 0.25000         osd.21      up      1.0      1.0
22 0.25000         osd.22      up      1.0      1.0
 6 0.25000         osd.6       up      1.0      1.0
18 0.25000         osd.18      up      1.0      1.0
19 0.25000         osd.19      up      1.0      1.0
23 0.25000         osd.23      up      1.0      1.0
-5 1.5         host droplet3
 3 0.25000         osd.3       up      1.0      1.0
13 0.25000         osd.13      up      1.0      1.0
15 0.25000         osd.15      up      1.0      1.0
 4 0.25000         osd.4       up      1.0      1.0
25 0.25000         osd.25      up      1.0      1.0
14 0.25000         osd.14      up      1.0      1.0
-2 1.5         host droplet1
 7 0.25000         osd.7       up      1.0      1.0
 1 0.25000         osd.1       up      1.0      1.0
 0 0.25000         osd.0       up      1.0      1.0
 9 0.25000         osd.9       up      1.0      1.0
12 0.25000         osd.12      up      1.0      1.0
17 0.25000         osd.17      up      1.0      1.0
-4 1.5         host droplet2
10 0.25000         osd.10      up      1.0      1.0
 8 0.25000         osd.8       up      1.0      1.0
11 0.25000         osd.11      up      1.0      1.0
 2 0.25000         osd.2       up      1.0      1.0
24 0.25000         osd.24      up      1.0      1.0
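The per-OSD weighting mentioned above was applied with commands along these lines (0.25 matches the weights in the tree output; osd.0 is just one example, repeated for each OSD):

# set the crush weight of an OSD to 0.25
ceph osd crush reweight osd.0 0.25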
[ceph-users] Ceph File System ACL Support
Hi,

I need to verify that, in Ceph v9.0.2, the kernel version of the Ceph file system supports ACLs and the libcephfs file system interface does not. I am trying to have SAMBA, version 4.3.0rc1, support Windows ACLs using "vfs objects = acl_xattr" with the SAMBA VFS Ceph file system interface "vfs objects = ceph", and my tests are failing. If I use a kernel mount of the same Ceph file system, it works. Using the SAMBA Ceph VFS interface with logging set to 3 in my smb.conf shows the following error when, on my Windows AD server, I try to "Disable inheritance" on the SAMBA exported directory uu/home:

[2015/08/16 18:27:11.546307, 2] ../source3/smbd/posix_acls.c:3006(set_canon_ace_list)
  set_canon_ace_list: sys_acl_set_file type file failed for file uu/home (Operation not supported).

This works using the same Ceph file system kernel mounted. It also works with an XFS file system.

Doing some Googling I found this entry on the SAMBA email list:

https://lists.samba.org/archive/samba-technical/2015-March/106699.html

It states: "libcephfs does not support ACL yet, so this patch adds ACL callbacks that do nothing."

If ACL support is not in libcephfs, are there plans to add it? The SAMBA Ceph VFS interface without ACL support is severely limited in a multi-user Windows environment.

Thanks,
Eric
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
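For reference, the relevant part of my share definition looks roughly like this (the share name, module stacking order, and ceph.conf path here are illustrative, not exact; "log level = 3" sits in the [global] section):

[uu-home]
   # directory inside the Ceph file system exported to Windows clients
   path = /uu/home
   # NT ACL mapping module stacked with the Ceph VFS module
   vfs objects = acl_xattr ceph
   # vfs_ceph option pointing at the cluster configuration
   ceph:config_file = /etc/ceph/ceph.conf
   read only = no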
Re: [ceph-users] How to improve single thread sequential reads?
Hi Nick,

On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick Fisk
>> Sent: 13 August 2015 18:04
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] How to improve single thread sequential reads?
>>
>> Hi,
>>
>> I'm trying to use a RBD to act as a staging area for some data before
>> pushing it down to some LTO6 tapes. As I cannot use striping with the
>> kernel client I tend to be maxing out at around 80MB/s reads testing
>> with DD. Has anyone got any clever suggestions of giving this a bit of
>> a boost, I think I need to get it up to around 200MB/s to make sure
>> there is always a steady flow of data to the tape drive.
>
> I've just tried the testing kernel with the blk-mq fixes in it for full
> size IO's, this combined with bumping readahead up to 4MB, is now
> getting me on average 150MB/s to 200MB/s so this might suffice.
>
> On a personal interest, I would still like to know if anyone has ideas
> on how to really push much higher bandwidth through a RBD.

Some settings in our ceph.conf that may help:

osd_op_threads = 20
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
filestore_queue_max_ops = 9
filestore_flusher = false
filestore_max_sync_interval = 10
filestore_sync_flush = false

Regards,
Alex

>
>> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
>>
>> I'm thinking mapping multiple RBD's and then combining them into a
>> mdadm RAID0 stripe might work, but seems a bit messy.
>>
>> Any suggestions?
>>
>> Thanks,
>> Nick
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
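A rough sketch of the multiple-RBD / mdadm RAID0 idea from the quoted message, in case it is useful (image names, sizes, chunk size, and the resulting /dev/rbdX device names are illustrative and untested):

# create and map two staging images (sizes in MB)
rbd create stage1 --size 102400
rbd create stage2 --size 102400
rbd map stage1
rbd map stage2
# stripe across the mapped devices (device names depend on mapping order)
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=4096 /dev/rbd0 /dev/rbd1
mkfs.xfs /dev/md0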
Re: [ceph-users] How to improve single thread sequential reads?
Have you tried setting read_ahead_kb to a bigger number on both the client and OSD side if you are using krbd? In the case of librbd, try the different config options for rbd cache.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alex Gorbachev
Sent: Sunday, August 16, 2015 7:07 PM
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to improve single thread sequential reads?

Hi Nick,

On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick Fisk
>> Sent: 13 August 2015 18:04
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] How to improve single thread sequential reads?
>>
>> Hi,
>>
>> I'm trying to use a RBD to act as a staging area for some data before
>> pushing it down to some LTO6 tapes. As I cannot use striping with the
>> kernel client I tend to be maxing out at around 80MB/s reads testing
>> with DD. Has anyone got any clever suggestions of giving this a bit of
>> a boost, I think I need to get it up to around 200MB/s to make sure
>> there is always a steady flow of data to the tape drive.
>
> I've just tried the testing kernel with the blk-mq fixes in it for full
> size IO's, this combined with bumping readahead up to 4MB, is now
> getting me on average 150MB/s to 200MB/s so this might suffice.
>
> On a personal interest, I would still like to know if anyone has ideas
> on how to really push much higher bandwidth through a RBD.

Some settings in our ceph.conf that may help:

osd_op_threads = 20
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
filestore_queue_max_ops = 9
filestore_flusher = false
filestore_max_sync_interval = 10
filestore_sync_flush = false

Regards,
Alex

>
>> Rbd-fuse seems to top out at 12MB/s, so there goes that option.
>>
>> I'm thinking mapping multiple RBD's and then combining them into a
>> mdadm RAID0 stripe might work, but seems a bit messy.
>>
>> Any suggestions?
>>
>> Thanks,
>> Nick
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
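A sketch of the two knobs mentioned above (the device name, the 4MB readahead value, and the cache sizes are examples to tune, not recommendations):

# krbd: bump readahead on the mapped device (and similarly on the OSD data disks)
echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

# librbd: client-side cache and readahead options in ceph.conf
[client]
    rbd cache = true
    rbd cache size = 67108864
    rbd readahead max bytes = 4194304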
Re: [ceph-users] Ceph File System ACL Support
On Mon, Aug 17, 2015 at 9:38 AM, Eric Eastman wrote:
> Hi,
>
> I need to verify that, in Ceph v9.0.2, the kernel version of the Ceph
> file system supports ACLs and the libcephfs file system interface does
> not. I am trying to have SAMBA, version 4.3.0rc1, support Windows ACLs
> using "vfs objects = acl_xattr" with the SAMBA VFS Ceph file system
> interface "vfs objects = ceph", and my tests are failing. If I use a
> kernel mount of the same Ceph file system, it works. Using the SAMBA
> Ceph VFS interface with logging set to 3 in my smb.conf shows the
> following error when, on my Windows AD server, I try to "Disable
> inheritance" on the SAMBA exported directory uu/home:
>
> [2015/08/16 18:27:11.546307, 2] ../source3/smbd/posix_acls.c:3006(set_canon_ace_list)
>   set_canon_ace_list: sys_acl_set_file type file failed for file
>   uu/home (Operation not supported).
>
> This works using the same Ceph file system kernel mounted. It also
> works with an XFS file system.
>
> Doing some Googling I found this entry on the SAMBA email list:
>
> https://lists.samba.org/archive/samba-technical/2015-March/106699.html
>
> It states: "libcephfs does not support ACL yet, so this patch adds ACL
> callbacks that do nothing."
>
> If ACL support is not in libcephfs, are there plans to add it? The
> SAMBA Ceph VFS interface without ACL support is severely limited in a
> multi-user Windows environment.
>

libcephfs does not support ACLs. I have an old patch that adds ACL
support to Samba's vfs ceph module, but I haven't tested it carefully.

Yan, Zheng

> Thanks,
> Eric
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph File System ACL Support
On Sun, Aug 16, 2015 at 9:12 PM, Yan, Zheng wrote:
> On Mon, Aug 17, 2015 at 9:38 AM, Eric Eastman wrote:
>> Hi,
>>
>> I need to verify that, in Ceph v9.0.2, the kernel version of the Ceph
>> file system supports ACLs and the libcephfs file system interface does
>> not. I am trying to have SAMBA, version 4.3.0rc1, support Windows ACLs
>> using "vfs objects = acl_xattr" with the SAMBA VFS Ceph file system
>> interface "vfs objects = ceph", and my tests are failing. If I use a
>> kernel mount of the same Ceph file system, it works. Using the SAMBA
>> Ceph VFS interface with logging set to 3 in my smb.conf shows the
>> following error when, on my Windows AD server, I try to "Disable
>> inheritance" on the SAMBA exported directory uu/home:
>>
>> [2015/08/16 18:27:11.546307, 2] ../source3/smbd/posix_acls.c:3006(set_canon_ace_list)
>>   set_canon_ace_list: sys_acl_set_file type file failed for file
>>   uu/home (Operation not supported).
>>
>> This works using the same Ceph file system kernel mounted. It also
>> works with an XFS file system.
>>
>> Doing some Googling I found this entry on the SAMBA email list:
>>
>> https://lists.samba.org/archive/samba-technical/2015-March/106699.html
>>
>> It states: "libcephfs does not support ACL yet, so this patch adds ACL
>> callbacks that do nothing."
>>
>> If ACL support is not in libcephfs, are there plans to add it? The
>> SAMBA Ceph VFS interface without ACL support is severely limited in a
>> multi-user Windows environment.
>>
>
> libcephfs does not support ACLs. I have an old patch that adds ACL
> support to Samba's vfs ceph module, but I haven't tested it carefully.
>
> Yan, Zheng
>

Thank you for confirming what I am seeing. It would be nice to have ACL
support for SAMBA. I would be able to do some testing of the patch if
that would help.

Eric
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
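For reference, the quick comparison I am doing on the kernel-mounted copy of the file system is along these lines (the mount point and user name are examples):

# assuming the same file system is kernel-mounted at /mnt/cephfs
setfacl -m u:testuser:rwx /mnt/cephfs/uu/home
getfacl /mnt/cephfs/uu/home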
Re: [ceph-users] ceph distributed osd
Hi All,

We need to test three OSDs and one image with replica 2 (size 1GB). While testing, data cannot be written beyond 1GB. Is there any option to write to the third OSD?

ceph osd pool get repo pg_num
pg_num: 126

# rbd showmapped
id pool image          snap device
0  rbd  integdownloads -    /dev/rbd0   -- already existing
2  repo integrepotest  -    /dev/rbd2   -- newly created

[root@hm2 repository]# df -Th
Filesystem           Type      Size  Used Avail Use% Mounted on
/dev/sda5            ext4      289G   18G  257G   7% /
devtmpfs             devtmpfs  252G     0  252G   0% /dev
tmpfs                tmpfs     252G     0  252G   0% /dev/shm
tmpfs                tmpfs     252G  538M  252G   1% /run
tmpfs                tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/sda2            ext4      488M  212M  241M  47% /boot
/dev/sda4            ext4      1.9T   20G  1.8T   2% /var
/dev/mapper/vg0-zoho ext4      8.6T  1.7T  6.5T  21% /zoho
/dev/rbd0            ocfs2     977G  101G  877G  11% /zoho/build/downloads
/dev/rbd2            ocfs2    1000M 1000M     0 100% /zoho/build/repository

@:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/
root@integ-hm2's password:
sample.txt                         100% 1024MB   4.5MB/s   03:48
scp: /zoho/build/repository//sample.txt: No space left on device

Regards,
Prabu

On Thu, 13 Aug 2015 19:42:11 +0530 gjprabu wrote:

Dear Team,

We are using two Ceph OSDs with replica 2 and it is working properly. My doubt is: Pool A's image size will be 10GB and it is replicated on two OSDs; what will happen if the size reaches that limit? Is there any chance to make the data continue writing on another two OSDs?

Regards,
Prabu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
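For reference, the 1GB test image was created and mapped roughly like this (sizes in MB; ocfs2 cluster-stack options omitted, so the exact commands may differ from what we ran):

# 1GB image in pool 'repo', as shown in rbd showmapped above
rbd create repo/integrepotest --size 1024
rbd map repo/integrepotest
mkfs.ocfs2 /dev/rbd2
mount /dev/rbd2 /zoho/build/repository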
Re: [ceph-users] ceph distributed osd
Hi All,

Also, please find the OSD pool information:

ceph osd dump | grep 'replicated size'
pool 2 'repo' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 126 pgp_num 126 last_change 21573 flags hashpspool stripe_width 0

Regards,
Prabu

On Mon, 17 Aug 2015 11:58:55 +0530 gjprabu wrote:

Hi All,

We need to test three OSDs and one image with replica 2 (size 1GB). While testing, data cannot be written beyond 1GB. Is there any option to write to the third OSD?

ceph osd pool get repo pg_num
pg_num: 126

# rbd showmapped
id pool image          snap device
0  rbd  integdownloads -    /dev/rbd0   -- already existing
2  repo integrepotest  -    /dev/rbd2   -- newly created

[root@hm2 repository]# df -Th
Filesystem           Type      Size  Used Avail Use% Mounted on
/dev/sda5            ext4      289G   18G  257G   7% /
devtmpfs             devtmpfs  252G     0  252G   0% /dev
tmpfs                tmpfs     252G     0  252G   0% /dev/shm
tmpfs                tmpfs     252G  538M  252G   1% /run
tmpfs                tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/sda2            ext4      488M  212M  241M  47% /boot
/dev/sda4            ext4      1.9T   20G  1.8T   2% /var
/dev/mapper/vg0-zoho ext4      8.6T  1.7T  6.5T  21% /zoho
/dev/rbd0            ocfs2     977G  101G  877G  11% /zoho/build/downloads
/dev/rbd2            ocfs2    1000M 1000M     0 100% /zoho/build/repository

@:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/
root@integ-hm2's password:
sample.txt                         100% 1024MB   4.5MB/s   03:48
scp: /zoho/build/repository//sample.txt: No space left on device

Regards,
Prabu

On Thu, 13 Aug 2015 19:42:11 +0530 gjprabu wrote:

Dear Team,

We are using two Ceph OSDs with replica 2 and it is working properly. My doubt is: Pool A's image size will be 10GB and it is replicated on two OSDs; what will happen if the size reaches that limit? Is there any chance to make the data continue writing on another two OSDs?

Regards,
Prabu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com