[ceph-users] cephx: verify_reply couldn't decrypt with error (failed verifying authorize reply)
Hi Experts,

After initially deploying Ceph with 3 OSDs, I am facing an issue: the cluster reports healthy, but access to the pools sometimes (or often) fails, and then sometimes recovers on its own. For example:

[ceph@gcloudcon ceph-cluster]$ rados -p volumes ls
2015-03-24 11:44:17.262941 7f3d6bfff700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2015-03-24 11:44:17.262951 7f3d6bfff700  0 -- 206.12.25.25:0/1004580 >> 206.12.25.27:6800/802 pipe(0x26d7fe0 sd=4 :55582 s=1 pgs=0 cs=0 l=1 c=0x26d8270).failed verifying authorize reply
2015-03-24 11:44:17.262999 7f3d6bfff700  0 -- 206.12.25.25:0/1004580 >> 206.12.25.27:6800/802 pipe(0x26d7fe0 sd=4 :55582 s=1 pgs=0 cs=0 l=1 c=0x26d8270).fault
2015-03-24 11:44:17.263637 7f3d6bfff700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2015-03-24 11:44:17.263645 7f3d6bfff700  0 -- 206.12.25.25:0/1004580 >> 206.12.25.27:6800/802 pipe(0x26d7fe0 sd=4 :55583 s=1 pgs=0 cs=0 l=1 c=0x26d8270).failed verifying authorize reply
2015-03-24 11:44:17.464379 7f3d6bfff700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2015-03-24 11:44:17.464388 7f3d6bfff700  0 -- 206.12.25.25:0/1004580 >> 206.12.25.27:6800/802 pipe(0x26d7fe0 sd=4 :55584 s=1 pgs=0 cs=0 l=1 c=0x26d8270).failed verifying authorize reply
2015-03-24 11:44:17.865222 7f3d6bfff700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2015-03-24 11:44:17.865245 7f3d6bfff700  0 -- 206.12.25.25:0/1004580 >> 206.12.25.27:6800/802 pipe(0x26d7fe0 sd=4 :55585 s=1 pgs=0 cs=0 l=1 c=0x26d8270).failed verifying authorize reply
2015-03-24 11:44:18.666056 7f3d6bfff700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2015-03-24 11:44:18.666077 7f3d6bfff700  0 -- 206.12.25.25:0/1004580 >> 206.12.25.27:6800/802 pipe(0x26d7fe0 sd=4 :55587 s=1 pgs=0 cs=0 l=1 c=0x26d8270).failed verifying authorize reply

[ceph@gcloudcon ceph-cluster]$ ceph auth list
installed auth entries:

mds.gcloudnet
        key: xxx
        caps: [mds] allow
        caps: [mon] allow profile mds
        caps: [osd] allow rwx
osd.0
        key: xxx
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: xxx
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: xxx
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: xxx
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.backups
        key: xxx
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=backups
client.bootstrap-mds
        key: xxx
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: xxx
        caps: [mon] allow profile bootstrap-osd
client.images
        key: xxx
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images
client.libvirt
        key: xxx
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool
client.volumes
        key: xxx
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images

[root@gcloudcon ~]# more /etc/ceph/ceph.conf
[global]
auth_service_required = cephx
osd_pool_default_size = 2
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 206.12.25.26
public_network = 206.12.25.0/16
mon_initial_members = gcloudnet
cluster_network = 192.168.10.0/16
fsid = xx

[client.images]
keyring = /etc/ceph/ceph.client.images.keyring

[client.volumes]
keyring = /etc/ceph/ceph.client.volumes.keyring

[client.backups]
keyring = /etc/ceph/ceph.client.backups.keyring

[ceph@gcloudcon ceph-cluster]$ ceph -w
    cluster a4d0879f-abdc-4f9d-8a4b-53ce57d822f1
     health HEALTH_OK
     monmap e1: 1 mons at {gcloudnet=206.12.25.26:6789/0}, election epoch 1, quorum 0 gcloudnet
     osdmap e27: 3 osds: 3 up, 3 in
      pgmap v1894: 704 pgs, 6 pools, 1640 MB data, 231 objects
            18757 MB used, 22331 GB / 22350 GB avail
                 704 active+clean
2015-03-24 17:56:20.884293 mon.0 [INF] from='client.? 206.12.25.25:0/1006501' entity='client.admin' cmd=[{"prefix": "auth list"}]: dispatch

Can anybody give me a hint on what I should check?

Thanks,

--
Erming Pei, Senior System Analyst
Information Services & Technology
University of Alberta, Canada
Tel: 7804929914    Fax: 7804921729
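The "verify_reply couldn't decrypt" errors are usually caused either by clock skew between the nodes (cephx tickets are time-limited) or by a keyring on a daemon host that no longer matches what the monitor has stored. The checks below are a sketch only: the hostnames come from the post, osd.0 stands in for whichever OSD listens on 206.12.25.27:6800, and the default keyring paths are assumed.

    # On each node (gcloudcon, gcloudnet, the OSD hosts): confirm the clocks agree.
    date
    ntpq -p

    # Compare the OSD key stored by the monitors with the keyring on the OSD host
    # (osd.0 and the default path are assumptions; adjust to the OSD in question).
    ceph auth get osd.0
    cat /var/lib/ceph/osd/ceph-0/keyring

    # Same idea for the client key used by 'rados -p volumes ls'.
    ceph auth get client.admin
    cat /etc/ceph/ceph.client.admin.keyring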
[ceph-users] cephfs read-only setting doesn't work?
Hi,

I tried to set up read-only permission for a client, but the mount always appears writable. I did the following:

==Server end==

[client.cephfs_data_ro]
        key = AQxx==
        caps mon = "allow r"
        caps osd = "allow r pool=cephfs_data, allow r pool=cephfs_metadata"

==Client end==

mount -v -t ceph hostname.domainname:6789:/ /cephfs -o name=cephfs_data_ro,secret=AQxx==

But I can still touch, delete, and overwrite files. I have read that touch/delete may be metadata-only operations, but why can I still overwrite? Is there any way I can test/check the data pool (rather than the metadata) to see whether it is actually affected?

Erming

--
Erming Pei, Ph.D
Senior System Analyst; Grid/Cloud Specialist
Research Computing Group
Information Services & Technology
University of Alberta, Canada
Tel: +1 7804929914    Fax: +1 7804921729
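One way to check the data pool directly is to look up the file's backing objects with rados. This is a sketch, assuming you run it with a key that can read cephfs_data (e.g. client.admin); /cephfs/testfile is just an example path.

    # CephFS stores a file's data in objects named <inode-in-hex>.<block-number>.
    ino=$(printf '%x' "$(stat -c %i /cephfs/testfile)")

    # Check the first object's size and mtime in the data pool ...
    rados -p cephfs_data stat "${ino}.00000000"

    # ... or pull its contents back out and compare with what you think you wrote.
    rados -p cephfs_data get "${ino}.00000000" /tmp/obj0 && md5sum /tmp/obj0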
Re: [ceph-users] cephfs read-only setting doesn't work?
On 9/2/15, 9:31 AM, Gregory Farnum wrote:
> [ Re-adding the list. ]
>
> On Wed, Sep 2, 2015 at 4:29 PM, Erming Pei wrote:
>> Hi Gregory,
>>
>> Thanks very much for the confirmation and explanation.
>>
>>> And I presume you have an MDS cap in there as well?
>> Is there a difference between setting this cap and not setting it?
>
> Well, I don't think you can access the MDS without a read cap, but maybe it's really just null...
>
>> I asked this as I don't see a difference when operating on files.
>>
>>> I think you'll find that the data you've overwritten isn't really written to the OSDs — you wrote it in the local page cache, but the OSDs will reject the writes with EPERM.
>> I see. Is there a way for me to verify that, i.e., to see that there is no change to the data in the OSDs? I found I can overwrite a file and then see that the file is changed. It may be in the local cache. But how can I test and retrieve it from the OSD pool?
>
> Mounting it on another client and seeing if changes are reflected there would do it. Or unmounting the filesystem, mounting again, and seeing if the file has really changed.
> -Greg

Good idea. Thank you Gregory.

Erming

>> Thanks!
>>
>> Erming
>>
>> On 9/2/15, 2:44 AM, Gregory Farnum wrote:
>>> On Tue, Sep 1, 2015 at 9:20 PM, Erming Pei wrote:
>>>> Hi,
>>>>
>>>> I tried to set up read-only permission for a client, but the mount always appears writable. I did the following:
>>>>
>>>> ==Server end==
>>>>
>>>> [client.cephfs_data_ro]
>>>>         key = AQxx==
>>>>         caps mon = "allow r"
>>>>         caps osd = "allow r pool=cephfs_data, allow r pool=cephfs_metadata"
>>> The clients don't directly access the metadata pool at all so you don't need to grant that. :)
>>> And I presume you have an MDS cap in there as well?
>>>
>>>> ==Client end==
>>>>
>>>> mount -v -t ceph hostname.domainname:6789:/ /cephfs -o name=cephfs_data_ro,secret=AQxx==
>>>>
>>>> But I can still touch, delete, and overwrite files. I have read that touch/delete may be metadata-only operations, but why can I still overwrite?
>>>>
>>>> Is there any way I can test/check the data pool (rather than the metadata) to see whether it is actually affected?
>>> What you're seeing here is an unfortunate artifact of the page cache and the way these user capabilities work in Ceph. As you surmise, touch/delete are metadata operations through the MDS and in current code you can't block the client off from that (although we have work in progress to improve things). I think you'll find that the data you've overwritten isn't really written to the OSDs — you wrote it in the local page cache, but the OSDs will reject the writes with EPERM.
>>>
>>> I don't remember the kernel's exact behavior here though — we updated the userspace client to preemptively check access permissions on new pools but I don't think the kernel ever got that. Zheng?
>>> -Greg

--
Erming Pei, Ph.D
Senior System Analyst; Grid/Cloud Specialist
Research Computing Group
Information Services & Technology
University of Alberta, Canada
Tel: +1 7804929914    Fax: +1 7804921729
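The remount test Greg suggests can be run as below. It is only a sketch: the mount options mirror the ones from the original post (the secret placeholder is kept as-is), and /cephfs/testfile is an example file.

    # Overwrite part of the file in place through the read-only mount
    # (conv=notrunc avoids a truncate, which is an MDS-side metadata operation).
    dd if=/dev/urandom of=/cephfs/testfile bs=4k count=1 conv=notrunc
    md5sum /cephfs/testfile              # reflects the local page cache

    # Drop the cached copy by remounting, then compare again.
    umount /cephfs
    mount -t ceph hostname.domainname:6789:/ /cephfs -o name=cephfs_data_ro,secret=AQxx==
    md5sum /cephfs/testfile              # shows what the OSDs actually stored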
[ceph-users] mds issue
Hi,

After I set up more than one MDS server, access from the client end sometimes gets stuck or slow. I tried to stop one MDS, and then the client end hangs. I also accidentally set 'bal frag = true'; I am not sure whether it matters, and I disabled it later.

Is there a known reason for this behaviour? What can be done to check or tune MDS performance? Can I just reduce the number of MDSs on the fly?

Thanks,

Erming
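A sketch of the usual checks for a slow or stuck MDS, and of shrinking back to a single active MDS. The daemon name mds.gcloudnet is carried over from the first post as an assumption, and the exact commands for reducing the active MDS count vary between releases, so check the help output for your version.

    # Current MDS map: how many are active, which ranks, any laggy/replay states.
    ceph mds stat
    ceph -s

    # Per-daemon view over the admin socket on the MDS host
    # (daemon name is an example; adjust to your MDS id).
    ceph daemon mds.gcloudnet session ls     # connected clients and their state
    ceph daemon mds.gcloudnet perf dump      # request latencies, journal and cache stats

    # Shrinking to a single active MDS (syntax differs across releases;
    # see 'ceph mds' / 'ceph fs' help for your version).
    ceph mds set max_mds 1
    ceph mds deactivate 1        # older releases use 'ceph mds stop 1'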
[ceph-users] CephFS namespace
Hi,

Is there a way to list the namespaces in CephFS? How do I set one up?

From the man page of mount.ceph, I see this:

   To mount only part of the namespace:

       mount.ceph monhost1:/some/small/thing /mnt/thing

But how do I find out what the namespaces are in the first place?

Thanks,

Erming

--
Erming Pei, Ph.D
Senior System Analyst; Grid/Cloud Specialist
Research Computing Group
Information Services & Technology
University of Alberta, Canada
Tel: +1 7804929914    Fax: +1 7804921729
Re: [ceph-users] CephFS namespace
I see. That's also what I needed. Thanks.

Can we allow only a part of the 'namespace', i.e. the directory tree, to be mounted from the *server* end, just like NFS exporting? And can permissions be set on it as well?

Erming

On 10/19/15, 4:07 PM, Gregory Farnum wrote:
> On Mon, Oct 19, 2015 at 3:06 PM, Erming Pei wrote:
>> Hi,
>>
>> Is there a way to list the namespaces in CephFS? How do I set one up?
>>
>> From the man page of mount.ceph, I see this:
>>
>>    To mount only part of the namespace:
>>
>>        mount.ceph monhost1:/some/small/thing /mnt/thing
>>
>> But how do I find out what the namespaces are in the first place?
> "Namespace" here means "directory tree" or "folder hierarchy".
> -Greg

--
Erming Pei, Ph.D
Senior System Analyst; Grid/Cloud Specialist
Research Computing Group
Information Services & Technology
University of Alberta, Canada
Tel: +1 7804929914    Fax: +1 7804921729
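Since the "namespace" is just the directory tree, discovering it and mounting a subtree only needs a client-side listing and a path in the mount source. A sketch: the monitor host gcloudnet comes from the earlier posts, the /projects/projA path is an example, and the cephfs_data_ro client name is carried over from the read-only thread.

    # Mount the root once to see what the tree looks like ...
    mount -t ceph gcloudnet:6789:/ /mnt/cephfs -o name=admin,secret=AQxx==
    ls /mnt/cephfs

    # ... then mount only the subtree you care about on the clients.
    mount -t ceph gcloudnet:6789:/projects/projA /mnt/projA -o name=cephfs_data_ro,secret=AQxx==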
[ceph-users] cephfs best practice
Hi,

I am just wondering which approach is better within a single file system: setting up one data pool for each project, or letting all projects share one big pool?

Thanks,

Erming
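For the pool-per-project option, CephFS can carry several data pools in one file system: you add the extra pool to the file system and then pin a directory to it via a layout attribute. The sketch below uses example pool and directory names, and the "add data pool" command name differs between releases.

    # Create a pool for the project and make it usable by CephFS
    # (newer releases use 'ceph fs add_data_pool <fsname> projectA_data').
    ceph osd pool create projectA_data 128 128
    ceph mds add_data_pool projectA_data

    # Point the project directory at that pool; new files created under it
    # are stored in projectA_data instead of the default data pool.
    mkdir /cephfs/projects/projectA
    setfattr -n ceph.dir.layout.pool -v projectA_data /cephfs/projects/projectA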
[ceph-users] Increased pg_num and pgp_num
Hi,

I found that pg_num and pgp_num for the metadata pool were too small, so I increased them. Now I get "300 pgs stuck unclean":

$ ceph -s
    cluster a4d0879f-abdc-4f9d-8a4b-53ce57d822f1
     health HEALTH_WARN 248 pgs backfill; 52 pgs backfilling; 300 pgs stuck unclean; recovery 58417161/113290060 objects misplaced (51.564%); mds0: Client physics-007:Physics01_data failing to respond to cache pressure

Is this critical?

Thanks,

Erming

--
Erming Pei, Ph.D
Senior System Analyst; Grid/Cloud Specialist
Research Computing Group
Information Services & Technology
University of Alberta, Canada
Tel: +1 7804929914    Fax: +1 7804921729
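Backfilling and misplaced objects right after a pg_num/pgp_num increase generally mean data is being reshuffled onto the new PGs rather than lost. A sketch of how to watch the recovery and, if client I/O suffers, slow the backfill down; the cephfs_metadata pool name is taken from the earlier read-only thread, and the injectargs values are conservative examples.

    # Watch the recovery progress and the PGs that are still unclean.
    ceph -s
    ceph health detail | grep -E 'backfill|unclean' | head

    # pg_num and pgp_num should end up at the same value; check both.
    ceph osd pool get cephfs_metadata pg_num
    ceph osd pool get cephfs_metadata pgp_num

    # If client I/O suffers while backfilling, throttle recovery (example values).
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'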
[ceph-users] scrub error with ceph
Hi,

I found there are 128 scrub errors in my Ceph system. I checked with 'ceph health detail' and found many PGs with the stuck unclean issue. Should I repair all of them, or what should I do?

[root@gcloudnet ~]# ceph -s
    cluster a4d0879f-abdc-4f9d-8a4b-53ce57d822f1
     health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; mds1: Client HTRC:cephfs_data failing to respond to cache pressure; mds0: Client physics-007:cephfs_data failing to respond to cache pressure; pool 'cephfs_data' is full
     monmap e3: 3 mons at {gcloudnet=xxx.xxx.xxx.xxx:6789/0,gcloudsrv1=xxx.xxx.xxx.xxx:6789/0,gcloudsrv2=xxx.xxx.xxx.xxx:6789/0}, election epoch 178, quorum 0,1,2 gcloudnet,gcloudsrv1,gcloudsrv2
     mdsmap e51000: 2/2/2 up {0=gcloudsrv1=up:active,1=gcloudnet=up:active}
     osdmap e2821: 18 osds: 18 up, 18 in
      pgmap v10457877: 3648 pgs, 23 pools, 10501 GB data, 38688 kobjects
            14097 GB used, 117 TB / 130 TB avail
                   6 active+clean+scrubbing+deep
                3513 active+clean
                 128 active+clean+inconsistent
                   1 active+clean+scrubbing

P.S. I am increasing the pg and pgp numbers for the cephfs_data pool.

Thanks,

Erming

--
Erming Pei, Ph.D, Senior System Analyst
HPC Grid/Cloud Specialist, ComputeCanada/WestGrid
Research Computing Group, IST
University of Alberta, Canada T6G 2H1
Email: erm...@ualberta.ca    erming@cern.ch
Tel: +1 7804929914    Fax: +1 7804921729
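A sketch of the usual workflow for inconsistent PGs on releases of that era: find the affected PGs, check the primary OSD's log to see what the scrub actually complained about, and only then repair them one at a time. The PG id 3.45 and osd.7 below are placeholders.

    # Which PGs are inconsistent, and which OSDs hold them?
    ceph health detail | grep inconsistent

    # Check the primary OSD's log for the scrub errors on a given PG
    # (pg id and osd number are placeholders).
    grep '3.45' /var/log/ceph/ceph-osd.7.log | grep -i err

    # Repair one PG at a time; on these releases repair tends to take the
    # primary replica's copy as authoritative, so review the errors first.
    ceph pg repair 3.45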
[ceph-users] Fwd: scrub error with ceph
(Found no response from the current list, so forwarded to ceph-us...@ceph.com.) Sorry if it's duplicated.

-------- Original Message --------
Subject: scrub error with ceph
Date:    Mon, 7 Dec 2015 14:15:07 -0700
From:    Erming Pei
To:      ceph-users@lists.ceph.com

Hi,

I found there are 128 scrub errors in my Ceph system. I checked with 'ceph health detail' and found many PGs with the stuck unclean issue. Should I repair all of them, or what should I do?

[root@gcloudnet ~]# ceph -s
    cluster a4d0879f-abdc-4f9d-8a4b-53ce57d822f1
     health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; mds1: Client HTRC:cephfs_data failing to respond to cache pressure; mds0: Client physics-007:cephfs_data failing to respond to cache pressure; pool 'cephfs_data' is full
     monmap e3: 3 mons at {gcloudnet=xxx.xxx.xxx.xxx:6789/0,gcloudsrv1=xxx.xxx.xxx.xxx:6789/0,gcloudsrv2=xxx.xxx.xxx.xxx:6789/0}, election epoch 178, quorum 0,1,2 gcloudnet,gcloudsrv1,gcloudsrv2
     mdsmap e51000: 2/2/2 up {0=gcloudsrv1=up:active,1=gcloudnet=up:active}
     osdmap e2821: 18 osds: 18 up, 18 in
      pgmap v10457877: 3648 pgs, 23 pools, 10501 GB data, 38688 kobjects
            14097 GB used, 117 TB / 130 TB avail
                   6 active+clean+scrubbing+deep
                3513 active+clean
                 128 active+clean+inconsistent
                   1 active+clean+scrubbing

P.S. I am increasing the pg and pgp numbers for the cephfs_data pool.

Thanks,

Erming

--
Erming Pei, Ph.D, Senior System Analyst
HPC Grid/Cloud Specialist, ComputeCanada/WestGrid
Research Computing Group, IST
University of Alberta, Canada T6G 2H1
Email: erm...@ualberta.ca    erming@cern.ch
Tel: +1 7804929914    Fax: +1 7804921729