[ceph-users] PCIE-SSD OSD bottom performance issue
Dear all,

I am using PCIe SSDs as OSD disks, but the performance is very poor. I have two hosts, each with one PCIe SSD, so I created two OSDs, one per PCIe SSD:

ID WEIGHT  TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.35999 root default
-2 0.17999     host tds_node03
 0 0.17999         osd.0             up      1.0              1.0
-3 0.17999     host tds_node04
 1 0.17999         osd.1             up      1.0              1.0

I created a pool and an RBD device and ran an fio 8K random read/write (70% read) test against the RBD device. The result is only about 10,000 IOPS, and tuning various OSD thread parameters had no effect. Running the same 8K randrw(70%) test directly against a single PCIe SSD gives about 100,000 IOPS. Is there any way to improve the PCIe-SSD OSD performance?

scott_tan...@yahoo.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
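For reference, a test of the kind described above might look like the sketch below. The device path, queue depth, job count and runtime are illustrative assumptions; the poster's exact fio invocation is not given.

    # map the image first (e.g. rbd map <pool>/<image>), then run against the block device
    fio --name=rbd-8k-randrw --filename=/dev/rbd0 \
        --rw=randrw --rwmixread=70 --bs=8k \
        --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
        --runtime=60 --time_based --group_reporting

With a single client and a replicated pool, every write is also journaled and replicated over the network, so a gap between raw-device IOPS and RBD IOPS is expected; whether a 10x gap is reasonable depends on replica count, journal placement and client parallelism.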
Re: [ceph-users] Bad performances in recovery
Hello, from all the pertinent points by Somnath, the one about pre-conditioning would be pretty high on my list, especially if this slowness persists and nothing else (scrub) is going on. This might be "fixed" by doing a fstrim. Additionally the levelDB's per OSD are of course sync'ing heavily during reconstruction, so that might not be the favorite thing for your type of SSDs. But ultimately situational awareness is very important, as in "what" is actually going and slowing things down. As usual my recommendations would be to use atop, iostat or similar on all your nodes and see if your OSD SSDs are indeed the bottleneck or if it is maybe just one of them or something else entirely. Christian On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote: > Also, check if scrubbing started in the cluster or not. That may > considerably slow down the cluster. > > -Original Message- > From: Somnath Roy > Sent: Wednesday, August 19, 2015 1:35 PM > To: 'J-P Methot'; ceph-us...@ceph.com > Subject: RE: [ceph-users] Bad performances in recovery > > All the writes will go through the journal. > It may happen your SSDs are not preconditioned well and after a lot of > writes during recovery IOs are stabilized to lower number. This is quite > common for SSDs if that is the case. > > Thanks & Regards > Somnath > > -Original Message- > From: J-P Methot [mailto:jpmet...@gtcomm.net] > Sent: Wednesday, August 19, 2015 1:03 PM > To: Somnath Roy; ceph-us...@ceph.com > Subject: Re: [ceph-users] Bad performances in recovery > > Hi, > > Thank you for the quick reply. However, we do have those exact settings > for recovery and it still strongly affects client io. I have looked at > various ceph logs and osd logs and nothing is out of the ordinary. > Here's an idea though, please tell me if I am wrong. > > We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was > explained several times on this mailing list, Samsung SSDs suck in ceph. > They have horrible O_dsync speed and die easily, when used as journal. > That's why we're using Intel ssds for journaling, so that we didn't end > up putting 96 samsung SSDs in the trash. > > In recovery though, what is the ceph behaviour? What kind of write does > it do on the OSD SSDs? Does it write directly to the SSDs or through the > journal? > > Additionally, something else we notice: the ceph cluster is MUCH slower > after recovery than before. Clearly there is a bottleneck somewhere and > that bottleneck does not get cleared up after the recovery is done. > > > On 2015-08-19 3:32 PM, Somnath Roy wrote: > > If you are concerned about *client io performance* during recovery, > > use these settings.. > > > > osd recovery max active = 1 > > osd max backfills = 1 > > osd recovery threads = 1 > > osd recovery op priority = 1 > > > > If you are concerned about *recovery performance*, you may want to > > bump this up, but I doubt it will help much from default settings.. > > > > Thanks & Regards > > Somnath > > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > > Of J-P Methot > > Sent: Wednesday, August 19, 2015 12:17 PM > > To: ceph-us...@ceph.com > > Subject: [ceph-users] Bad performances in recovery > > > > Hi, > > > > Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for > > a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each. > > The ceph version is hammer v0.94.1 . 
There is a performance overhead > > because we're using SSDs (I've heard it gets better in infernalis, but > > we're not upgrading just yet) but we can reach numbers that I would > > consider "alright". > > > > Now, the issue is, when the cluster goes into recovery it's very fast > > at first, but then slows down to ridiculous levels as it moves > > forward. You can go from 7% to 2% to recover in ten minutes, but it > > may take 2 hours to recover the last 2%. While this happens, the > > attached openstack setup becomes incredibly slow, even though there is > > only a small fraction of objects still recovering (less than 1%). The > > settings that may affect recovery speed are very low, as they are by > > default, yet they still affect client io speed way more than it should. > > > > Why would ceph recovery become so slow as it progress and affect > > client io even though it's recovering at a snail's pace? And by a > > snail's pace, I mean a few kb/second on 10gbps uplinks. -- > > == Jean-Philippe Méthot > > Administrateur système / System administrator GloboTech Communications > > Phone: 1-514-907-0050 > > Toll Free: 1-(888)-GTCOMM1 > > Fax: 1-(514)-907-0750 > > jpmet...@gtcomm.net > > http://www.gtcomm.net > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > > > PLEASE NOTE: The information contained in t
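For readers following along, the recovery-throttling options Somnath quotes are normally set in the [osd] section of ceph.conf and take effect on the next OSD restart. A minimal sketch, using exactly the values from the thread:

    [osd]
    osd recovery max active = 1
    osd max backfills = 1
    osd recovery threads = 1
    osd recovery op priority = 1

They can also be changed at runtime with injectargs, as shown later in this thread.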
Re: [ceph-users] Ceph OSD nodes in XenServer VMs
Hello,

On Thu, 20 Aug 2015 11:55:55 +1000 Jiri Kanicky wrote:

> Hi all,
>
> We are experimenting with an idea to run OSD nodes in XenServer VMs. We
> believe this could provide better flexibility, backups for the nodes etc.
>
> For example:
> Xenserver with 4 HDDs dedicated for Ceph.
> We would introduce 1 VM (OSD node) with raw/direct access to 4 HDDs or 2
> VMs (2 OSD nodes) with 2 HDDs each.
>
> Do you have any experience with this? Any thoughts on this? Good or bad
> idea?
>
My knee jerk reaction would be definitely in the "bad idea" category.

Even with "raw" access I'd venture that isn't as fast as actual bare-metal, and your network will be virtualized anyway.

What really puzzles/amuses me though is using the one VM platform that doesn't support Ceph as the basis for providing Ceph. ^o^

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph osd debug question / proposal
Just to clarify - you unmounted the filesystem with "umount -l"? That almost never a good idea, and it puts the OSD in a very unusual situation where IO will actually work on the open files, but it can't open any new ones. I think this would be enough to confuse just about any piece of software. Was journal on the filesystem or on a separate partition/device? It's not the same as R/O filesystem (I hit that once and no such havoc happened), in my experience the OSD traps and exits when something like that happens. It would be interesting to know what would happen if you just did rm -rf /var/lib/ceph/osd/ceph-4/current/* - that could be an equivalent to umount -l, more or less :-) Jan > On 20 Aug 2015, at 08:01, Goncalo Borges wrote: > > Dear Ceph gurus... > > Just wanted to report something that may be interesting to enhance... or > maybe I am not doing the right debugging procedure. > > 1. I am working with 0.92.2 and I am testing the cluster in several disaster > catastrophe scenarios. > > 2. I have 32 OSDs distributed in 4 servers, meaning that I have 8 OSD per > server. > > 3. I have deliberately unmounted the filesystem of osd.4 but the daemon was > left on. I just wanted to understand how the system would react. This was > what happened: > a. While there was no I/0, the system did not realized that the osd-4 > filesystem was not mounted, and the 'ceph -s' continues to report HEALTH_OK > for the system status. > > b. When I've started to impose some heavy I/O, the system started to complain > of slow requests. Curiously, osd.4 never appears in the logs. > # ceph -s > cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc > health HEALTH_WARN > 170 requests are blocked > 32 sec > monmap e1: 3 mons at > {rccephmon1=192.231.127.8:6789/0,rccephmon2=192.231.127.34:6789/0,rccephmon3=192.231.127.26:6789/0} > election epoch 24, quorum 0,1,2 rccephmon1,rccephmon3,rccephmon2 > mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay > osdmap e1179: 32 osds: 32 up, 32 in > pgmap v907325: 2176 pgs, 2 pools, 4928 GB data, 1843 kobjects > 14823 GB used, 74228 GB / 89051 GB avail > 2174 active+clean >2 active+clean+replay > > # ceph -w > (...) 
> 2015-08-19 17:44:55.161731 osd.1 [WRN] 88 slow requests, 8 included below; > oldest blocked for > 3156.325716 secs > 2015-08-19 17:44:55.161940 osd.1 [WRN] slow request 1920.533342 seconds old, > received at 2015-08-19 17:12:54.628258: osd_op(client.44544.1:2266980 > 100022a.6aec [write 524288~524288 [1@-1]] 5.e0cf740e snapc 1=[] > ondisk+write e1171) currently waiting for replay end > 2015-08-19 17:44:55.161950 osd.1 [WRN] slow request 1920.511098 seconds old, > received at 2015-08-19 17:12:54.650502: osd_op(client.44544.1:2266988 > 100022a.6aec [write 1048576~524288 [1@-1]] 5.e0cf740e snapc 1=[] > ondisk+write e1171) currently waiting for replay end > 2015-08-19 17:44:55.161957 osd.1 [WRN] slow request 1920.510451 seconds old, > received at 2015-08-19 17:12:54.651149: osd_op(client.44544.1:2266996 > 100022a.6aec [write 1572864~524288 [1@-1]] 5.e0cf740e snapc 1=[] > ondisk+write e1171) currently waiting for replay end > 2015-08-19 17:44:55.161963 osd.1 [WRN] slow request 1920.488589 seconds old, > received at 2015-08-19 17:12:54.673011: osd_op(client.44544.1:2267004 > 100022a.6aec [write 2097152~524288 [1@-1]] 5.e0cf740e snapc 1=[] > ondisk+write e1171) currently waiting for replay end > 2015-08-19 17:44:55.161970 osd.1 [WRN] slow request 1920.482785 seconds old, > received at 2015-08-19 17:12:54.678815: osd_op(client.44544.1:2267012 > 100022a.6aec [write 2621440~524288 [1@-1]] 5.e0cf740e snapc 1=[] > ondisk+write e1171) currently waiting for replay end > (...) > # grep "slow requests" /tmp/osd_failed.txt | awk '{print $3}' | sort | uniq > osd.1 > osd.11 > osd.17 > osd.23 > osd.24 > osd.26 > osd.27 > osd.31 > osd.7 > > c. None of the standard 'ceph osd' commands indicated that the problematic > OSD was osd.4. Only looking to ceph-osd.4.log, we find write error messages: > 015-08-19 16:52:17.552512 7f6f69973700 0 -- 10.100.1.167:6809/23763 >> > 10.100.1.169:6800/28352 pipe(0x175ca000 sd=169 :6809 s=0 pgs=0 cs=0 l=0 > c=0x1f038000).accept connect_seq 180 vs existing 179 state standby > 2015-08-19 16:52:17.566701 7f6f89d2a700 -1 > filestore(/var/lib/ceph/osd/ceph-4) could not find > e6f81180/100022a.0030/head//5 in index: (2) No such file or directory > 2015-08-19 16:52:17.567230 7f6f89d2a700 0 > filestore(/var/lib/ceph/osd/ceph-4) write couldn't open > 5.180_head/e6f81180/100022a.0030/head//5: (2) No such file or > directory > 2015-08-19 16:52:17.567332 7f6f89d2a700 -1 > filestore(/var/lib/ceph/osd/ceph-4) could not find > e6f81180/100022a.0030/head//5 in index: (2) No such file or dire
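A quick way to catch the situation Goncalo describes (an OSD daemon left running on an unmounted data directory) is to check that every OSD data directory is actually a mountpoint. A small sketch, assuming the default /var/lib/ceph/osd/ceph-* layout seen in the log excerpts above:

    for d in /var/lib/ceph/osd/ceph-*; do
        mountpoint -q "$d" || echo "WARNING: $d is not mounted"
    done

This will not make the OSD fail any faster, but run from cron or a monitoring agent it at least surfaces the condition before heavy I/O starts producing slow requests.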
[ceph-users] Testing CephFS
Hey all, We are currently testing CephFS on a small (3 node) cluster. The setup is currently: Each server has 12 OSDs, 1 Monitor and 1 MDS running on it: The servers are running: 0.94.2-0.el7 The clients are running: Ceph: 0.80.10-1.fc21, Kernel: 4.0.6-200.fc21.x86_64 ceph -s cluster 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd health HEALTH_OK monmap e1: 3 mons at {ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0} election epoch 20, quorum 0,1,2 ceph1,ceph2,ceph3 mdsmap e12: 1/1/1 up {0=ceph3=up:active}, 2 up:standby osdmap e389: 36 osds: 36 up, 36 in pgmap v19370: 8256 pgs, 3 pools, 51217 MB data, 14035 objects 95526 MB used, 196 TB / 196 TB avail 8256 active+clean Our Ceph.conf is relatively simple at the moment: cat /etc/ceph/ceph.conf [global] fsid = 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd mon_initial_members = ceph1, ceph2, ceph3 mon_host = 10.15.0.1,10.15.0.2,10.15.0.3 mon_pg_warn_max_per_osd = 1000 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 When I pulled the plug on the master MDS last time (ceph1), it stopped all IO until I plugged it back in. I was under the assumption that the MDS would fail over the other 2 MDS's and IO would continue? Is there something I need to do to allow the MDS's to failover from each other without too much interruption? Or is this because the clients ceph version? Cheers, Simon Hallam Linux Support & Development Officer Please visit our new website at www.pml.ac.uk and follow us on Twitter @PlymouthMarine Winner of the Environment & Conservation category, the Charity Awards 2014. Plymouth Marine Laboratory (PML) is a company limited by guarantee registered in England & Wales, company number 4178503. Registered Charity No. 1091222. Registered Office: Prospect Place, The Hoe, Plymouth PL1 3DH, UK. This message is private and confidential. If you have received this message in error, please notify the sender and remove it from your system. You are reminded that e-mail communications are not secure and may contain viruses; PML accepts no liability for any loss or damage which may be caused by viruses. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
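One detail worth noting for readers: each server here runs a monitor as well as an MDS, so pulling the plug on ceph1 takes out a monitor at the same time, and clients have to wait both for the surviving monitors to re-establish quorum and for the failed MDS to be declared dead before a standby takes over. The sketch below shows the knobs that usually govern how quickly that happens; the values are illustrative assumptions, not recommendations:

    [mds]
    # how long the monitors wait without a beacon before marking an MDS laggy/failed
    mds beacon grace = 15
    # let a standby continuously replay the active MDS journal for faster takeover
    mds standby replay = true

A hang of many minutes, rather than tens of seconds, is more likely a client-side problem (the 0.80/4.0-kernel clients here are much older than the 0.94 servers), so testing with a newer client before tuning the MDS side would be a reasonable first step.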
Re: [ceph-users] Repair inconsistent pgs..
Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o - attached 2015-08-18 18:48 GMT+03:00 Samuel Just : > Is the number of inconsistent objects growing? Can you attach the > whole ceph.log from the 6 hours before and after the snippet you > linked above? Are you using cache/tiering? Can you attach the osdmap > (ceph osd getmap -o )? > -Sam > > On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor > wrote: > > ceph - 0.94.2 > > Its happen during rebalancing > > > > I thought too, that some OSD miss copy, but looks like all miss... > > So any advice in which direction i need to go > > > > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : > >> > >> From a quick peek it looks like some of the OSDs are missing clones of > >> objects. I'm not sure how that could happen and I'd expect the pg > >> repair to handle that but if it's not there's probably something > >> wrong; what version of Ceph are you running? Sam, is this something > >> you've seen, a new bug, or some kind of config issue? > >> -Greg > >> > >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor > >> wrote: > >> > Hi all, at our production cluster, due high rebalancing ((( we have 2 > >> > pgs in > >> > inconsistent state... > >> > > >> > root@temp:~# ceph health detail | grep inc > >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors > >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] > >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] > >> > > >> > From OSD logs, after recovery attempt: > >> > > >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read > i; do > >> > ceph pg repair ${i} ; done > >> > dumped all in format plain > >> > instructing pg 2.490 on osd.56 to repair > >> > instructing pg 2.c4 on osd.56 to repair > >> > > >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected > clone > >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 > >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected > clone > >> > f5759490/rbd_data.1631755377d7e.04da/141//2 > >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected > clone > >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 > >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected > clone > >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 > >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected > clone > >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2 > >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 
07:26:37.036314 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected > clone > >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2 > >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected > clone > >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 > >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> > e1509490/rbd_data.1423897545e146.09a6/head//2 expected > clone > >> > 28809490/rbd_data.edea7460fe42b.01d9/141//2 > >> > /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 > 7f94663b3700 > >> > -1 > >> > log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors > >> > > >> > So, how i can solve "expected clone" situation by hand? > >> > Thank in advance! > >> > > >> > > >> > > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > > > > > osdmap
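For anyone wanting to reproduce the "which image owns this rbd_data prefix" step Igor describes, the block name prefix is printed by rbd info, so a brute-force scan over the pool works. A sketch, with the pool name as a placeholder and the prefix taken from the log lines above:

    for img in $(rbd ls -p <pool>); do
        rbd info -p <pool> "$img" | awk -v i="$img" '/block_name_prefix/ {print i, $2}'
    done | grep 1631755377d7e

This only tells you which image the damaged objects belong to; it does not by itself explain the missing-clone errors.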
Re: [ceph-users] Ceph File System ACL Support
The code is at https://github.com/ceph/samba.git wip-acl. So far the code does not handle default ACL (files created by samba do not inherit parent directory's default ACL) Regards Yan, Zheng On Tue, Aug 18, 2015 at 6:57 PM, Gregory Farnum wrote: > On Mon, Aug 17, 2015 at 4:12 AM, Yan, Zheng wrote: >> On Mon, Aug 17, 2015 at 9:38 AM, Eric Eastman >> wrote: >>> Hi, >>> >>> I need to verify in Ceph v9.0.2 if the kernel version of Ceph file >>> system supports ACLs and the libcephfs file system interface does not. >>> I am trying to have SAMBA, version 4.3.0rc1, support Windows ACLs >>> using "vfs objects = acl_xattr" with the SAMBA VFS Ceph file system >>> interface "vfs objects = ceph" and my tests are failing. If I use a >>> kernel mount of the same Ceph file system, it works. Using the SAMBA >>> Ceph VFS interface with logging set to 3 in my smb.conf files shows >>> the following error when on my Windows AD server I try to "Disable >>> inheritance" of the SAMBA exported directory uu/home: >>> >>> [2015/08/16 18:27:11.546307, 2] >>> ../source3/smbd/posix_acls.c:3006(set_canon_ace_list) >>> set_canon_ace_list: sys_acl_set_file type file failed for file >>> uu/home (Operation not supported). >>> >>> This works using the same Ceph file system kernel mounted. It also >>> works with an XFS file system. >>> >>> Doing some Googling I found this entry on the SAMBA email list: >>> >>> https://lists.samba.org/archive/samba-technical/2015-March/106699.html >>> >>> It states: libcephfs does not support ACL yet, so this patch adds ACL >>> callbacks that do nothing. >>> >>> If ACL support is not in libcephfs, is there plans to add it, as the >>> SAMBA Ceph VFS interface without ACL support is severely limited in a >>> multi-user Windows environment. >>> >> >> libcephfs does not support ACL. I have an old patch that adds ACL >> support to samba's vfs ceph module, but haven't tested it carefully. > > Are these published somewhere? Even if you don't have time to work on > it somebody else might pick it up and finish things if it's available > as a starting point. :) > -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
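For context, the Samba-side configuration being discussed looks roughly like the share sketch below. The share name, path and module order are assumptions for illustration; acl_xattr is what maps Windows ACLs onto POSIX ACLs/xattrs, and it is that mapping which fails when the underlying VFS (here vfs_ceph/libcephfs) rejects sys_acl_set_file:

    [cephshare]
        path = /
        vfs objects = acl_xattr ceph
        ceph:config_file = /etc/ceph/ceph.conf
        read only = no

When the same directory is exported from a kernel CephFS mount without "vfs objects = ceph", the ACL calls go through the kernel client, which does support ACLs, which matches Eric's observation.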
Re: [ceph-users] Bad performances in recovery
Hi, Just to update the mailing list, we ended up going back to default ceph.conf without any additional settings than what is mandatory. We are now reaching speeds we never reached before, both in recovery and in regular usage. There was definitely something we set in the ceph.conf bogging everything down. On 2015-08-20 4:06 AM, Christian Balzer wrote: > > Hello, > > from all the pertinent points by Somnath, the one about pre-conditioning > would be pretty high on my list, especially if this slowness persists and > nothing else (scrub) is going on. > > This might be "fixed" by doing a fstrim. > > Additionally the levelDB's per OSD are of course sync'ing heavily during > reconstruction, so that might not be the favorite thing for your type of > SSDs. > > But ultimately situational awareness is very important, as in "what" is > actually going and slowing things down. > As usual my recommendations would be to use atop, iostat or similar on all > your nodes and see if your OSD SSDs are indeed the bottleneck or if it is > maybe just one of them or something else entirely. > > Christian > > On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote: > >> Also, check if scrubbing started in the cluster or not. That may >> considerably slow down the cluster. >> >> -Original Message- >> From: Somnath Roy >> Sent: Wednesday, August 19, 2015 1:35 PM >> To: 'J-P Methot'; ceph-us...@ceph.com >> Subject: RE: [ceph-users] Bad performances in recovery >> >> All the writes will go through the journal. >> It may happen your SSDs are not preconditioned well and after a lot of >> writes during recovery IOs are stabilized to lower number. This is quite >> common for SSDs if that is the case. >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: J-P Methot [mailto:jpmet...@gtcomm.net] >> Sent: Wednesday, August 19, 2015 1:03 PM >> To: Somnath Roy; ceph-us...@ceph.com >> Subject: Re: [ceph-users] Bad performances in recovery >> >> Hi, >> >> Thank you for the quick reply. However, we do have those exact settings >> for recovery and it still strongly affects client io. I have looked at >> various ceph logs and osd logs and nothing is out of the ordinary. >> Here's an idea though, please tell me if I am wrong. >> >> We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was >> explained several times on this mailing list, Samsung SSDs suck in ceph. >> They have horrible O_dsync speed and die easily, when used as journal. >> That's why we're using Intel ssds for journaling, so that we didn't end >> up putting 96 samsung SSDs in the trash. >> >> In recovery though, what is the ceph behaviour? What kind of write does >> it do on the OSD SSDs? Does it write directly to the SSDs or through the >> journal? >> >> Additionally, something else we notice: the ceph cluster is MUCH slower >> after recovery than before. Clearly there is a bottleneck somewhere and >> that bottleneck does not get cleared up after the recovery is done. >> >> >> On 2015-08-19 3:32 PM, Somnath Roy wrote: >>> If you are concerned about *client io performance* during recovery, >>> use these settings.. >>> >>> osd recovery max active = 1 >>> osd max backfills = 1 >>> osd recovery threads = 1 >>> osd recovery op priority = 1 >>> >>> If you are concerned about *recovery performance*, you may want to >>> bump this up, but I doubt it will help much from default settings.. 
>>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf >>> Of J-P Methot >>> Sent: Wednesday, August 19, 2015 12:17 PM >>> To: ceph-us...@ceph.com >>> Subject: [ceph-users] Bad performances in recovery >>> >>> Hi, >>> >>> Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for >>> a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each. >>> The ceph version is hammer v0.94.1 . There is a performance overhead >>> because we're using SSDs (I've heard it gets better in infernalis, but >>> we're not upgrading just yet) but we can reach numbers that I would >>> consider "alright". >>> >>> Now, the issue is, when the cluster goes into recovery it's very fast >>> at first, but then slows down to ridiculous levels as it moves >>> forward. You can go from 7% to 2% to recover in ten minutes, but it >>> may take 2 hours to recover the last 2%. While this happens, the >>> attached openstack setup becomes incredibly slow, even though there is >>> only a small fraction of objects still recovering (less than 1%). The >>> settings that may affect recovery speed are very low, as they are by >>> default, yet they still affect client io speed way more than it should. >>> >>> Why would ceph recovery become so slow as it progress and affect >>> client io even though it's recovering at a snail's pace? And by a >>> snail's pace, I mean a few kb/second on 10gbps uplinks. -- >>> == Jean-Philippe Méthot >>> Administrateur système / System admi
Re: [ceph-users] Bad performances in recovery
> > Just to update the mailing list, we ended up going back to default > ceph.conf without any additional settings than what is mandatory. We are > now reaching speeds we never reached before, both in recovery and in > regular usage. There was definitely something we set in the ceph.conf > bogging everything down. Could you please share the old and new ceph.conf, or the section that was removed? Best regards, Alex > > > On 2015-08-20 4:06 AM, Christian Balzer wrote: >> >> Hello, >> >> from all the pertinent points by Somnath, the one about pre-conditioning >> would be pretty high on my list, especially if this slowness persists and >> nothing else (scrub) is going on. >> >> This might be "fixed" by doing a fstrim. >> >> Additionally the levelDB's per OSD are of course sync'ing heavily during >> reconstruction, so that might not be the favorite thing for your type of >> SSDs. >> >> But ultimately situational awareness is very important, as in "what" is >> actually going and slowing things down. >> As usual my recommendations would be to use atop, iostat or similar on all >> your nodes and see if your OSD SSDs are indeed the bottleneck or if it is >> maybe just one of them or something else entirely. >> >> Christian >> >> On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote: >> >>> Also, check if scrubbing started in the cluster or not. That may >>> considerably slow down the cluster. >>> >>> -Original Message- >>> From: Somnath Roy >>> Sent: Wednesday, August 19, 2015 1:35 PM >>> To: 'J-P Methot'; ceph-us...@ceph.com >>> Subject: RE: [ceph-users] Bad performances in recovery >>> >>> All the writes will go through the journal. >>> It may happen your SSDs are not preconditioned well and after a lot of >>> writes during recovery IOs are stabilized to lower number. This is quite >>> common for SSDs if that is the case. >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: J-P Methot [mailto:jpmet...@gtcomm.net] >>> Sent: Wednesday, August 19, 2015 1:03 PM >>> To: Somnath Roy; ceph-us...@ceph.com >>> Subject: Re: [ceph-users] Bad performances in recovery >>> >>> Hi, >>> >>> Thank you for the quick reply. However, we do have those exact settings >>> for recovery and it still strongly affects client io. I have looked at >>> various ceph logs and osd logs and nothing is out of the ordinary. >>> Here's an idea though, please tell me if I am wrong. >>> >>> We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was >>> explained several times on this mailing list, Samsung SSDs suck in ceph. >>> They have horrible O_dsync speed and die easily, when used as journal. >>> That's why we're using Intel ssds for journaling, so that we didn't end >>> up putting 96 samsung SSDs in the trash. >>> >>> In recovery though, what is the ceph behaviour? What kind of write does >>> it do on the OSD SSDs? Does it write directly to the SSDs or through the >>> journal? >>> >>> Additionally, something else we notice: the ceph cluster is MUCH slower >>> after recovery than before. Clearly there is a bottleneck somewhere and >>> that bottleneck does not get cleared up after the recovery is done. >>> >>> >>> On 2015-08-19 3:32 PM, Somnath Roy wrote: If you are concerned about *client io performance* during recovery, use these settings.. osd recovery max active = 1 osd max backfills = 1 osd recovery threads = 1 osd recovery op priority = 1 If you are concerned about *recovery performance*, you may want to bump this up, but I doubt it will help much from default settings.. 
Thanks & Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of J-P Methot Sent: Wednesday, August 19, 2015 12:17 PM To: ceph-us...@ceph.com Subject: [ceph-users] Bad performances in recovery Hi, Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each. The ceph version is hammer v0.94.1 . There is a performance overhead because we're using SSDs (I've heard it gets better in infernalis, but we're not upgrading just yet) but we can reach numbers that I would consider "alright". Now, the issue is, when the cluster goes into recovery it's very fast at first, but then slows down to ridiculous levels as it moves forward. You can go from 7% to 2% to recover in ten minutes, but it may take 2 hours to recover the last 2%. While this happens, the attached openstack setup becomes incredibly slow, even though there is only a small fraction of objects still recovering (less than 1%). The settings that may affect recovery speed are very low, as they are by default, yet they still affect client io speed way more than it should. Why would ceph recovery become so slow as it progress and affect cl
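One way to answer Alex's question without the original files is to dump the running configuration from the admin socket on a node that still has the old settings and diff it against a node running the stripped-down ceph.conf. A sketch, assuming osd.0's admin socket is available locally on each node:

    ceph daemon osd.0 config show > /tmp/osd0-config-old.txt
    # ... after reverting ceph.conf and restarting the OSD ...
    ceph daemon osd.0 config show > /tmp/osd0-config-new.txt
    diff /tmp/osd0-config-old.txt /tmp/osd0-config-new.txt

The diff shows every option whose effective value changed, which is usually faster than eyeballing the two ceph.conf files.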
Re: [ceph-users] Bad performances in recovery
Are you sure it was because of configuration changes? Maybe it was restarting the OSDs that fixed it? We often hit an issue with backfill_toofull where the recovery/backfill processes get stuck until we restart the daemons (sometimes setting recovery_max_active helps as well). It still shows recovery of few objects now and then (few KB/s) and then stops completely. Jan > On 20 Aug 2015, at 17:43, Alex Gorbachev wrote: > >> >> Just to update the mailing list, we ended up going back to default >> ceph.conf without any additional settings than what is mandatory. We are >> now reaching speeds we never reached before, both in recovery and in >> regular usage. There was definitely something we set in the ceph.conf >> bogging everything down. > > Could you please share the old and new ceph.conf, or the section that > was removed? > > Best regards, > Alex > >> >> >> On 2015-08-20 4:06 AM, Christian Balzer wrote: >>> >>> Hello, >>> >>> from all the pertinent points by Somnath, the one about pre-conditioning >>> would be pretty high on my list, especially if this slowness persists and >>> nothing else (scrub) is going on. >>> >>> This might be "fixed" by doing a fstrim. >>> >>> Additionally the levelDB's per OSD are of course sync'ing heavily during >>> reconstruction, so that might not be the favorite thing for your type of >>> SSDs. >>> >>> But ultimately situational awareness is very important, as in "what" is >>> actually going and slowing things down. >>> As usual my recommendations would be to use atop, iostat or similar on all >>> your nodes and see if your OSD SSDs are indeed the bottleneck or if it is >>> maybe just one of them or something else entirely. >>> >>> Christian >>> >>> On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote: >>> Also, check if scrubbing started in the cluster or not. That may considerably slow down the cluster. -Original Message- From: Somnath Roy Sent: Wednesday, August 19, 2015 1:35 PM To: 'J-P Methot'; ceph-us...@ceph.com Subject: RE: [ceph-users] Bad performances in recovery All the writes will go through the journal. It may happen your SSDs are not preconditioned well and after a lot of writes during recovery IOs are stabilized to lower number. This is quite common for SSDs if that is the case. Thanks & Regards Somnath -Original Message- From: J-P Methot [mailto:jpmet...@gtcomm.net] Sent: Wednesday, August 19, 2015 1:03 PM To: Somnath Roy; ceph-us...@ceph.com Subject: Re: [ceph-users] Bad performances in recovery Hi, Thank you for the quick reply. However, we do have those exact settings for recovery and it still strongly affects client io. I have looked at various ceph logs and osd logs and nothing is out of the ordinary. Here's an idea though, please tell me if I am wrong. We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was explained several times on this mailing list, Samsung SSDs suck in ceph. They have horrible O_dsync speed and die easily, when used as journal. That's why we're using Intel ssds for journaling, so that we didn't end up putting 96 samsung SSDs in the trash. In recovery though, what is the ceph behaviour? What kind of write does it do on the OSD SSDs? Does it write directly to the SSDs or through the journal? Additionally, something else we notice: the ceph cluster is MUCH slower after recovery than before. Clearly there is a bottleneck somewhere and that bottleneck does not get cleared up after the recovery is done. 
On 2015-08-19 3:32 PM, Somnath Roy wrote: > If you are concerned about *client io performance* during recovery, > use these settings.. > > osd recovery max active = 1 > osd max backfills = 1 > osd recovery threads = 1 > osd recovery op priority = 1 > > If you are concerned about *recovery performance*, you may want to > bump this up, but I doubt it will help much from default settings.. > > Thanks & Regards > Somnath > > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > Of J-P Methot > Sent: Wednesday, August 19, 2015 12:17 PM > To: ceph-us...@ceph.com > Subject: [ceph-users] Bad performances in recovery > > Hi, > > Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for > a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each. > The ceph version is hammer v0.94.1 . There is a performance overhead > because we're using SSDs (I've heard it gets better in infernalis, but > we're not upgrading just yet) but we can reach numbers that I would > consider "alright". > > Now, the issue is, when the cluster goes into recovery it's very fast > at fir
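As a side note on Jan's point about restarting daemons: the recovery throttles can also be changed on the fly without a restart, which makes it easier to separate "the value helped" from "the restart helped". A sketch using the options already quoted in this thread:

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

injectargs changes are not persistent; anything that should survive a restart still has to go into ceph.conf.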
Re: [ceph-users] Repair inconsistent pgs..
Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor wrote: > Hi Samuel, we try to fix it in trick way. > > we check all rbd_data chunks from logs (OSD) which are affected, then query > rbd info to compare which rbd consist bad rbd_data, after that we mount this > rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. > > But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos > try to out OSD which was lead, but after rebalancing this 2 pgs still have > 35 scrub errors... > > ceph osd getmap -o - attached > > > 2015-08-18 18:48 GMT+03:00 Samuel Just : >> >> Is the number of inconsistent objects growing? Can you attach the >> whole ceph.log from the 6 hours before and after the snippet you >> linked above? Are you using cache/tiering? Can you attach the osdmap >> (ceph osd getmap -o )? >> -Sam >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor >> wrote: >> > ceph - 0.94.2 >> > Its happen during rebalancing >> > >> > I thought too, that some OSD miss copy, but looks like all miss... >> > So any advice in which direction i need to go >> > >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : >> >> >> >> From a quick peek it looks like some of the OSDs are missing clones of >> >> objects. I'm not sure how that could happen and I'd expect the pg >> >> repair to handle that but if it's not there's probably something >> >> wrong; what version of Ceph are you running? Sam, is this something >> >> you've seen, a new bug, or some kind of config issue? >> >> -Greg >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor >> >> wrote: >> >> > Hi all, at our production cluster, due high rebalancing ((( we have 2 >> >> > pgs in >> >> > inconsistent state... 
>> >> > >> >> > root@temp:~# ceph health detail | grep inc >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >> >> > >> >> > From OSD logs, after recovery attempt: >> >> > >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; >> >> > do >> >> > ceph pg repair ${i} ; done >> >> > dumped all in format plain >> >> > instructing pg 2.490 on osd.56 to repair >> >> > instructing pg 2.c4 on osd.56 to repair >> >> > >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected >> >> > clone >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected >> >> > clone >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected >> >> > clone >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected >> >> > clone >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected >> >> > clone >> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected >> >> > clone >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected >> >> > clone >> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 >> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 >> >> > 7f94663b3700 >> >> > -1 >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> > e1509490/rbd_data.1423897545e146.09a6/head//2 expected >> >> > clone >> >> > 28809490/rbd_d
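For completeness, manually deep-scrubbing the two PGs Sam asks about is just (PG IDs from earlier in the thread):

    ceph pg deep-scrub 2.490
    ceph pg deep-scrub 2.c4

The resulting [ERR] lines land in the cluster log (ceph.log on the monitor hosts) as well as in ceph -w output.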
Re: [ceph-users] Repair inconsistent pgs..
Also, was there at any point a power failure/power cycle event, perhaps on osd 56? -Sam On Thu, Aug 20, 2015 at 9:23 AM, Samuel Just wrote: > Ok, you appear to be using a replicated cache tier in front of a > replicated base tier. Please scrub both inconsistent pgs and post the > ceph.log from before when you started the scrub until after. Also, > what command are you using to take snapshots? > -Sam > > On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor > wrote: >> Hi Samuel, we try to fix it in trick way. >> >> we check all rbd_data chunks from logs (OSD) which are affected, then query >> rbd info to compare which rbd consist bad rbd_data, after that we mount this >> rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. >> >> But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos >> try to out OSD which was lead, but after rebalancing this 2 pgs still have >> 35 scrub errors... >> >> ceph osd getmap -o - attached >> >> >> 2015-08-18 18:48 GMT+03:00 Samuel Just : >>> >>> Is the number of inconsistent objects growing? Can you attach the >>> whole ceph.log from the 6 hours before and after the snippet you >>> linked above? Are you using cache/tiering? Can you attach the osdmap >>> (ceph osd getmap -o )? >>> -Sam >>> >>> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor >>> wrote: >>> > ceph - 0.94.2 >>> > Its happen during rebalancing >>> > >>> > I thought too, that some OSD miss copy, but looks like all miss... >>> > So any advice in which direction i need to go >>> > >>> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : >>> >> >>> >> From a quick peek it looks like some of the OSDs are missing clones of >>> >> objects. I'm not sure how that could happen and I'd expect the pg >>> >> repair to handle that but if it's not there's probably something >>> >> wrong; what version of Ceph are you running? Sam, is this something >>> >> you've seen, a new bug, or some kind of config issue? >>> >> -Greg >>> >> >>> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor >>> >> wrote: >>> >> > Hi all, at our production cluster, due high rebalancing ((( we have 2 >>> >> > pgs in >>> >> > inconsistent state... 
>>> >> > >>> >> > root@temp:~# ceph health detail | grep inc >>> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >>> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] >>> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >>> >> > >>> >> > From OSD logs, after recovery attempt: >>> >> > >>> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; >>> >> > do >>> >> > ceph pg repair ${i} ; done >>> >> > dumped all in format plain >>> >> > instructing pg 2.490 on osd.56 to repair >>> >> > instructing pg 2.c4 on osd.56 to repair >>> >> > >>> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected >>> >> > clone >>> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 >>> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected >>> >> > clone >>> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 >>> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected >>> >> > clone >>> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 >>> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected >>> >> > clone >>> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 >>> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected >>> >> > clone >>> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2 >>> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected >>> >> > clone >>> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2 >>> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 >>> >> > 7f94663b3700 >>> >> > -1 >>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >>> >> > 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected >>> >> > clone >>> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 >>> >
Re: [ceph-users] Repair inconsistent pgs..
Samuel, we turned off cache layer few hours ago... I will post ceph.log in few minutes For snap - we found issue, was connected with cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just : > Ok, you appear to be using a replicated cache tier in front of a > replicated base tier. Please scrub both inconsistent pgs and post the > ceph.log from before when you started the scrub until after. Also, > what command are you using to take snapshots? > -Sam > > On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor > wrote: > > Hi Samuel, we try to fix it in trick way. > > > > we check all rbd_data chunks from logs (OSD) which are affected, then > query > > rbd info to compare which rbd consist bad rbd_data, after that we mount > this > > rbd as rbd0, create empty rbd, and DD all info from bad volume to new > one. > > > > But after that - scrub errors growing... Was 15 errors.. .Now 35... We > laos > > try to out OSD which was lead, but after rebalancing this 2 pgs still > have > > 35 scrub errors... > > > > ceph osd getmap -o - attached > > > > > > 2015-08-18 18:48 GMT+03:00 Samuel Just : > >> > >> Is the number of inconsistent objects growing? Can you attach the > >> whole ceph.log from the 6 hours before and after the snippet you > >> linked above? Are you using cache/tiering? Can you attach the osdmap > >> (ceph osd getmap -o )? > >> -Sam > >> > >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor > >> wrote: > >> > ceph - 0.94.2 > >> > Its happen during rebalancing > >> > > >> > I thought too, that some OSD miss copy, but looks like all miss... > >> > So any advice in which direction i need to go > >> > > >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : > >> >> > >> >> From a quick peek it looks like some of the OSDs are missing clones > of > >> >> objects. I'm not sure how that could happen and I'd expect the pg > >> >> repair to handle that but if it's not there's probably something > >> >> wrong; what version of Ceph are you running? Sam, is this something > >> >> you've seen, a new bug, or some kind of config issue? > >> >> -Greg > >> >> > >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor > >> >> wrote: > >> >> > Hi all, at our production cluster, due high rebalancing ((( we > have 2 > >> >> > pgs in > >> >> > inconsistent state... 
> >> >> > > >> >> > root@temp:~# ceph health detail | grep inc > >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors > >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] > >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] > >> >> > > >> >> > From OSD logs, after recovery attempt: > >> >> > > >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while > read i; > >> >> > do > >> >> > ceph pg repair ${i} ; done > >> >> > dumped all in format plain > >> >> > instructing pg 2.490 on osd.56 to repair > >> >> > instructing pg 2.c4 on osd.56 to repair > >> >> > > >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected > >> >> > clone > >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 > >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected > >> >> > clone > >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 > >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected > >> >> > clone > >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 > >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected > >> >> > clone > >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 > >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected > >> >> > clone > >> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2 > >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected > >> >> > clone > >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2 > >> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 > >> >> > 7f94663b3700 > >> >> > -1 > >> >> > log_channel(cluster) log [ERR] : deep-scrub
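Since "turned off the cache layer" can mean several things, the documented sequence for draining and removing a writeback cache tier is sketched below; the pool names are placeholders and this is not necessarily what was run here:

    ceph osd tier cache-mode <cachepool> forward
    rados -p <cachepool> cache-flush-evict-all
    ceph osd tier remove-overlay <basepool>
    ceph osd tier remove <basepool> <cachepool>

Flushing and evicting before removing the overlay matters; objects that only exist in the cache pool are otherwise stranded.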
Re: [ceph-users] Repair inconsistent pgs..
What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor wrote: > Samuel, we turned off cache layer few hours ago... > I will post ceph.log in few minutes > > For snap - we found issue, was connected with cache tier.. > > 2015-08-20 19:23 GMT+03:00 Samuel Just : >> >> Ok, you appear to be using a replicated cache tier in front of a >> replicated base tier. Please scrub both inconsistent pgs and post the >> ceph.log from before when you started the scrub until after. Also, >> what command are you using to take snapshots? >> -Sam >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor >> wrote: >> > Hi Samuel, we try to fix it in trick way. >> > >> > we check all rbd_data chunks from logs (OSD) which are affected, then >> > query >> > rbd info to compare which rbd consist bad rbd_data, after that we mount >> > this >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new >> > one. >> > >> > But after that - scrub errors growing... Was 15 errors.. .Now 35... We >> > laos >> > try to out OSD which was lead, but after rebalancing this 2 pgs still >> > have >> > 35 scrub errors... >> > >> > ceph osd getmap -o - attached >> > >> > >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : >> >> >> >> Is the number of inconsistent objects growing? Can you attach the >> >> whole ceph.log from the 6 hours before and after the snippet you >> >> linked above? Are you using cache/tiering? Can you attach the osdmap >> >> (ceph osd getmap -o )? >> >> -Sam >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor >> >> wrote: >> >> > ceph - 0.94.2 >> >> > Its happen during rebalancing >> >> > >> >> > I thought too, that some OSD miss copy, but looks like all miss... >> >> > So any advice in which direction i need to go >> >> > >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : >> >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing clones >> >> >> of >> >> >> objects. I'm not sure how that could happen and I'd expect the pg >> >> >> repair to handle that but if it's not there's probably something >> >> >> wrong; what version of Ceph are you running? Sam, is this something >> >> >> you've seen, a new bug, or some kind of config issue? >> >> >> -Greg >> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor >> >> >> wrote: >> >> >> > Hi all, at our production cluster, due high rebalancing ((( we >> >> >> > have 2 >> >> >> > pgs in >> >> >> > inconsistent state... 
>> >> >> > >> >> >> > root@temp:~# ceph health detail | grep inc >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >> >> >> > >> >> >> > From OSD logs, after recovery attempt: >> >> >> > >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read >> >> >> > i; >> >> >> > do >> >> >> > ceph pg repair ${i} ; done >> >> >> > dumped all in format plain >> >> >> > instructing pg 2.490 on osd.56 to repair >> >> >> > instructing pg 2.c4 on osd.56 to repair >> >> >> > >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 >> >> >> > 7f94663b3700 >> >> >> > -1 >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected >> >> >> > clone >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 >> >> >> > 7f94663b3700 >> >> >> > -1 >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected >> >> >> > clone >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 >> >> >> > 7f94663b3700 >> >> >> > -1 >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected >> >> >> > clone >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 >> >> >> > 7f94663b3700 >> >> >> > -1 >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected >> >> >> > clone >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 >> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 >> >> >> > 7f94663b3700 >> >> >> > -1 >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected >> >> >> > clone >> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2 >> >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 >> >> >> > 7f94663b3700 >> >> >> > -1 >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
Re: [ceph-users] Repair inconsistent pgs..
Issue, that in forward mode, fstrim doesn't work proper, and when we take snapshot - data not proper update in cache layer, and client (ceph) see damaged snap.. As headers requested from cache layer. 2015-08-20 19:53 GMT+03:00 Samuel Just : > What was the issue? > -Sam > > On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor > wrote: > > Samuel, we turned off cache layer few hours ago... > > I will post ceph.log in few minutes > > > > For snap - we found issue, was connected with cache tier.. > > > > 2015-08-20 19:23 GMT+03:00 Samuel Just : > >> > >> Ok, you appear to be using a replicated cache tier in front of a > >> replicated base tier. Please scrub both inconsistent pgs and post the > >> ceph.log from before when you started the scrub until after. Also, > >> what command are you using to take snapshots? > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor > >> wrote: > >> > Hi Samuel, we try to fix it in trick way. > >> > > >> > we check all rbd_data chunks from logs (OSD) which are affected, then > >> > query > >> > rbd info to compare which rbd consist bad rbd_data, after that we > mount > >> > this > >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new > >> > one. > >> > > >> > But after that - scrub errors growing... Was 15 errors.. .Now 35... We > >> > laos > >> > try to out OSD which was lead, but after rebalancing this 2 pgs still > >> > have > >> > 35 scrub errors... > >> > > >> > ceph osd getmap -o - attached > >> > > >> > > >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : > >> >> > >> >> Is the number of inconsistent objects growing? Can you attach the > >> >> whole ceph.log from the 6 hours before and after the snippet you > >> >> linked above? Are you using cache/tiering? Can you attach the > osdmap > >> >> (ceph osd getmap -o )? > >> >> -Sam > >> >> > >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor > >> >> wrote: > >> >> > ceph - 0.94.2 > >> >> > Its happen during rebalancing > >> >> > > >> >> > I thought too, that some OSD miss copy, but looks like all miss... > >> >> > So any advice in which direction i need to go > >> >> > > >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : > >> >> >> > >> >> >> From a quick peek it looks like some of the OSDs are missing > clones > >> >> >> of > >> >> >> objects. I'm not sure how that could happen and I'd expect the pg > >> >> >> repair to handle that but if it's not there's probably something > >> >> >> wrong; what version of Ceph are you running? Sam, is this > something > >> >> >> you've seen, a new bug, or some kind of config issue? > >> >> >> -Greg > >> >> >> > >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor > >> >> >> wrote: > >> >> >> > Hi all, at our production cluster, due high rebalancing ((( we > >> >> >> > have 2 > >> >> >> > pgs in > >> >> >> > inconsistent state... 
> >> >> >> > > >> >> >> > root@temp:~# ceph health detail | grep inc > >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors > >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] > >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] > >> >> >> > > >> >> >> > From OSD logs, after recovery attempt: > >> >> >> > > >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while > read > >> >> >> > i; > >> >> >> > do > >> >> >> > ceph pg repair ${i} ; done > >> >> >> > dumped all in format plain > >> >> >> > instructing pg 2.490 on osd.56 to repair > >> >> >> > instructing pg 2.c4 on osd.56 to repair > >> >> >> > > >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 > >> >> >> > 7f94663b3700 > >> >> >> > -1 > >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 > expected > >> >> >> > clone > >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 > >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 > >> >> >> > 7f94663b3700 > >> >> >> > -1 > >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 > expected > >> >> >> > clone > >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 > >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 > >> >> >> > 7f94663b3700 > >> >> >> > -1 > >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 > expected > >> >> >> > clone > >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 > >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 > >> >> >> > 7f94663b3700 > >> >> >> > -1 > >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 > expected > >> >> >> > clone > >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 > >> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 > >> >> >> > 7f9466
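For context on "turned off cache layer": putting a cache tier into forward mode and draining it is normally done roughly as below. This is only a sketch -- the cache pool name "cache-pool" is an assumption, the base pool name is taken from elsewhere in this thread, and the last two commands (detaching the tier completely) are optional:

# redirect client IO to the base tier instead of caching it (assumed pool names)
ceph osd tier cache-mode cache-pool forward
# flush dirty objects and evict everything still sitting in the cache tier
rados -p cache-pool cache-flush-evict-all
# optionally detach the tier completely once it is empty
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage cache-pool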
Re: [ceph-users] Repair inconsistent pgs..
Is there a bug for this in the tracker? -Sam On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor wrote: > Issue, that in forward mode, fstrim doesn't work proper, and when we take > snapshot - data not proper update in cache layer, and client (ceph) see > damaged snap.. As headers requested from cache layer. > > 2015-08-20 19:53 GMT+03:00 Samuel Just : >> >> What was the issue? >> -Sam >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor >> wrote: >> > Samuel, we turned off cache layer few hours ago... >> > I will post ceph.log in few minutes >> > >> > For snap - we found issue, was connected with cache tier.. >> > >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : >> >> >> >> Ok, you appear to be using a replicated cache tier in front of a >> >> replicated base tier. Please scrub both inconsistent pgs and post the >> >> ceph.log from before when you started the scrub until after. Also, >> >> what command are you using to take snapshots? >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor >> >> wrote: >> >> > Hi Samuel, we try to fix it in trick way. >> >> > >> >> > we check all rbd_data chunks from logs (OSD) which are affected, then >> >> > query >> >> > rbd info to compare which rbd consist bad rbd_data, after that we >> >> > mount >> >> > this >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new >> >> > one. >> >> > >> >> > But after that - scrub errors growing... Was 15 errors.. .Now 35... >> >> > We >> >> > laos >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs still >> >> > have >> >> > 35 scrub errors... >> >> > >> >> > ceph osd getmap -o - attached >> >> > >> >> > >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : >> >> >> >> >> >> Is the number of inconsistent objects growing? Can you attach the >> >> >> whole ceph.log from the 6 hours before and after the snippet you >> >> >> linked above? Are you using cache/tiering? Can you attach the >> >> >> osdmap >> >> >> (ceph osd getmap -o )? >> >> >> -Sam >> >> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor >> >> >> wrote: >> >> >> > ceph - 0.94.2 >> >> >> > Its happen during rebalancing >> >> >> > >> >> >> > I thought too, that some OSD miss copy, but looks like all miss... >> >> >> > So any advice in which direction i need to go >> >> >> > >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : >> >> >> >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing >> >> >> >> clones >> >> >> >> of >> >> >> >> objects. I'm not sure how that could happen and I'd expect the pg >> >> >> >> repair to handle that but if it's not there's probably something >> >> >> >> wrong; what version of Ceph are you running? Sam, is this >> >> >> >> something >> >> >> >> you've seen, a new bug, or some kind of config issue? >> >> >> >> -Greg >> >> >> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor >> >> >> >> wrote: >> >> >> >> > Hi all, at our production cluster, due high rebalancing ((( we >> >> >> >> > have 2 >> >> >> >> > pgs in >> >> >> >> > inconsistent state... 
>> >> >> >> > >> >> >> >> > root@temp:~# ceph health detail | grep inc >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >> >> >> >> > >> >> >> >> > From OSD logs, after recovery attempt: >> >> >> >> > >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while >> >> >> >> > read >> >> >> >> > i; >> >> >> >> > do >> >> >> >> > ceph pg repair ${i} ; done >> >> >> >> > dumped all in format plain >> >> >> >> > instructing pg 2.490 on osd.56 to repair >> >> >> >> > instructing pg 2.c4 on osd.56 to repair >> >> >> >> > >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 >> >> >> >> > 7f94663b3700 >> >> >> >> > -1 >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 >> >> >> >> > expected >> >> >> >> > clone >> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 >> >> >> >> > 7f94663b3700 >> >> >> >> > -1 >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 >> >> >> >> > expected >> >> >> >> > clone >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 >> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 >> >> >> >> > 7f94663b3700 >> >> >> >> > -1 >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 >> >> >> >> > expected >> >> >> >> > clone >> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2 >> >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 >> >> >> >> > 7f94663b3700 >> >> >> >> > -1 >> >> >> >>
Re: [ceph-users] Repair inconsistent pgs..
Not yet. I will create. But according to mail lists and Inktank docs - it's expected behaviour when cache enable 2015-08-20 19:56 GMT+03:00 Samuel Just : > Is there a bug for this in the tracker? > -Sam > > On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor > wrote: > > Issue, that in forward mode, fstrim doesn't work proper, and when we take > > snapshot - data not proper update in cache layer, and client (ceph) see > > damaged snap.. As headers requested from cache layer. > > > > 2015-08-20 19:53 GMT+03:00 Samuel Just : > >> > >> What was the issue? > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor > >> wrote: > >> > Samuel, we turned off cache layer few hours ago... > >> > I will post ceph.log in few minutes > >> > > >> > For snap - we found issue, was connected with cache tier.. > >> > > >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : > >> >> > >> >> Ok, you appear to be using a replicated cache tier in front of a > >> >> replicated base tier. Please scrub both inconsistent pgs and post > the > >> >> ceph.log from before when you started the scrub until after. Also, > >> >> what command are you using to take snapshots? > >> >> -Sam > >> >> > >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor > >> >> wrote: > >> >> > Hi Samuel, we try to fix it in trick way. > >> >> > > >> >> > we check all rbd_data chunks from logs (OSD) which are affected, > then > >> >> > query > >> >> > rbd info to compare which rbd consist bad rbd_data, after that we > >> >> > mount > >> >> > this > >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to > new > >> >> > one. > >> >> > > >> >> > But after that - scrub errors growing... Was 15 errors.. .Now 35... > >> >> > We > >> >> > laos > >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs > still > >> >> > have > >> >> > 35 scrub errors... > >> >> > > >> >> > ceph osd getmap -o - attached > >> >> > > >> >> > > >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : > >> >> >> > >> >> >> Is the number of inconsistent objects growing? Can you attach the > >> >> >> whole ceph.log from the 6 hours before and after the snippet you > >> >> >> linked above? Are you using cache/tiering? Can you attach the > >> >> >> osdmap > >> >> >> (ceph osd getmap -o )? > >> >> >> -Sam > >> >> >> > >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor > >> >> >> wrote: > >> >> >> > ceph - 0.94.2 > >> >> >> > Its happen during rebalancing > >> >> >> > > >> >> >> > I thought too, that some OSD miss copy, but looks like all > miss... > >> >> >> > So any advice in which direction i need to go > >> >> >> > > >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : > >> >> >> >> > >> >> >> >> From a quick peek it looks like some of the OSDs are missing > >> >> >> >> clones > >> >> >> >> of > >> >> >> >> objects. I'm not sure how that could happen and I'd expect the > pg > >> >> >> >> repair to handle that but if it's not there's probably > something > >> >> >> >> wrong; what version of Ceph are you running? Sam, is this > >> >> >> >> something > >> >> >> >> you've seen, a new bug, or some kind of config issue? > >> >> >> >> -Greg > >> >> >> >> > >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor > >> >> >> >> wrote: > >> >> >> >> > Hi all, at our production cluster, due high rebalancing ((( > we > >> >> >> >> > have 2 > >> >> >> >> > pgs in > >> >> >> >> > inconsistent state... 
> >> >> >> >> > > >> >> >> >> > root@temp:~# ceph health detail | grep inc > >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors > >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] > >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] > >> >> >> >> > > >> >> >> >> > From OSD logs, after recovery attempt: > >> >> >> >> > > >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | > while > >> >> >> >> > read > >> >> >> >> > i; > >> >> >> >> > do > >> >> >> >> > ceph pg repair ${i} ; done > >> >> >> >> > dumped all in format plain > >> >> >> >> > instructing pg 2.490 on osd.56 to repair > >> >> >> >> > instructing pg 2.c4 on osd.56 to repair > >> >> >> >> > > >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 > >> >> >> >> > 7f94663b3700 > >> >> >> >> > -1 > >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 > >> >> >> >> > expected > >> >> >> >> > clone > >> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 > >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 > >> >> >> >> > 7f94663b3700 > >> >> >> >> > -1 > >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 > >> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 > >> >> >> >> > expected > >> >> >> >> > clone > >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2 > >> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 > >> >> >>
Re: [ceph-users] Repair inconsistent pgs..
Which docs? -Sam On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor wrote: > Not yet. I will create. > But according to mail lists and Inktank docs - it's expected behaviour when > cache enable > > 2015-08-20 19:56 GMT+03:00 Samuel Just : >> >> Is there a bug for this in the tracker? >> -Sam >> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor >> wrote: >> > Issue, that in forward mode, fstrim doesn't work proper, and when we >> > take >> > snapshot - data not proper update in cache layer, and client (ceph) see >> > damaged snap.. As headers requested from cache layer. >> > >> > 2015-08-20 19:53 GMT+03:00 Samuel Just : >> >> >> >> What was the issue? >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor >> >> wrote: >> >> > Samuel, we turned off cache layer few hours ago... >> >> > I will post ceph.log in few minutes >> >> > >> >> > For snap - we found issue, was connected with cache tier.. >> >> > >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : >> >> >> >> >> >> Ok, you appear to be using a replicated cache tier in front of a >> >> >> replicated base tier. Please scrub both inconsistent pgs and post >> >> >> the >> >> >> ceph.log from before when you started the scrub until after. Also, >> >> >> what command are you using to take snapshots? >> >> >> -Sam >> >> >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor >> >> >> wrote: >> >> >> > Hi Samuel, we try to fix it in trick way. >> >> >> > >> >> >> > we check all rbd_data chunks from logs (OSD) which are affected, >> >> >> > then >> >> >> > query >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that we >> >> >> > mount >> >> >> > this >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to >> >> >> > new >> >> >> > one. >> >> >> > >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now >> >> >> > 35... >> >> >> > We >> >> >> > laos >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs >> >> >> > still >> >> >> > have >> >> >> > 35 scrub errors... >> >> >> > >> >> >> > ceph osd getmap -o - attached >> >> >> > >> >> >> > >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> Is the number of inconsistent objects growing? Can you attach >> >> >> >> the >> >> >> >> whole ceph.log from the 6 hours before and after the snippet you >> >> >> >> linked above? Are you using cache/tiering? Can you attach the >> >> >> >> osdmap >> >> >> >> (ceph osd getmap -o )? >> >> >> >> -Sam >> >> >> >> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor >> >> >> >> wrote: >> >> >> >> > ceph - 0.94.2 >> >> >> >> > Its happen during rebalancing >> >> >> >> > >> >> >> >> > I thought too, that some OSD miss copy, but looks like all >> >> >> >> > miss... >> >> >> >> > So any advice in which direction i need to go >> >> >> >> > >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum : >> >> >> >> >> >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing >> >> >> >> >> clones >> >> >> >> >> of >> >> >> >> >> objects. I'm not sure how that could happen and I'd expect the >> >> >> >> >> pg >> >> >> >> >> repair to handle that but if it's not there's probably >> >> >> >> >> something >> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this >> >> >> >> >> something >> >> >> >> >> you've seen, a new bug, or some kind of config issue? 
>> >> >> >> >> -Greg >> >> >> >> >> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor >> >> >> >> >> wrote: >> >> >> >> >> > Hi all, at our production cluster, due high rebalancing ((( >> >> >> >> >> > we >> >> >> >> >> > have 2 >> >> >> >> >> > pgs in >> >> >> >> >> > inconsistent state... >> >> >> >> >> > >> >> >> >> >> > root@temp:~# ceph health detail | grep inc >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] >> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >> >> >> >> >> > >> >> >> >> >> > From OSD logs, after recovery attempt: >> >> >> >> >> > >> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | >> >> >> >> >> > while >> >> >> >> >> > read >> >> >> >> >> > i; >> >> >> >> >> > do >> >> >> >> >> > ceph pg repair ${i} ; done >> >> >> >> >> > dumped all in format plain >> >> >> >> >> > instructing pg 2.490 on osd.56 to repair >> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair >> >> >> >> >> > >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 >> >> >> >> >> > 7f94663b3700 >> >> >> >> >> > -1 >> >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490 >> >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 >> >> >> >> >> > expected >> >> >> >> >> > clone >> >> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2 >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 >> >> >> >> >> > 7f94663b3700 >> >> >> >> >> > -1 >> >> >> >> >
Re: [ceph-users] Repair inconsistent pgs..
Inktank: https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf Mail-list: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html 2015-08-20 20:06 GMT+03:00 Samuel Just : > Which docs? > -Sam > > On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor > wrote: > > Not yet. I will create. > > But according to mail lists and Inktank docs - it's expected behaviour > when > > cache enable > > > > 2015-08-20 19:56 GMT+03:00 Samuel Just : > >> > >> Is there a bug for this in the tracker? > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor > >> wrote: > >> > Issue, that in forward mode, fstrim doesn't work proper, and when we > >> > take > >> > snapshot - data not proper update in cache layer, and client (ceph) > see > >> > damaged snap.. As headers requested from cache layer. > >> > > >> > 2015-08-20 19:53 GMT+03:00 Samuel Just : > >> >> > >> >> What was the issue? > >> >> -Sam > >> >> > >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor > >> >> wrote: > >> >> > Samuel, we turned off cache layer few hours ago... > >> >> > I will post ceph.log in few minutes > >> >> > > >> >> > For snap - we found issue, was connected with cache tier.. > >> >> > > >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : > >> >> >> > >> >> >> Ok, you appear to be using a replicated cache tier in front of a > >> >> >> replicated base tier. Please scrub both inconsistent pgs and post > >> >> >> the > >> >> >> ceph.log from before when you started the scrub until after. > Also, > >> >> >> what command are you using to take snapshots? > >> >> >> -Sam > >> >> >> > >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor > >> >> >> wrote: > >> >> >> > Hi Samuel, we try to fix it in trick way. > >> >> >> > > >> >> >> > we check all rbd_data chunks from logs (OSD) which are affected, > >> >> >> > then > >> >> >> > query > >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that > we > >> >> >> > mount > >> >> >> > this > >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume > to > >> >> >> > new > >> >> >> > one. > >> >> >> > > >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now > >> >> >> > 35... > >> >> >> > We > >> >> >> > laos > >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs > >> >> >> > still > >> >> >> > have > >> >> >> > 35 scrub errors... > >> >> >> > > >> >> >> > ceph osd getmap -o - attached > >> >> >> > > >> >> >> > > >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : > >> >> >> >> > >> >> >> >> Is the number of inconsistent objects growing? Can you attach > >> >> >> >> the > >> >> >> >> whole ceph.log from the 6 hours before and after the snippet > you > >> >> >> >> linked above? Are you using cache/tiering? Can you attach the > >> >> >> >> osdmap > >> >> >> >> (ceph osd getmap -o )? > >> >> >> >> -Sam > >> >> >> >> > >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor > >> >> >> >> wrote: > >> >> >> >> > ceph - 0.94.2 > >> >> >> >> > Its happen during rebalancing > >> >> >> >> > > >> >> >> >> > I thought too, that some OSD miss copy, but looks like all > >> >> >> >> > miss... > >> >> >> >> > So any advice in which direction i need to go > >> >> >> >> > > >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum < > gfar...@redhat.com>: > >> >> >> >> >> > >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing > >> >> >> >> >> clones > >> >> >> >> >> of > >> >> >> >> >> objects. 
I'm not sure how that could happen and I'd expect > the > >> >> >> >> >> pg > >> >> >> >> >> repair to handle that but if it's not there's probably > >> >> >> >> >> something > >> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this > >> >> >> >> >> something > >> >> >> >> >> you've seen, a new bug, or some kind of config issue? > >> >> >> >> >> -Greg > >> >> >> >> >> > >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor > >> >> >> >> >> wrote: > >> >> >> >> >> > Hi all, at our production cluster, due high rebalancing > ((( > >> >> >> >> >> > we > >> >> >> >> >> > have 2 > >> >> >> >> >> > pgs in > >> >> >> >> >> > inconsistent state... > >> >> >> >> >> > > >> >> >> >> >> > root@temp:~# ceph health detail | grep inc > >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors > >> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29] > >> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] > >> >> >> >> >> > > >> >> >> >> >> > From OSD logs, after recovery attempt: > >> >> >> >> >> > > >> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | > >> >> >> >> >> > while > >> >> >> >> >> > read > >> >> >> >> >> > i; > >> >> >> >> >> > do > >> >> >> >> >> > ceph pg repair ${i} ; done > >> >> >> >> >> > dumped all in format plain > >> >> >> >> >> > instructing pg 2.490 on osd.56 to repair > >> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair > >> >> >> >> >> > > >> >> >> >> >>
Re: [ceph-users] Repair inconsistent pgs..
Guys, I'm Igor's colleague, working a bit on CEPH, together with Igor. This is production cluster, and we are becoming more desperate as the time goes by. Im not sure if this is appropriate place to seek commercial support, but anyhow, I do it... If anyone feels like and have some experience in this particular PG troubleshooting issues, we are also ready to seek for commercial support to solve our issue, company or individual, it doesn't matter. Thanks, Andrija On 20 August 2015 at 19:07, Voloshanenko Igor wrote: > Inktank: > > https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf > > Mail-list: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html > > 2015-08-20 20:06 GMT+03:00 Samuel Just : > >> Which docs? >> -Sam >> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor >> wrote: >> > Not yet. I will create. >> > But according to mail lists and Inktank docs - it's expected behaviour >> when >> > cache enable >> > >> > 2015-08-20 19:56 GMT+03:00 Samuel Just : >> >> >> >> Is there a bug for this in the tracker? >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor >> >> wrote: >> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we >> >> > take >> >> > snapshot - data not proper update in cache layer, and client (ceph) >> see >> >> > damaged snap.. As headers requested from cache layer. >> >> > >> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just : >> >> >> >> >> >> What was the issue? >> >> >> -Sam >> >> >> >> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor >> >> >> wrote: >> >> >> > Samuel, we turned off cache layer few hours ago... >> >> >> > I will post ceph.log in few minutes >> >> >> > >> >> >> > For snap - we found issue, was connected with cache tier.. >> >> >> > >> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> Ok, you appear to be using a replicated cache tier in front of a >> >> >> >> replicated base tier. Please scrub both inconsistent pgs and >> post >> >> >> >> the >> >> >> >> ceph.log from before when you started the scrub until after. >> Also, >> >> >> >> what command are you using to take snapshots? >> >> >> >> -Sam >> >> >> >> >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor >> >> >> >> wrote: >> >> >> >> > Hi Samuel, we try to fix it in trick way. >> >> >> >> > >> >> >> >> > we check all rbd_data chunks from logs (OSD) which are >> affected, >> >> >> >> > then >> >> >> >> > query >> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that >> we >> >> >> >> > mount >> >> >> >> > this >> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume >> to >> >> >> >> > new >> >> >> >> > one. >> >> >> >> > >> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now >> >> >> >> > 35... >> >> >> >> > We >> >> >> >> > laos >> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs >> >> >> >> > still >> >> >> >> > have >> >> >> >> > 35 scrub errors... >> >> >> >> > >> >> >> >> > ceph osd getmap -o - attached >> >> >> >> > >> >> >> >> > >> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> >> >> Is the number of inconsistent objects growing? Can you attach >> >> >> >> >> the >> >> >> >> >> whole ceph.log from the 6 hours before and after the snippet >> you >> >> >> >> >> linked above? Are you using cache/tiering? Can you attach >> the >> >> >> >> >> osdmap >> >> >> >> >> (ceph osd getmap -o )? 
>> >> >> >> >> -Sam >> >> >> >> >> >> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor >> >> >> >> >> wrote: >> >> >> >> >> > ceph - 0.94.2 >> >> >> >> >> > Its happen during rebalancing >> >> >> >> >> > >> >> >> >> >> > I thought too, that some OSD miss copy, but looks like all >> >> >> >> >> > miss... >> >> >> >> >> > So any advice in which direction i need to go >> >> >> >> >> > >> >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum < >> gfar...@redhat.com>: >> >> >> >> >> >> >> >> >> >> >> >> From a quick peek it looks like some of the OSDs are >> missing >> >> >> >> >> >> clones >> >> >> >> >> >> of >> >> >> >> >> >> objects. I'm not sure how that could happen and I'd expect >> the >> >> >> >> >> >> pg >> >> >> >> >> >> repair to handle that but if it's not there's probably >> >> >> >> >> >> something >> >> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this >> >> >> >> >> >> something >> >> >> >> >> >> you've seen, a new bug, or some kind of config issue? >> >> >> >> >> >> -Greg >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor >> >> >> >> >> >> wrote: >> >> >> >> >> >> > Hi all, at our production cluster, due high rebalancing >> ((( >> >> >> >> >> >> > we >> >> >> >> >> >> > have 2 >> >> >> >> >> >> > pgs in >> >> >> >> >> >> > inconsistent state... >> >> >> >> >> >> > >> >> >> >> >> >> > root@temp:~# ceph health detail | grep inc >> >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18
Re: [ceph-users] requests are blocked - problem
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Jacek Jarosiewicz > Sent: 20 August 2015 07:31 > To: Nick Fisk ; ceph-us...@ceph.com > Subject: Re: [ceph-users] requests are blocked - problem > > On 08/19/2015 03:41 PM, Nick Fisk wrote: > > Although you may get some benefit from tweaking parameters, I suspect > you are nearer the performance ceiling for the current implementation of > the tiering code. Could you post all the variables you set for the tiering > including target_max_bytes and the dirty/full ratios. > > > > sure, all the parameters set are like this: > > hit_set_type bloom > hit_set_count 1 > hit_set_period 3600 > target_max_bytes 65498264640 > target_max_objects 100 > cache_target_full_ratio 0.95 > cache_min_flush_age 600 > cache_min_evict_age 1800 > cache_target_dirty_ratio 0.75 That pretty much looks OK to me; the only thing I can suggest is to lower the full ratio a bit. The full ratio is based on the percentage across the whole pool, but the actual eviction occurs at a percentage of a PG level. I think this may mean that in certain cases a PG may block whilst it evicts, even though it appears the pool hasn't reached the full target. > > > > Since you are doing maildirs, which will have lots of small files, you might > also want to try making the object size of the RBD smaller. This will mean > less > data is needed to be shifted on each promotion/flush. > > > > I'll try that - thanks! > > J > > -- > Jacek Jarosiewicz > Administrator Systemów Informatycznych > > > SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie ul. Senatorska 13/15, 00-075 > Warszawa Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy > Krajowego Rejestru Sądowego, nr KRS 029537; kapitał zakładowy > 42.756.000 zł > NIP: 957-05-49-503 > Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa > > > SUPERMEDIA -> http://www.supermedia.pl > dostep do internetu - hosting - kolokacja - lacza - telefonia > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
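To make both suggestions above concrete, a sketch only, with assumed pool and image names (the cache settings are applied to the cache pool itself; on hammer the RBD object size is chosen at image creation time via --order):

# start flushing/evicting earlier than at 95% full (cache pool name assumed)
ceph osd pool set cache-pool cache_target_full_ratio 0.85
# create an image with 1 MiB objects instead of the default 4 MiB
# (--order 20 means 2^20-byte objects; --size is in MB)
rbd create -p rbd --size 102400 --order 20 maildir-image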
[ceph-users] Email lgx...@nxtzas.com trying to subscribe to tracker.ceph.com
Someone using the email address lgx...@nxtzas.com is trying to subscribe to the Ceph Redmine tracker, but neither redmine nor I can use that email address; it bounces with : Host or domain name not found. Name service error for name=nxtzas.com type=: Host not found If this is you, please email me privately and we'll get you fixed up. -- Dan Mick Red Hat, Inc. Ceph docs: http://ceph.com/docs ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Repair inconsistent pgs..
Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors. pg 2.490 object 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing snap 141. If you look at the objects after that in the log: 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 The clone from the second line matches the head object from the previous line, and they have the same clone id. I *think* that the first error is real, and the subsequent ones are just scrub being dumb. Same deal with pg 2.c4. I just opened http://tracker.ceph.com/issues/12738. The original problem is that 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both missing a clone. Not sure how that happened, my money is on a cache/tiering evict racing with a snap trim. If you have any logging or relevant information from when that happened, you should open a bug. The 'snapdir' in the two object names indicates that the head object has actually been deleted (which makes sense if you moved the image to a new image and deleted the old one) and is only being kept around since there are live snapshots. I suggest you leave the snapshots for those images alone for the time being -- removing them might cause the osd to crash trying to clean up the wierd on disk state. Other than the leaked space from those two image snapshots and the annoying spurious scrub errors, I think no actual corruption is going on though. I created a tracker ticket for a feature that would let ceph-objectstore-tool remove the spurious clone from the head/snapdir metadata. Am I right that you haven't actually seen any osd crashes or user visible corruption (except possibly on snapshots of those two images)? -Sam On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor wrote: > Inktank: > https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf > > Mail-list: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html > > 2015-08-20 20:06 GMT+03:00 Samuel Just : >> >> Which docs? >> -Sam >> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor >> wrote: >> > Not yet. I will create. >> > But according to mail lists and Inktank docs - it's expected behaviour >> > when >> > cache enable >> > >> > 2015-08-20 19:56 GMT+03:00 Samuel Just : >> >> >> >> Is there a bug for this in the tracker? >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor >> >> wrote: >> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we >> >> > take >> >> > snapshot - data not proper update in cache layer, and client (ceph) >> >> > see >> >> > damaged snap.. As headers requested from cache layer. >> >> > >> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just : >> >> >> >> >> >> What was the issue? >> >> >> -Sam >> >> >> >> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor >> >> >> wrote: >> >> >> > Samuel, we turned off cache layer few hours ago... >> >> >> > I will post ceph.log in few minutes >> >> >> > >> >> >> > For snap - we found issue, was connected with cache tier.. 
>> >> >> > >> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> Ok, you appear to be using a replicated cache tier in front of a >> >> >> >> replicated base tier. Please scrub both inconsistent pgs and >> >> >> >> post >> >> >> >> the >> >> >> >> ceph.log from before when you started the scrub until after. >> >> >> >> Also, >> >> >> >> what command are you using to take snapshots? >> >> >> >> -Sam >> >> >> >> >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor >> >> >> >> wrote: >> >> >> >> > Hi Samuel, we try to fix it in trick way. >> >> >> >> > >> >> >> >> > we check all rbd_data chunks from logs (OSD) which are >> >> >> >> > affected, >> >> >> >> > then >> >> >> >> > query >> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that >> >> >> >> > we >> >> >> >> > mount >> >> >> >> > this >> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume >> >> >> >> > to >> >> >> >> > new >> >> >> >> > one. >> >> >> >> > >> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now >> >> >> >> > 35... >> >> >> >> > We >> >> >> >> > laos >> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs >> >> >> >> > still >> >> >> >> > have >> >> >> >> > 35 scrub errors... >> >> >> >> > >> >> >> >> > ceph osd getmap -o - attached >> >> >> >> > >> >> >> >> > >> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> >> >> Is the number of inconsistent objects growing? Can y
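For anyone repeating this analysis on their own cluster log, the pg / head object / expected-clone triples can be pulled out with something like the one-liner below. A sketch: it assumes log lines shaped like the ones quoted in this thread and the default cluster log path. Per the explanation above, only the first entry for each pg points at a clone that is genuinely missing; the rest are scrub chaining off that one.

# prints "<pg> <object> <expected clone>"; fields are counted from the end
# of the line so the timestamp/daemon prefix and the deep-scrub vs repair
# wording do not matter
grep 'expected clone' /var/log/ceph/ceph.log | awk '{ print $(NF-4), $(NF-3), $NF }' | sort -u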
Re: [ceph-users] Repair inconsistent pgs..
The feature bug for the tool is http://tracker.ceph.com/issues/12740. -Sam On Thu, Aug 20, 2015 at 2:52 PM, Samuel Just wrote: > Ah, this is kind of silly. I think you don't have 37 errors, but 2 > errors. pg 2.490 object > 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing > snap 141. If you look at the objects after that in the log: > > 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster > [ERR] repair 2.490 > 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected > clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 > 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster > [ERR] repair 2.490 > ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected > clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 > > The clone from the second line matches the head object from the > previous line, and they have the same clone id. I *think* that the > first error is real, and the subsequent ones are just scrub being > dumb. Same deal with pg 2.c4. I just opened > http://tracker.ceph.com/issues/12738. > > The original problem is that > 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and > 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both > missing a clone. Not sure how that happened, my money is on a > cache/tiering evict racing with a snap trim. If you have any logging > or relevant information from when that happened, you should open a > bug. The 'snapdir' in the two object names indicates that the head > object has actually been deleted (which makes sense if you moved the > image to a new image and deleted the old one) and is only being kept > around since there are live snapshots. I suggest you leave the > snapshots for those images alone for the time being -- removing them > might cause the osd to crash trying to clean up the wierd on disk > state. Other than the leaked space from those two image snapshots and > the annoying spurious scrub errors, I think no actual corruption is > going on though. I created a tracker ticket for a feature that would > let ceph-objectstore-tool remove the spurious clone from the > head/snapdir metadata. > > Am I right that you haven't actually seen any osd crashes or user > visible corruption (except possibly on snapshots of those two images)? > -Sam > > On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor > wrote: >> Inktank: >> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf >> >> Mail-list: >> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html >> >> 2015-08-20 20:06 GMT+03:00 Samuel Just : >>> >>> Which docs? >>> -Sam >>> >>> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor >>> wrote: >>> > Not yet. I will create. >>> > But according to mail lists and Inktank docs - it's expected behaviour >>> > when >>> > cache enable >>> > >>> > 2015-08-20 19:56 GMT+03:00 Samuel Just : >>> >> >>> >> Is there a bug for this in the tracker? >>> >> -Sam >>> >> >>> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor >>> >> wrote: >>> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we >>> >> > take >>> >> > snapshot - data not proper update in cache layer, and client (ceph) >>> >> > see >>> >> > damaged snap.. As headers requested from cache layer. >>> >> > >>> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just : >>> >> >> >>> >> >> What was the issue? >>> >> >> -Sam >>> >> >> >>> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor >>> >> >> wrote: >>> >> >> > Samuel, we turned off cache layer few hours ago... 
>>> >> >> > I will post ceph.log in few minutes >>> >> >> > >>> >> >> > For snap - we found issue, was connected with cache tier.. >>> >> >> > >>> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : >>> >> >> >> >>> >> >> >> Ok, you appear to be using a replicated cache tier in front of a >>> >> >> >> replicated base tier. Please scrub both inconsistent pgs and >>> >> >> >> post >>> >> >> >> the >>> >> >> >> ceph.log from before when you started the scrub until after. >>> >> >> >> Also, >>> >> >> >> what command are you using to take snapshots? >>> >> >> >> -Sam >>> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor >>> >> >> >> wrote: >>> >> >> >> > Hi Samuel, we try to fix it in trick way. >>> >> >> >> > >>> >> >> >> > we check all rbd_data chunks from logs (OSD) which are >>> >> >> >> > affected, >>> >> >> >> > then >>> >> >> >> > query >>> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that >>> >> >> >> > we >>> >> >> >> > mount >>> >> >> >> > this >>> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume >>> >> >> >> > to >>> >> >> >> > new >>> >> >> >> > one. >>> >> >> >> > >>> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now >>> >> >> >> > 35... >>> >> >> >> > We >>> >> >> >> > laos >>> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs >>> >> >>
Re: [ceph-users] Repair inconsistent pgs..
thank you Sam! I also noticed this linked errors during scrub... Now all lools like reasonable! So we will wait for bug to be closed. do you need any help on it? I mean i can help with coding/testing/etc... 2015-08-21 0:52 GMT+03:00 Samuel Just : > Ah, this is kind of silly. I think you don't have 37 errors, but 2 > errors. pg 2.490 object > 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing > snap 141. If you look at the objects after that in the log: > > 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster > [ERR] repair 2.490 > 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected > clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 > 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster > [ERR] repair 2.490 > ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected > clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 > > The clone from the second line matches the head object from the > previous line, and they have the same clone id. I *think* that the > first error is real, and the subsequent ones are just scrub being > dumb. Same deal with pg 2.c4. I just opened > http://tracker.ceph.com/issues/12738. > > The original problem is that > 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and > 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both > missing a clone. Not sure how that happened, my money is on a > cache/tiering evict racing with a snap trim. If you have any logging > or relevant information from when that happened, you should open a > bug. The 'snapdir' in the two object names indicates that the head > object has actually been deleted (which makes sense if you moved the > image to a new image and deleted the old one) and is only being kept > around since there are live snapshots. I suggest you leave the > snapshots for those images alone for the time being -- removing them > might cause the osd to crash trying to clean up the wierd on disk > state. Other than the leaked space from those two image snapshots and > the annoying spurious scrub errors, I think no actual corruption is > going on though. I created a tracker ticket for a feature that would > let ceph-objectstore-tool remove the spurious clone from the > head/snapdir metadata. > > Am I right that you haven't actually seen any osd crashes or user > visible corruption (except possibly on snapshots of those two images)? > -Sam > > On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor > wrote: > > Inktank: > > > https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf > > > > Mail-list: > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html > > > > 2015-08-20 20:06 GMT+03:00 Samuel Just : > >> > >> Which docs? > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor > >> wrote: > >> > Not yet. I will create. > >> > But according to mail lists and Inktank docs - it's expected behaviour > >> > when > >> > cache enable > >> > > >> > 2015-08-20 19:56 GMT+03:00 Samuel Just : > >> >> > >> >> Is there a bug for this in the tracker? > >> >> -Sam > >> >> > >> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor > >> >> wrote: > >> >> > Issue, that in forward mode, fstrim doesn't work proper, and when > we > >> >> > take > >> >> > snapshot - data not proper update in cache layer, and client (ceph) > >> >> > see > >> >> > damaged snap.. As headers requested from cache layer. > >> >> > > >> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just : > >> >> >> > >> >> >> What was the issue? 
> >> >> >> -Sam > >> >> >> > >> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor > >> >> >> wrote: > >> >> >> > Samuel, we turned off cache layer few hours ago... > >> >> >> > I will post ceph.log in few minutes > >> >> >> > > >> >> >> > For snap - we found issue, was connected with cache tier.. > >> >> >> > > >> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just : > >> >> >> >> > >> >> >> >> Ok, you appear to be using a replicated cache tier in front of > a > >> >> >> >> replicated base tier. Please scrub both inconsistent pgs and > >> >> >> >> post > >> >> >> >> the > >> >> >> >> ceph.log from before when you started the scrub until after. > >> >> >> >> Also, > >> >> >> >> what command are you using to take snapshots? > >> >> >> >> -Sam > >> >> >> >> > >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor > >> >> >> >> wrote: > >> >> >> >> > Hi Samuel, we try to fix it in trick way. > >> >> >> >> > > >> >> >> >> > we check all rbd_data chunks from logs (OSD) which are > >> >> >> >> > affected, > >> >> >> >> > then > >> >> >> >> > query > >> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after > that > >> >> >> >> > we > >> >> >> >> > mount > >> >> >> >> > this > >> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad > volume > >> >> >> >> > to > >> >> >> >> > new > >> >> >> >> > one. > >> >> >> >> > > >> >> >> >> > But
Re: [ceph-users] Repair inconsistent pgs..
Actually, now that I think about it, you probably didn't remove the images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but other images (that's why the scrub errors went down briefly, those objects -- which were fine -- went away). You might want to export and reimport those two images into new images, but leave the old ones alone until you can clean up the on disk state (image and snapshots) and clear the scrub errors. You probably don't want to read the snapshots for those images either. Everything else is, I think, harmless. The ceph-objectstore-tool feature would probably not be too hard, actually. Each head/snapdir image has two attrs (possibly stored in leveldb -- that's why you want to modify the ceph-objectstore-tool and use its interfaces rather than mucking about with the files directly) '_' and 'snapset' which contain encoded representations of object_info_t and SnapSet (both can be found in src/osd/osd_types.h). SnapSet has a set of clones and related metadata -- you want to read the SnapSet attr off disk and commit a transaction writing out a new version with that clone removed. I'd start by cloning the repo, starting a vstart cluster locally, and reproducing the issue. Next, get familiar with using ceph-objectstore-tool on the osds in that vstart cluster. A good first change would be creating a ceph-objectstore-tool op that lets you dump json for the object_info_t and SnapSet (both types have format() methods which make that easy) on an object to stdout so you can confirm what's actually there. oftc #ceph-devel or the ceph-devel mailing list would be the right place to ask questions. Otherwise, it'll probably get done in the next few weeks. -Sam On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor wrote: > thank you Sam! > I also noticed this linked errors during scrub... > > Now all lools like reasonable! > > So we will wait for bug to be closed. > > do you need any help on it? > > I mean i can help with coding/testing/etc... > > 2015-08-21 0:52 GMT+03:00 Samuel Just : >> >> Ah, this is kind of silly. I think you don't have 37 errors, but 2 >> errors. pg 2.490 object >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing >> snap 141. If you look at the objects after that in the log: >> >> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster >> [ERR] repair 2.490 >> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected >> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 >> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster >> [ERR] repair 2.490 >> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected >> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 >> >> The clone from the second line matches the head object from the >> previous line, and they have the same clone id. I *think* that the >> first error is real, and the subsequent ones are just scrub being >> dumb. Same deal with pg 2.c4. I just opened >> http://tracker.ceph.com/issues/12738. >> >> The original problem is that >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and >> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both >> missing a clone. Not sure how that happened, my money is on a >> cache/tiering evict racing with a snap trim. If you have any logging >> or relevant information from when that happened, you should open a >> bug. 
The 'snapdir' in the two object names indicates that the head >> object has actually been deleted (which makes sense if you moved the >> image to a new image and deleted the old one) and is only being kept >> around since there are live snapshots. I suggest you leave the >> snapshots for those images alone for the time being -- removing them >> might cause the osd to crash trying to clean up the wierd on disk >> state. Other than the leaked space from those two image snapshots and >> the annoying spurious scrub errors, I think no actual corruption is >> going on though. I created a tracker ticket for a feature that would >> let ceph-objectstore-tool remove the spurious clone from the >> head/snapdir metadata. >> >> Am I right that you haven't actually seen any osd crashes or user >> visible corruption (except possibly on snapshots of those two images)? >> -Sam >> >> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor >> wrote: >> > Inktank: >> > >> > https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf >> > >> > Mail-list: >> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html >> > >> > 2015-08-20 20:06 GMT+03:00 Samuel Just : >> >> >> >> Which docs? >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor >> >> wrote: >> >> > Not yet. I will create. >> >> > But according to mail lists and Inktank docs - it's expected >> >> > behaviour >> >> > when >> >> > cache enable >> >> > >> >> >
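Two practical sketches to go with the advice above, both with assumed names and paths. The first copies an affected image into a fresh one while leaving the original (and its snapshots) untouched; the second is only the first step of the ceph-objectstore-tool route -- the tool has to run against a stopped OSD, and the vstart data/journal paths should be verified against the ceph.conf that vstart generates:

# copy an affected image into a new one, leaving the original and its
# snapshots alone for now (pool name from this thread, image names assumed)
rbd export cold-storage/affected-image /tmp/affected-image.raw
rbd import /tmp/affected-image.raw cold-storage/affected-image-copy

# separately, in a built source tree: bring up a throwaway local cluster,
# reproduce the issue, stop the target ceph-osd process, then list its
# objects (check the data/journal paths in the generated ceph.conf first)
cd src && ./vstart.sh -n -x -l
ceph-objectstore-tool --data-path dev/osd0 --journal-path dev/osd0.journal --op list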
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor wrote: > Hi all, can you please help me with unexplained situation... > > All snapshot inside ceph broken... > > So, as example, we have VM template, as rbd inside ceph. > We can map it and mount to check that all ok with it > > root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 > /dev/rbd0 > root@test:~# parted /dev/rbd0 print > Model: Unknown (unknown) > Disk /dev/rbd0: 10.7GB > Sector size (logical/physical): 512B/512B > Partition Table: msdos > > Number Start End SizeType File system Flags > 1 1049kB 525MB 524MB primary ext4 boot > 2 525MB 10.7GB 10.2GB primary lvm > > Than i want to create snap, so i do: > root@test:~# rbd snap create > cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap > > And now i want to map it: > > root@test:~# rbd map > cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap > /dev/rbd1 > root@test:~# parted /dev/rbd1 print > Warning: Unable to open /dev/rbd1 read-write (Read-only file system). > /dev/rbd1 has been opened read-only. > Warning: Unable to open /dev/rbd1 read-write (Read-only file system). > /dev/rbd1 has been opened read-only. > Error: /dev/rbd1: unrecognised disk label > > Even md5 different... > root@ix-s2:~# md5sum /dev/rbd0 > 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 > root@ix-s2:~# md5sum /dev/rbd1 > e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 > > > Ok, now i protect snap and create clone... but same thing... > md5 for clone same as for snap,, > > root@test:~# rbd unmap /dev/rbd1 > root@test:~# rbd snap protect > cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap > root@test:~# rbd clone > cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap > cold-storage/test-image > root@test:~# rbd map cold-storage/test-image > /dev/rbd1 > root@test:~# md5sum /dev/rbd1 > e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 > > but it's broken... > root@test:~# parted /dev/rbd1 print > Error: /dev/rbd1: unrecognised disk label > > > = > > tech details: > > root@test:~# ceph -v > ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) > > We have 2 inconstistent pgs, but all images not placed on this pgs... > > root@test:~# ceph health detail > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors > pg 2.490 is active+clean+inconsistent, acting [56,15,29] > pg 2.c4 is active+clean+inconsistent, acting [56,10,42] > 18 scrub errors > > > > root@test:~# ceph osd map cold-storage > 0e23c701-401d-4465-b9b4-c02939d57bb5 > osdmap e16770 pool 'cold-storage' (2) object > '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up > ([37,15,14], p37) acting ([37,15,14], p37) > root@test:~# ceph osd map cold-storage > 0e23c701-401d-4465-b9b4-c02939d57bb5@snap > osdmap e16770 pool 'cold-storage' (2) object > '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> > up ([12,23,17], p12) acting ([12,23,17], p12) > root@test:~# ceph osd map cold-storage > 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image > osdmap e16770 pool 'cold-storage' (2) object > '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 > (2.2a9) -> up ([12,44,23], p12) acting ([12,44,23], p12) > > > Also we use cache layer, which in current moment - in forward mode... > > Can you please help me with this.. As my brain stop to understand what is > going on... > > Thank in advance! 
> > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
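For readers landing on this thread: whether a pool has a cache tier in front of it, and which cache mode that tier is currently in, can be read straight out of the osdmap. A sketch; the example output line is illustrative, not taken from this cluster:

# cache-tiered pools show tier_of / cache_mode in their pool lines
ceph osd dump | grep ^pool
# e.g.:  pool 3 'cache-pool' ... tier_of 2 cache_mode forward ...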
Re: [ceph-users] Repair inconsistent pgs..
Sam, i try to understand which rbd contain this chunks.. but no luck. No rbd images block names started with this... Actually, now that I think about it, you probably didn't remove the > images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 > and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 2015-08-21 1:36 GMT+03:00 Samuel Just : > Actually, now that I think about it, you probably didn't remove the > images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 > and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but > other images (that's why the scrub errors went down briefly, those > objects -- which were fine -- went away). You might want to export > and reimport those two images into new images, but leave the old ones > alone until you can clean up the on disk state (image and snapshots) > and clear the scrub errors. You probably don't want to read the > snapshots for those images either. Everything else is, I think, > harmless. > > The ceph-objectstore-tool feature would probably not be too hard, > actually. Each head/snapdir image has two attrs (possibly stored in > leveldb -- that's why you want to modify the ceph-objectstore-tool and > use its interfaces rather than mucking about with the files directly) > '_' and 'snapset' which contain encoded representations of > object_info_t and SnapSet (both can be found in src/osd/osd_types.h). > SnapSet has a set of clones and related metadata -- you want to read > the SnapSet attr off disk and commit a transaction writing out a new > version with that clone removed. I'd start by cloning the repo, > starting a vstart cluster locally, and reproducing the issue. Next, > get familiar with using ceph-objectstore-tool on the osds in that > vstart cluster. A good first change would be creating a > ceph-objectstore-tool op that lets you dump json for the object_info_t > and SnapSet (both types have format() methods which make that easy) on > an object to stdout so you can confirm what's actually there. oftc > #ceph-devel or the ceph-devel mailing list would be the right place to > ask questions. > > Otherwise, it'll probably get done in the next few weeks. > -Sam > > On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor > wrote: > > thank you Sam! > > I also noticed this linked errors during scrub... > > > > Now all lools like reasonable! > > > > So we will wait for bug to be closed. > > > > do you need any help on it? > > > > I mean i can help with coding/testing/etc... > > > > 2015-08-21 0:52 GMT+03:00 Samuel Just : > >> > >> Ah, this is kind of silly. I think you don't have 37 errors, but 2 > >> errors. pg 2.490 object > >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing > >> snap 141. If you look at the objects after that in the log: > >> > >> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster > >> [ERR] repair 2.490 > >> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected > >> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 > >> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster > >> [ERR] repair 2.490 > >> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected > >> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 > >> > >> The clone from the second line matches the head object from the > >> previous line, and they have the same clone id. I *think* that the > >> first error is real, and the subsequent ones are just scrub being > >> dumb. Same deal with pg 2.c4. I just opened > >> http://tracker.ceph.com/issues/12738. 
> >> > >> The original problem is that > >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and > >> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both > >> missing a clone. Not sure how that happened, my money is on a > >> cache/tiering evict racing with a snap trim. If you have any logging > >> or relevant information from when that happened, you should open a > >> bug. The 'snapdir' in the two object names indicates that the head > >> object has actually been deleted (which makes sense if you moved the > >> image to a new image and deleted the old one) and is only being kept > >> around since there are live snapshots. I suggest you leave the > >> snapshots for those images alone for the time being -- removing them > >> might cause the osd to crash trying to clean up the weird on disk > >> state. Other than the leaked space from those two image snapshots and > >> the annoying spurious scrub errors, I think no actual corruption is > >> going on though. I created a tracker ticket for a feature that would > >> let ceph-objectstore-tool remove the spurious clone from the > >> head/snapdir metadata. > >> > >> Am I right that you haven't actually seen any osd crashes or user > >> visible corruption (except possibly on snapshots of those two images)? > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor > >>
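A note for anyone doing the same search: one way to check whether any surviving image owns an rbd_data prefix is to compare it against the block_name_prefix that rbd info reports for every image in the pool. A minimal sketch, assuming the pool is cold-storage and that the prefix to look for is passed as the first argument (both are assumptions about this setup):

    #!/bin/bash
    # Usage: ./find-image-by-prefix.sh rbd_data.<id>
    POOL=cold-storage          # assumption: adjust to your pool
    PREFIX="$1"
    for img in $(rbd -p "$POOL" ls); do
        # format 2 images report a line like "block_name_prefix: rbd_data.<id>"
        bnp=$(rbd -p "$POOL" info "$img" | awk '/block_name_prefix/ {print $2}')
        if [ "$bnp" = "$PREFIX" ]; then
            echo "prefix $PREFIX belongs to $POOL/$img"
        fi
    done

If nothing matches, as in this case, the prefix most likely belonged to an image that has already been deleted, which fits the snapdir-only objects described above.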
Re: [ceph-users] Repair inconsistent pgs..
Interesting. How often do you delete an image? I'm wondering if whatever this is happened when you deleted these two images. -Sam On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor wrote: > Sam, I tried to understand which RBD image contains these chunks, but no luck. No RBD > image block names start with this prefix... > [...]
Re: [ceph-users] Repair inconsistent pgs..
Image? One? We started deleting images only to fix this (export/import); before that, 1-4 times per day (when a VM was destroyed)... 2015-08-21 1:44 GMT+03:00 Samuel Just : > Interesting. How often do you delete an image? I'm wondering if > whatever this is happened when you deleted these two images. > -Sam > > [...]
Re: [ceph-users] Repair inconsistent pgs..
Ok, so images are regularly removed. In that case, these two objects probably are left over from previously removed images. Once ceph-objectstore-tool can dump the SnapSet from those two objects, you will probably find that those two snapdir objects each have only one bogus clone, in which case you'll probably just remove the images. -Sam On Thu, Aug 20, 2015 at 3:45 PM, Voloshanenko Igor wrote: > Image? One? > > We started deleting images only to fix this (export/import); before that, 1-4 > times per day (when a VM was destroyed)... > > [...]
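For anyone who wants to pick up the ceph-objectstore-tool work described earlier in the thread, a rough sketch of the first steps (throwaway vstart cluster, then inspecting the '_' and 'snapset' attrs on an object). The paths, pid file location and object name below are assumptions about a stock vstart layout; check ceph-objectstore-tool --help on your build for the exact object operations it supports:

    # from a ceph source checkout, after building
    cd src && ./vstart.sh -n -d          # -n: new cluster, -d: debug output

    # stop one of the vstart OSDs before pointing the tool at its store
    kill "$(cat out/osd.0.pid)"          # assumption: vstart pid files live in out/

    # list the attrs on the object, then pull the raw encoded SnapSet
    ./ceph-objectstore-tool --data-path dev/osd0 --journal-path dev/osd0/journal \
        '<object-name>' list-attrs
    ./ceph-objectstore-tool --data-path dev/osd0 --journal-path dev/osd0/journal \
        '<object-name>' get-attr snapset > snapset.bin

The new op discussed above would decode that attr and print it as JSON via the types' format() methods instead of leaving you with the raw encoding.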
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic wrote: > This was related to the caching layer, which doesn't support snapshotting per > the docs... for the sake of closing the thread. > > On 17 August 2015 at 21:15, Voloshanenko Igor > wrote: >> >> Hi all, can you please help me with an unexplained situation... >> >> All snapshots inside ceph are broken... >> >> So, as an example, we have a VM template, as an rbd inside ceph. >> We can map it and mount it to check that all is ok with it >> >> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 >> /dev/rbd0 >> root@test:~# parted /dev/rbd0 print >> Model: Unknown (unknown) >> Disk /dev/rbd0: 10.7GB >> Sector size (logical/physical): 512B/512B >> Partition Table: msdos >> >> Number Start End SizeType File system Flags >> 1 1049kB 525MB 524MB primary ext4 boot >> 2 525MB 10.7GB 10.2GB primary lvm >> >> Then I want to create a snap, so I do: >> root@test:~# rbd snap create >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap >> >> And now I want to map it: >> >> root@test:~# rbd map >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap >> /dev/rbd1 >> root@test:~# parted /dev/rbd1 print >> Warning: Unable to open /dev/rbd1 read-write (Read-only file system). >> /dev/rbd1 has been opened read-only. >> Warning: Unable to open /dev/rbd1 read-write (Read-only file system). >> /dev/rbd1 has been opened read-only. >> Error: /dev/rbd1: unrecognised disk label >> >> Even the md5 is different... >> root@ix-s2:~# md5sum /dev/rbd0 >> 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 >> root@ix-s2:~# md5sum /dev/rbd1 >> e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 >> >> >> OK, now I protect the snap and create a clone... but same thing... >> the md5 for the clone is the same as for the snap... >> >> root@test:~# rbd unmap /dev/rbd1 >> root@test:~# rbd snap protect >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap >> root@test:~# rbd clone >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap >> cold-storage/test-image >> root@test:~# rbd map cold-storage/test-image >> /dev/rbd1 >> root@test:~# md5sum /dev/rbd1 >> e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 >> >> but it's broken... >> root@test:~# parted /dev/rbd1 print >> Error: /dev/rbd1: unrecognised disk label >> >> >> = >> >> tech details: >> >> root@test:~# ceph -v >> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) >> >> We have 2 inconsistent pgs, but none of these images are placed on those pgs...
>> >> root@test:~# ceph health detail >> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >> pg 2.490 is active+clean+inconsistent, acting [56,15,29] >> pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >> 18 scrub errors >> >> >> >> root@test:~# ceph osd map cold-storage >> 0e23c701-401d-4465-b9b4-c02939d57bb5 >> osdmap e16770 pool 'cold-storage' (2) object >> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up >> ([37,15,14], p37) acting ([37,15,14], p37) >> root@test:~# ceph osd map cold-storage >> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap >> osdmap e16770 pool 'cold-storage' (2) object >> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up >> ([12,23,17], p12) acting ([12,23,17], p12) >> root@test:~# ceph osd map cold-storage >> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image >> osdmap e16770 pool 'cold-storage' (2) object >> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9) >> -> up ([12,44,23], p12) acting ([12,44,23], p12) >> >> >> Also we use a cache layer, which at the moment is in forward mode... >> >> Can you please help me with this? My brain has stopped understanding what is >> going on... >> >> Thanks in advance! >> > > -- > Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
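One thing that can be checked directly on a setup like this is where the image's header and data objects actually live, base pool versus cache pool. A small sketch; the cache pool name is an assumption, and note that with an active cache overlay some client operations are transparently redirected, so the output needs interpreting with that in mind:

    BASE=cold-storage
    CACHE=cold-storage-cache      # assumption: your cache tier pool name
    IMG=0e23c701-401d-4465-b9b4-c02939d57bb5

    # image id = suffix of block_name_prefix (rbd_data.<id>);
    # the format 2 header object is rbd_header.<id>
    ID=$(rbd -p "$BASE" info "$IMG" | awk -F'rbd_data.' '/block_name_prefix/ {print $2}')

    for pool in "$BASE" "$CACHE"; do
        echo "== $pool =="
        rados -p "$pool" stat "rbd_header.$ID"
        rados -p "$pool" ls | grep -c "rbd_data.$ID"
    done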
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just wrote: > Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? > -Sam > > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic > wrote: >> This was related to the caching layer, which doesn't support snapshotting per >> the docs... for the sake of closing the thread. >> >> [...]
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Yes, will do. What we see: when the cache tier is in forward mode, if I do rbd snap create, it uses the rbd_header from the hot tier, not from the cold tier, but these two headers are not synced. And the header can't be evicted from hot-storage, as it's locked by KVM (QEMU). If I kill the lock and evict the header, everything starts to work... But that's unacceptable for production, killing a lock under a running VM ((( 2015-08-21 1:51 GMT+03:00 Samuel Just : > Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? > -Sam > > [...]
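On the locking point: the watch/lock that a running QEMU holds sits on the image header object and can be inspected before breaking anything. A sketch, with the cache pool name again an assumption; treat it as diagnostic only, since removing a lock under a live VM is exactly the problem described above:

    POOL=cold-storage
    CACHE=cold-storage-cache      # assumption: your cache tier pool name
    IMG=0e23c701-401d-4465-b9b4-c02939d57bb5
    ID=$(rbd -p "$POOL" info "$IMG" | awk -F'rbd_data.' '/block_name_prefix/ {print $2}')

    # advisory locks on the image, and watchers on its header object
    rbd -p "$POOL" lock list "$IMG"
    rados -p "$CACHE" listwatchers "rbd_header.$ID"

    # flush/evict whatever the cache tier will let go of; watched objects stay put
    rados -p "$CACHE" cache-flush-evict-all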
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just : > Also, can you include the kernel version? > -Sam > > [...]
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Hmm, that might actually be client side. Can you attempt to reproduce with rbd-fuse (different client side implementation from the kernel)? -Sam On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor wrote: > root@test:~# uname -a > Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC > 2015 x86_64 x86_64 x86_64 GNU/Linux > > [...]
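For the userspace cross-check, two options that bypass krbd entirely are rbd export and rbd-fuse. A sketch using the image and snapshot names from the transcript above (the mountpoint is arbitrary):

    POOL=cold-storage
    IMG=0e23c701-401d-4465-b9b4-c02939d57bb5

    # compare image vs snapshot contents through librbd instead of the kernel client
    rbd export "$POOL/$IMG" - | md5sum
    rbd export "$POOL/$IMG@new_snap" - | md5sum

    # or expose the pool's images as files via rbd-fuse and checksum there
    mkdir -p /mnt/rbdfuse
    rbd-fuse -p "$POOL" /mnt/rbdfuse
    md5sum "/mnt/rbdfuse/$IMG"

If the librbd checksums look sane while the krbd mapping does not, that points at the kernel client; if both disagree with the original image, the problem is on the cluster side.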
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
We use the 4.x branch because we have "very good" Samsung 850 Pro drives in production, and they don't support NCQ TRIM... And 4.x is the first branch that includes an exception for this in libata. Sure, we could backport that one line to the 3.x branch, but we prefer not to go deeper if a package for the new kernel exists. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor : >> root@test:~# uname -a >> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC >> 2015 x86_64 x86_64 x86_64 GNU/Linux >> >> [...]
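On the queued TRIM point, it is easy to confirm from userspace what the drive advertises and what the running kernel decided at probe time. A sketch (the device name is a placeholder):

    DEV=/dev/sdb    # placeholder: one of the 850 Pro cache SSDs

    # what the drive advertises: model, firmware, queue depth, TRIM capabilities
    hdparm -I "$DEV" | egrep -i 'model|firmware|queue depth|trim'
    smartctl -i "$DEV"

    # what libata logged at probe time (NCQ depth, any TRIM-related messages)
    dmesg | grep -i ata | egrep -i 'ncq|trim'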
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
I already killed the cache layer, but will try to reproduce this in the lab. 2015-08-21 1:58 GMT+03:00 Samuel Just : > Hmm, that might actually be client side. Can you attempt to reproduce > with rbd-fuse (different client side implementation from the kernel)? > -Sam > > [...]
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific to 'forward' mode, either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor wrote: > We use the 4.x branch because we have "very good" Samsung 850 Pro drives in production, > and they don't support NCQ TRIM... > > [...]
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just wrote: > What's supposed to happen is that the client transparently directs all > requests to the cache pool rather than the cold pool when there is a > cache pool. If the kernel is sending requests to the cold pool, > that's probably where the bug is. > > [...]
>>> root@test:~# parted /dev/rbd1 print >>> Error: /dev/rbd1: unrecognised disk label >>> >>> >>> = >>> >>> tech details: >>> >>> root@test:~# ceph -v >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) >>> >>> We have 2 inconstistent pgs, but all images not placed on this pgs... >>> >>> root@test:~# ceph health detail >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29] >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42] >>> 18 scrub errors >>> >>> >>> >>> root@test:~# ceph osd map cold-storage >>> 0e23c701-401d-4465-b9b4-c02939d57bb5 >>> osdmap e16770 pool 'cold-storage' (2) object >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up >>> ([37,15,14], p37) acting ([37,15,14], p37) >>> root@test:~# ceph osd map cold-storage >>> 0e23c701-401
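One way to narrow down whether the kernel client or the OSD side is at fault, as Sam suggests above, is to checksum the same snapshot through both paths: rbd export goes through librbd, while the mapped device goes through krbd. A rough sketch, reusing the image and snapshot names from the report above:

    # librbd path: export the snapshot to stdout and checksum it
    rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap - | md5sum

    # krbd path: map the snapshot and checksum the block device
    rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
    md5sum /dev/rbd1
    rbd unmap /dev/rbd1

If the librbd checksum matches the parent image while the krbd one does not, the problem is more likely in the kernel client; if both are wrong, it points at the OSD/cache-tier side.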
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
We switched to forward mode as a step towards switching the cache layer off.
Right now we have "Samsung 850 Pro" drives in the cache layer (10 SSDs, 2 per
node) and they deliver about 2 MB/s with 4K blocks... roughly 250 IOPS...
instead of the 18-20K IOPS of the Intel S3500 240G which we chose as a
replacement. So with such "good" disks the cache layer is a very big
bottleneck for us...

2015-08-21 2:02 GMT+03:00 Samuel Just:
> What's supposed to happen is that the client transparently directs all
> requests to the cache pool rather than the cold pool when there is a
> cache pool. If the kernel is sending requests to the cold pool,
> that's probably where the bug is. Odd. It could also be a bug
> specific to 'forward' mode, either in the client or on the osd. Why did
> you have it in that mode?
> -Sam
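For context, the usual sequence for draining and then detaching a cache tier on hammer looks roughly like the following. The pool names here are placeholders for the actual hot and cold pools, and the sequence assumes clients can tolerate the extra latency while objects are flushed:

    # stop promoting new writes into the cache tier
    ceph osd tier cache-mode hot-pool forward
    # flush and evict everything still held in the cache pool
    rados -p hot-pool cache-flush-evict-all
    # once the cache pool is empty, detach it from the backing pool
    ceph osd tier remove-overlay cold-storage
    ceph osd tier remove cold-storage hot-pool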
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Good joke )

2015-08-21 2:06 GMT+03:00 Samuel Just:
> Certainly, don't reproduce this with a cluster you care about :).
> -Sam
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
So you started draining the cache pool before you saw either the
inconsistent pgs or the anomalous snap behavior? (That is, writeback
mode was working correctly?)
-Sam

On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor wrote:
> Good joke )
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Created a ticket to improve our testing here -- this appears to be a hole.

http://tracker.ceph.com/issues/12742
-Sam

On Thu, Aug 20, 2015 at 4:09 PM, Samuel Just wrote:
> So you started draining the cache pool before you saw either the
> inconsistent pgs or the anomalous snap behavior? (That is, writeback
> mode was working correctly?)
> -Sam
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
No, when we started draining the cache the bad pgs were already in place...
We had done a big rebalance (disk by disk, to change the journal size on both
the hot and cold layers). All was OK, but after 2 days the scrub errors
arrived and 2 pgs went inconsistent...

In writeback mode - yes, snapshots looked like they worked fine, but they
stopped working at the same moment the cache layer filled up with data and
evicting/flushing started...

2015-08-21 2:09 GMT+03:00 Samuel Just:
> So you started draining the cache pool before you saw either the
> inconsistent pgs or the anomalous snap behavior? (That is, writeback
> mode was working correctly?)
> -Sam
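For the two inconsistent pgs themselves, the usual first steps are to re-run a deep scrub and check the OSD logs of the acting set before deciding on a repair. A sketch using the pg ids from the health output above; note that on hammer "ceph pg repair" essentially reverts the replicas to the primary's copy, so it is worth confirming the primary holds good data first:

    # re-check the two pgs reported in ceph health detail
    ceph pg deep-scrub 2.490
    ceph pg deep-scrub 2.c4
    # inspect the logs of the acting OSDs (56,15,29 and 56,10,42) for the
    # scrub error details, then repair only once the good copy is known
    ceph pg repair 2.490
    ceph pg repair 2.c4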
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Not sure what you mean by:

"they stopped working at the same moment the cache layer filled up with
data and evicting/flushing started"
-Sam

On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor wrote:
> No, when we started draining the cache the bad pgs were already in place...
> We had done a big rebalance (disk by disk, to change the journal size on
> both the hot and cold layers). All was OK, but after 2 days the scrub
> errors arrived and 2 pgs went inconsistent...
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Also, what do you mean by "change the journal size"?
-Sam

On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just wrote:
> Not sure what you mean by:
>
> "they stopped working at the same moment the cache layer filled up with
> data and evicting/flushing started"
> -Sam
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
We hadn't set values for target_max_bytes / target_max_objects, so all data
was initially written only to the cache layer and never flushed to the cold
layer at all.

Then we received a notification from monitoring that we had collected about
750GB in the hot pool ). So I set target_max_bytes to 0.9 of the disk
size... and then evicting/flushing started...

And the issue with the snapshots arrived.

2015-08-21 2:15 GMT+03:00 Samuel Just:
> Not sure what you mean by:
>
> "they stopped working at the same moment the cache layer filled up with
> data and evicting/flushing started"
> -Sam
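For reference, these are the cache-tiering thresholds involved; without target_max_bytes/target_max_objects the tiering agent has nothing to size its flushing and eviction against. A sketch with a placeholder pool name, and the byte count here is illustrative rather than taken from the cluster above:

    # absolute limits the tiering agent works against
    ceph osd pool set hot-pool target_max_bytes 750000000000
    ceph osd pool set hot-pool target_max_objects 1000000
    # start flushing dirty objects at 40% of the target, evicting clean ones at 80%
    ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
    ceph osd pool set hot-pool cache_target_full_ratio 0.8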
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
But that was still in writeback mode, right?
-Sam

On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor wrote:
> We hadn't set values for target_max_bytes / target_max_objects, so all data
> was initially written only to the cache layer and never flushed to the cold
> layer at all.
>
> Then we received a notification from monitoring that we had collected about
> 750GB in the hot pool ). So I set target_max_bytes to 0.9 of the disk
> size... and then evicting/flushing started...
>
> And the issue with the snapshots arrived.
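A quick way to answer this kind of question after the fact is to look at the pool line in ceph osd dump, which records the current cache mode of the tier (pool name assumed):

    ceph osd dump | grep hot-pool
    # the matching "pool N 'hot-pool' ..." line should include something like
    # "cache_mode writeback" (or "forward"), along with the target_max settings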
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Our initial journal sizes were enough, but the flush time was 5 seconds, so
we increased the journal size to fit a min/max flush timeframe of 29/30
seconds. By "flush time" I mean:

filestore max sync interval = 30
filestore min sync interval = 29

2015-08-21 2:16 GMT+03:00 Samuel Just:
> Also, what do you mean by "change the journal size"?
> -Sam
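In ceph.conf terms that corresponds to something like the following under [osd]. These are just the values described above, not a recommendation, and the journal has to be large enough to absorb roughly 30 seconds of writes at full ingest rate for such a window to be safe:

    [osd]
    filestore min sync interval = 29
    filestore max sync interval = 30
    # journal must be sized for ~30s of sustained writes, e.g.
    # osd journal size = <MB large enough for 30s at peak throughput>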
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Right. But that is when the issues started...

2015-08-21 2:20 GMT+03:00 Samuel Just:
> But that was still in writeback mode, right?
> -Sam
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Yeah, I'm trying to confirm that the issues did happen in writeback mode.
-Sam

On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor wrote:
> Right. But that is when the issues started...
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just wrote: > Yeah, I'm trying to confirm that the issues did happen in writeback mode. > -Sam > > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor > wrote: >> Right. But issues started... >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just : >>> >>> But that was still in writeback mode, right? >>> -Sam >>> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor >>> wrote: >>> > WE haven't set values for max_bytes / max_objects.. and all data >>> > initially >>> > writes only to cache layer and not flushed at all to cold layer. >>> > >>> > Then we received notification from monitoring that we collect about >>> > 750GB in >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk >>> > size... And then evicting/flushing started... >>> > >>> > And issue with snapshots arrived >>> > >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : >>> >> >>> >> Not sure what you mean by: >>> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled with >>> >> data and evict/flush started... >>> >> -Sam >>> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor >>> >> wrote: >>> >> > No, when we start draining cache - bad pgs was in place... >>> >> > We have big rebalance (disk by disk - to change journal side on both >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub >>> >> > errors >>> >> > and 2 >>> >> > pgs inconsistent... >>> >> > >>> >> > In writeback - yes, looks like snapshot works good. but it's stop to >>> >> > work in >>> >> > same moment, when cache layer fulfilled with data and evict/flush >>> >> > started... >>> >> > >>> >> > >>> >> > >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : >>> >> >> >>> >> >> So you started draining the cache pool before you saw either the >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, >>> >> >> writeback >>> >> >> mode was working correctly?) >>> >> >> -Sam >>> >> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor >>> >> >> wrote: >>> >> >> > Good joke ) >>> >> >> > >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just : >>> >> >> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care about :). >>> >> >> >> -Sam >>> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just >>> >> >> >> wrote: >>> >> >> >> > What's supposed to happen is that the client transparently >>> >> >> >> > directs >>> >> >> >> > all >>> >> >> >> > requests to the cache pool rather than the cold pool when there >>> >> >> >> > is >>> >> >> >> > a >>> >> >> >> > cache pool. If the kernel is sending requests to the cold >>> >> >> >> > pool, >>> >> >> >> > that's probably where the bug is. Odd. It could also be a bug >>> >> >> >> > specific 'forward' mode either in the client or on the osd. >>> >> >> >> > Why >>> >> >> >> > did >>> >> >> >> > you have it in that mode? >>> >> >> >> > -Sam >>> >> >> >> > >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor >>> >> >> >> > wrote: >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in >>> >> >> >> >> production, >>> >> >> >> >> and they don;t support ncq_trim... >>> >> >> >> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this in >>> >> >> >> >> libsata.c. 
>>> >> >> >> >> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer >>> >> >> >> >> no >>> >> >> >> >> to >>> >> >> >> >> go >>> >> >> >> >> deeper if packege for new kernel exist. >>> >> >> >> >> >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor >>> >> >> >> >> : >>> >> >> >> >>> >>> >> >> >> >>> root@test:~# uname -a >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 >>> >> >> >> >>> 17:37:22 >>> >> >> >> >>> UTC >>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux >>> >> >> >> >>> >>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just : >>> >> >> >> >>> >> >> >> Also, can you include the kernel version? >>> >> >> >> -Sam >>> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just >>> >> >> >> >>> >> >> >> wrote: >>> >> >> >> > Snapshotting with cache/tiering *is* supposed to work. >>> >> >> >> > Can >>> >> >> >> > you >>> >> >> >> > open a >>> >> >> >> > bug? >>> >> >> >> > -Sam >>> >> >> >> > >>> >> >> >> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic >>> >> >> >> > wrote: >>> >> >> >> >> This was related to the caching layer, which doesnt >>> >> >> >> >> support >>> >> >> >> >> snapshooting per >>> >> >> >> >> docs...for sake of closing the thread. >>> >> >> >> >> >>> >> >> >> >> On 17 August 2015 at 21:15, Voloshanenko Igor >>> >> >> >> >> >>> >> >> >> >> wrote: >>> >> >> >> >>> >>> >> >> >> >>> Hi all, can you please help me with unexplained >>> >> >> >>
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
I mean in forward mode - it;s permanent problem - snapshots not working. And for writeback mode after we change max_bytes/object values, it;s around 30 by 70... 70% of time it;s works... 30% - not. Looks like for old images - snapshots works fine (images which already exists before we change values). For any new images - no 2015-08-21 2:21 GMT+03:00 Voloshanenko Igor : > Right. But issues started... > > 2015-08-21 2:20 GMT+03:00 Samuel Just : > >> But that was still in writeback mode, right? >> -Sam >> >> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor >> wrote: >> > WE haven't set values for max_bytes / max_objects.. and all data >> initially >> > writes only to cache layer and not flushed at all to cold layer. >> > >> > Then we received notification from monitoring that we collect about >> 750GB in >> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk >> > size... And then evicting/flushing started... >> > >> > And issue with snapshots arrived >> > >> > 2015-08-21 2:15 GMT+03:00 Samuel Just : >> >> >> >> Not sure what you mean by: >> >> >> >> but it's stop to work in same moment, when cache layer fulfilled with >> >> data and evict/flush started... >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor >> >> wrote: >> >> > No, when we start draining cache - bad pgs was in place... >> >> > We have big rebalance (disk by disk - to change journal side on both >> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub >> errors >> >> > and 2 >> >> > pgs inconsistent... >> >> > >> >> > In writeback - yes, looks like snapshot works good. but it's stop to >> >> > work in >> >> > same moment, when cache layer fulfilled with data and evict/flush >> >> > started... >> >> > >> >> > >> >> > >> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : >> >> >> >> >> >> So you started draining the cache pool before you saw either the >> >> >> inconsistent pgs or the anomalous snap behavior? (That is, >> writeback >> >> >> mode was working correctly?) >> >> >> -Sam >> >> >> >> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor >> >> >> wrote: >> >> >> > Good joke ) >> >> >> > >> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> Certainly, don't reproduce this with a cluster you care about :). >> >> >> >> -Sam >> >> >> >> >> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just >> >> >> >> wrote: >> >> >> >> > What's supposed to happen is that the client transparently >> directs >> >> >> >> > all >> >> >> >> > requests to the cache pool rather than the cold pool when >> there is >> >> >> >> > a >> >> >> >> > cache pool. If the kernel is sending requests to the cold >> pool, >> >> >> >> > that's probably where the bug is. Odd. It could also be a bug >> >> >> >> > specific 'forward' mode either in the client or on the osd. >> Why >> >> >> >> > did >> >> >> >> > you have it in that mode? >> >> >> >> > -Sam >> >> >> >> > >> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor >> >> >> >> > wrote: >> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in >> >> >> >> >> production, >> >> >> >> >> and they don;t support ncq_trim... >> >> >> >> >> >> >> >> >> >> And 4,x first branch which include exceptions for this in >> >> >> >> >> libsata.c. >> >> >> >> >> >> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer >> no >> >> >> >> >> to >> >> >> >> >> go >> >> >> >> >> deeper if packege for new kernel exist. 
>> >> >> >> >> >> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor >> >> >> >> >> : >> >> >> >> >>> >> >> >> >> >>> root@test:~# uname -a >> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 >> >> >> >> >>> 17:37:22 >> >> >> >> >>> UTC >> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux >> >> >> >> >>> >> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just : >> >> >> >> >> >> >> >> Also, can you include the kernel version? >> >> >> >> -Sam >> >> >> >> >> >> >> >> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just < >> sj...@redhat.com> >> >> >> >> wrote: >> >> >> >> > Snapshotting with cache/tiering *is* supposed to work. >> Can >> >> >> >> > you >> >> >> >> > open a >> >> >> >> > bug? >> >> >> >> > -Sam >> >> >> >> > >> >> >> >> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic >> >> >> >> > wrote: >> >> >> >> >> This was related to the caching layer, which doesnt >> support >> >> >> >> >> snapshooting per >> >> >> >> >> docs...for sake of closing the thread. >> >> >> >> >> >> >> >> >> >> On 17 August 2015 at 21:15, Voloshanenko Igor >> >> >> >> >> >> >> >> >> >> wrote: >> >> >> >> >>> >> >> >> >> >>> Hi all, can you please help me with unexplained >> >> >> >> >>> situation... >> >> >> >> >>> >> >> >> >> >>> All snapshot inside ceph broken... >> >> >> >> >>> >> >> >> >> >>> So, as example, we have VM template, as rbd insid
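The "values for max_bytes / max_objects" referred to above are per-pool cache-tier limits. A minimal sketch of how they are typically set; the pool name and numbers are placeholders, not the values used on this cluster:

    ceph osd pool set hot-pool target_max_bytes 750000000000
    ceph osd pool set hot-pool target_max_objects 1000000
    ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
    ceph osd pool set hot-pool cache_target_full_ratio 0.8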
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just : > Specifically, the snap behavior (we already know that the pgs went > inconsistent while the pool was in writeback mode, right?). > -Sam > > On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just wrote: > > Yeah, I'm trying to confirm that the issues did happen in writeback mode. > > -Sam > > > > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor > > wrote: > >> Right. But issues started... > >> > >> 2015-08-21 2:20 GMT+03:00 Samuel Just : > >>> > >>> But that was still in writeback mode, right? > >>> -Sam > >>> > >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor > >>> wrote: > >>> > WE haven't set values for max_bytes / max_objects.. and all data > >>> > initially > >>> > writes only to cache layer and not flushed at all to cold layer. > >>> > > >>> > Then we received notification from monitoring that we collect about > >>> > 750GB in > >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk > >>> > size... And then evicting/flushing started... > >>> > > >>> > And issue with snapshots arrived > >>> > > >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : > >>> >> > >>> >> Not sure what you mean by: > >>> >> > >>> >> but it's stop to work in same moment, when cache layer fulfilled > with > >>> >> data and evict/flush started... > >>> >> -Sam > >>> >> > >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor > >>> >> wrote: > >>> >> > No, when we start draining cache - bad pgs was in place... > >>> >> > We have big rebalance (disk by disk - to change journal side on > both > >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub > >>> >> > errors > >>> >> > and 2 > >>> >> > pgs inconsistent... > >>> >> > > >>> >> > In writeback - yes, looks like snapshot works good. but it's stop > to > >>> >> > work in > >>> >> > same moment, when cache layer fulfilled with data and evict/flush > >>> >> > started... > >>> >> > > >>> >> > > >>> >> > > >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : > >>> >> >> > >>> >> >> So you started draining the cache pool before you saw either the > >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, > >>> >> >> writeback > >>> >> >> mode was working correctly?) > >>> >> >> -Sam > >>> >> >> > >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor > >>> >> >> wrote: > >>> >> >> > Good joke ) > >>> >> >> > > >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just : > >>> >> >> >> > >>> >> >> >> Certainly, don't reproduce this with a cluster you care about > :). > >>> >> >> >> -Sam > >>> >> >> >> > >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just < > sj...@redhat.com> > >>> >> >> >> wrote: > >>> >> >> >> > What's supposed to happen is that the client transparently > >>> >> >> >> > directs > >>> >> >> >> > all > >>> >> >> >> > requests to the cache pool rather than the cold pool when > there > >>> >> >> >> > is > >>> >> >> >> > a > >>> >> >> >> > cache pool. If the kernel is sending requests to the cold > >>> >> >> >> > pool, > >>> >> >> >> > that's probably where the bug is. Odd. It could also be a > bug > >>> >> >> >> > specific 'forward' mode either in the client or on the osd. > >>> >> >> >> > Why > >>> >> >> >> > did > >>> >> >> >> > you have it in that mode? 
> >>> >> >> >> > -Sam > >>> >> >> >> > > >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor > >>> >> >> >> > wrote: > >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro > in > >>> >> >> >> >> production, > >>> >> >> >> >> and they don;t support ncq_trim... > >>> >> >> >> >> > >>> >> >> >> >> And 4,x first branch which include exceptions for this in > >>> >> >> >> >> libsata.c. > >>> >> >> >> >> > >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we > prefer > >>> >> >> >> >> no > >>> >> >> >> >> to > >>> >> >> >> >> go > >>> >> >> >> >> deeper if packege for new kernel exist. > >>> >> >> >> >> > >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor > >>> >> >> >> >> : > >>> >> >> >> >>> > >>> >> >> >> >>> root@test:~# uname -a > >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun > May 17 > >>> >> >> >> >>> 17:37:22 > >>> >> >> >> >>> UTC > >>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux > >>> >> >> >> >>> > >>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just : > >>> >> >> >> > >>> >> >> >> Also, can you include the kernel version? > >>> >> >> >> -Sam > >>> >> >> >> > >>> >> >> >> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just > >>> >> >> >> > >>> >> >> >> wrote: > >>> >> >> >> > Snapshotting with cache/tiering *is* supposed to work. > >>> >> >> >> > Can > >>> >> >> >> > you > >>> >> >> >> > open a > >>> >> >> >> > bug? > >>> >> >> >> > -Sam > >>> >> >> >> > > >>> >> >> >> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic > >>> >> >> >> > wrote: > >>> >> >> >> >> This was relate
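To locate the two inconsistent PGs mentioned here, the usual starting point is the commands below. The pg id is a made-up example, and a repair should only be issued once the OSD logs have shown which copy is the bad one:

    ceph health detail | grep inconsistent
    ceph pg dump | grep inconsistent
    # after inspecting the scrub errors in the OSD logs:
    ceph pg repair 3.1a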
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Exactly пятница, 21 августа 2015 г. пользователь Samuel Just написал: > And you adjusted the journals by removing the osd, recreating it with > a larger journal, and reinserting it? > -Sam > > On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor > > wrote: > > Right ( but also was rebalancing cycle 2 day before pgs corrupted) > > > > 2015-08-21 2:23 GMT+03:00 Samuel Just >: > >> > >> Specifically, the snap behavior (we already know that the pgs went > >> inconsistent while the pool was in writeback mode, right?). > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just > wrote: > >> > Yeah, I'm trying to confirm that the issues did happen in writeback > >> > mode. > >> > -Sam > >> > > >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor > >> > > wrote: > >> >> Right. But issues started... > >> >> > >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just >: > >> >>> > >> >>> But that was still in writeback mode, right? > >> >>> -Sam > >> >>> > >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor > >> >>> > wrote: > >> >>> > WE haven't set values for max_bytes / max_objects.. and all data > >> >>> > initially > >> >>> > writes only to cache layer and not flushed at all to cold layer. > >> >>> > > >> >>> > Then we received notification from monitoring that we collect > about > >> >>> > 750GB in > >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of > >> >>> > disk > >> >>> > size... And then evicting/flushing started... > >> >>> > > >> >>> > And issue with snapshots arrived > >> >>> > > >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just >: > >> >>> >> > >> >>> >> Not sure what you mean by: > >> >>> >> > >> >>> >> but it's stop to work in same moment, when cache layer fulfilled > >> >>> >> with > >> >>> >> data and evict/flush started... > >> >>> >> -Sam > >> >>> >> > >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor > >> >>> >> > wrote: > >> >>> >> > No, when we start draining cache - bad pgs was in place... > >> >>> >> > We have big rebalance (disk by disk - to change journal side on > >> >>> >> > both > >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub > >> >>> >> > errors > >> >>> >> > and 2 > >> >>> >> > pgs inconsistent... > >> >>> >> > > >> >>> >> > In writeback - yes, looks like snapshot works good. but it's > stop > >> >>> >> > to > >> >>> >> > work in > >> >>> >> > same moment, when cache layer fulfilled with data and > evict/flush > >> >>> >> > started... > >> >>> >> > > >> >>> >> > > >> >>> >> > > >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just >: > >> >>> >> >> > >> >>> >> >> So you started draining the cache pool before you saw either > the > >> >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, > >> >>> >> >> writeback > >> >>> >> >> mode was working correctly?) > >> >>> >> >> -Sam > >> >>> >> >> > >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor > >> >>> >> >> > wrote: > >> >>> >> >> > Good joke ) > >> >>> >> >> > > >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just >: > >> >>> >> >> >> > >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care > about > >> >>> >> >> >> :). 
> >> >>> >> >> >> -Sam > >> >>> >> >> >> > >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just > >> >>> >> >> >> > > >> >>> >> >> >> wrote: > >> >>> >> >> >> > What's supposed to happen is that the client > transparently > >> >>> >> >> >> > directs > >> >>> >> >> >> > all > >> >>> >> >> >> > requests to the cache pool rather than the cold pool when > >> >>> >> >> >> > there > >> >>> >> >> >> > is > >> >>> >> >> >> > a > >> >>> >> >> >> > cache pool. If the kernel is sending requests to the > cold > >> >>> >> >> >> > pool, > >> >>> >> >> >> > that's probably where the bug is. Odd. It could also > be a > >> >>> >> >> >> > bug > >> >>> >> >> >> > specific 'forward' mode either in the client or on the > osd. > >> >>> >> >> >> > Why > >> >>> >> >> >> > did > >> >>> >> >> >> > you have it in that mode? > >> >>> >> >> >> > -Sam > >> >>> >> >> >> > > >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor > >> >>> >> >> >> > > wrote: > >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 > pro > >> >>> >> >> >> >> in > >> >>> >> >> >> >> production, > >> >>> >> >> >> >> and they don;t support ncq_trim... > >> >>> >> >> >> >> > >> >>> >> >> >> >> And 4,x first branch which include exceptions for this > in > >> >>> >> >> >> >> libsata.c. > >> >>> >> >> >> >> > >> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we > >> >>> >> >> >> >> prefer > >> >>> >> >> >> >> no > >> >>> >> >> >> >> to > >> >>> >> >> >> >> go > >> >>> >> >> >> >> deeper if packege for new kernel exist. > >> >>> >> >> >> >> > >> >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor > >> >>> >> >> >> >> >: > >> >>> >> >> >> >>> > >> >>> >> >> >> >>> root@test:~# uname -a > >> >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun > >> >>> >> >> >> >>> May 17
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor wrote: > Right ( but also was rebalancing cycle 2 day before pgs corrupted) > > 2015-08-21 2:23 GMT+03:00 Samuel Just : >> >> Specifically, the snap behavior (we already know that the pgs went >> inconsistent while the pool was in writeback mode, right?). >> -Sam >> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just wrote: >> > Yeah, I'm trying to confirm that the issues did happen in writeback >> > mode. >> > -Sam >> > >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor >> > wrote: >> >> Right. But issues started... >> >> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just : >> >>> >> >>> But that was still in writeback mode, right? >> >>> -Sam >> >>> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor >> >>> wrote: >> >>> > WE haven't set values for max_bytes / max_objects.. and all data >> >>> > initially >> >>> > writes only to cache layer and not flushed at all to cold layer. >> >>> > >> >>> > Then we received notification from monitoring that we collect about >> >>> > 750GB in >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of >> >>> > disk >> >>> > size... And then evicting/flushing started... >> >>> > >> >>> > And issue with snapshots arrived >> >>> > >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : >> >>> >> >> >>> >> Not sure what you mean by: >> >>> >> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled >> >>> >> with >> >>> >> data and evict/flush started... >> >>> >> -Sam >> >>> >> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor >> >>> >> wrote: >> >>> >> > No, when we start draining cache - bad pgs was in place... >> >>> >> > We have big rebalance (disk by disk - to change journal side on >> >>> >> > both >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub >> >>> >> > errors >> >>> >> > and 2 >> >>> >> > pgs inconsistent... >> >>> >> > >> >>> >> > In writeback - yes, looks like snapshot works good. but it's stop >> >>> >> > to >> >>> >> > work in >> >>> >> > same moment, when cache layer fulfilled with data and evict/flush >> >>> >> > started... >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : >> >>> >> >> >> >>> >> >> So you started draining the cache pool before you saw either the >> >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, >> >>> >> >> writeback >> >>> >> >> mode was working correctly?) >> >>> >> >> -Sam >> >>> >> >> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor >> >>> >> >> wrote: >> >>> >> >> > Good joke ) >> >>> >> >> > >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just : >> >>> >> >> >> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care about >> >>> >> >> >> :). >> >>> >> >> >> -Sam >> >>> >> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just >> >>> >> >> >> >> >>> >> >> >> wrote: >> >>> >> >> >> > What's supposed to happen is that the client transparently >> >>> >> >> >> > directs >> >>> >> >> >> > all >> >>> >> >> >> > requests to the cache pool rather than the cold pool when >> >>> >> >> >> > there >> >>> >> >> >> > is >> >>> >> >> >> > a >> >>> >> >> >> > cache pool. If the kernel is sending requests to the cold >> >>> >> >> >> > pool, >> >>> >> >> >> > that's probably where the bug is. Odd. 
It could also be a >> >>> >> >> >> > bug >> >>> >> >> >> > specific 'forward' mode either in the client or on the osd. >> >>> >> >> >> > Why >> >>> >> >> >> > did >> >>> >> >> >> > you have it in that mode? >> >>> >> >> >> > -Sam >> >>> >> >> >> > >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor >> >>> >> >> >> > wrote: >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro >> >>> >> >> >> >> in >> >>> >> >> >> >> production, >> >>> >> >> >> >> and they don;t support ncq_trim... >> >>> >> >> >> >> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this in >> >>> >> >> >> >> libsata.c. >> >>> >> >> >> >> >> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we >> >>> >> >> >> >> prefer >> >>> >> >> >> >> no >> >>> >> >> >> >> to >> >>> >> >> >> >> go >> >>> >> >> >> >> deeper if packege for new kernel exist. >> >>> >> >> >> >> >> >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor >> >>> >> >> >> >> : >> >>> >> >> >> >>> >> >>> >> >> >> >>> root@test:~# uname -a >> >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun >> >>> >> >> >> >>> May 17 >> >>> >> >> >> >>> 17:37:22 >> >>> >> >> >> >>> UTC >> >>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux >> >>> >> >> >> >>> >> >>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just : >> >>> >> >> >> >> >>> >> >> >> Also, can you include the kernel version? >> >>> >> >> >> -Sam >> >>> >> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 3:51 PM
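Sam's question refers to the heavyweight way of growing a journal (removing and recreating the OSD). For completeness, a sketch of the in-place alternative, assuming a collocated journal and upstart-managed OSDs as elsewhere in this thread; osd id 2 is a placeholder:

    ceph osd set noout
    stop ceph-osd id=2
    ceph-osd -i 2 --flush-journal
    # enlarge the journal file/partition and/or raise 'osd journal size' in ceph.conf
    ceph-osd -i 2 --mkjournal
    start ceph-osd id=2
    ceph osd unset noout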
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor wrote: > Exactly > > пятница, 21 августа 2015 г. пользователь Samuel Just написал: > >> And you adjusted the journals by removing the osd, recreating it with >> a larger journal, and reinserting it? >> -Sam >> >> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor >> wrote: >> > Right ( but also was rebalancing cycle 2 day before pgs corrupted) >> > >> > 2015-08-21 2:23 GMT+03:00 Samuel Just : >> >> >> >> Specifically, the snap behavior (we already know that the pgs went >> >> inconsistent while the pool was in writeback mode, right?). >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just wrote: >> >> > Yeah, I'm trying to confirm that the issues did happen in writeback >> >> > mode. >> >> > -Sam >> >> > >> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor >> >> > wrote: >> >> >> Right. But issues started... >> >> >> >> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just : >> >> >>> >> >> >>> But that was still in writeback mode, right? >> >> >>> -Sam >> >> >>> >> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor >> >> >>> wrote: >> >> >>> > WE haven't set values for max_bytes / max_objects.. and all data >> >> >>> > initially >> >> >>> > writes only to cache layer and not flushed at all to cold layer. >> >> >>> > >> >> >>> > Then we received notification from monitoring that we collect >> >> >>> > about >> >> >>> > 750GB in >> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of >> >> >>> > disk >> >> >>> > size... And then evicting/flushing started... >> >> >>> > >> >> >>> > And issue with snapshots arrived >> >> >>> > >> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : >> >> >>> >> >> >> >>> >> Not sure what you mean by: >> >> >>> >> >> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled >> >> >>> >> with >> >> >>> >> data and evict/flush started... >> >> >>> >> -Sam >> >> >>> >> >> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor >> >> >>> >> wrote: >> >> >>> >> > No, when we start draining cache - bad pgs was in place... >> >> >>> >> > We have big rebalance (disk by disk - to change journal side >> >> >>> >> > on >> >> >>> >> > both >> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived >> >> >>> >> > scrub >> >> >>> >> > errors >> >> >>> >> > and 2 >> >> >>> >> > pgs inconsistent... >> >> >>> >> > >> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's >> >> >>> >> > stop >> >> >>> >> > to >> >> >>> >> > work in >> >> >>> >> > same moment, when cache layer fulfilled with data and >> >> >>> >> > evict/flush >> >> >>> >> > started... >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : >> >> >>> >> >> >> >> >>> >> >> So you started draining the cache pool before you saw either >> >> >>> >> >> the >> >> >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, >> >> >>> >> >> writeback >> >> >>> >> >> mode was working correctly?) >> >> >>> >> >> -Sam >> >> >>> >> >> >> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor >> >> >>> >> >> wrote: >> >> >>> >> >> > Good joke ) >> >> >>> >> >> > >> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just : >> >> >>> >> >> >> >> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care >> >> >>> >> >> >> about >> >> >>> >> >> >> :). 
>> >> >>> >> >> >> -Sam >> >> >>> >> >> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just >> >> >>> >> >> >> >> >> >>> >> >> >> wrote: >> >> >>> >> >> >> > What's supposed to happen is that the client >> >> >>> >> >> >> > transparently >> >> >>> >> >> >> > directs >> >> >>> >> >> >> > all >> >> >>> >> >> >> > requests to the cache pool rather than the cold pool >> >> >>> >> >> >> > when >> >> >>> >> >> >> > there >> >> >>> >> >> >> > is >> >> >>> >> >> >> > a >> >> >>> >> >> >> > cache pool. If the kernel is sending requests to the >> >> >>> >> >> >> > cold >> >> >>> >> >> >> > pool, >> >> >>> >> >> >> > that's probably where the bug is. Odd. It could also >> >> >>> >> >> >> > be a >> >> >>> >> >> >> > bug >> >> >>> >> >> >> > specific 'forward' mode either in the client or on the >> >> >>> >> >> >> > osd. >> >> >>> >> >> >> > Why >> >> >>> >> >> >> > did >> >> >>> >> >> >> > you have it in that mode? >> >> >>> >> >> >> > -Sam >> >> >>> >> >> >> > >> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor >> >> >>> >> >> >> > wrote: >> >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 >> >> >>> >> >> >> >> pro >> >> >>> >> >> >> >> in >> >> >>> >> >> >> >> production, >> >> >>> >> >> >> >> and they don;t support ncq_trim... >> >> >>> >> >> >> >> >> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this >> >> >>> >> >> >> >> in >> >> >>> >> >> >> >> libsata.c. >> >> >>> >> >> >> >> >> >> >>>
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
As i we use journal collocation for journal now (because we want to utilize cache layer ((( ) i use ceph-disk to create new OSD (changed journal size on ceph.conf). I don;t prefer manual work)) So create very simple script to update journal size 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor : > Exactly > > пятница, 21 августа 2015 г. пользователь Samuel Just написал: > > And you adjusted the journals by removing the osd, recreating it with >> a larger journal, and reinserting it? >> -Sam >> >> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor >> wrote: >> > Right ( but also was rebalancing cycle 2 day before pgs corrupted) >> > >> > 2015-08-21 2:23 GMT+03:00 Samuel Just : >> >> >> >> Specifically, the snap behavior (we already know that the pgs went >> >> inconsistent while the pool was in writeback mode, right?). >> >> -Sam >> >> >> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just wrote: >> >> > Yeah, I'm trying to confirm that the issues did happen in writeback >> >> > mode. >> >> > -Sam >> >> > >> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor >> >> > wrote: >> >> >> Right. But issues started... >> >> >> >> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just : >> >> >>> >> >> >>> But that was still in writeback mode, right? >> >> >>> -Sam >> >> >>> >> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor >> >> >>> wrote: >> >> >>> > WE haven't set values for max_bytes / max_objects.. and all data >> >> >>> > initially >> >> >>> > writes only to cache layer and not flushed at all to cold layer. >> >> >>> > >> >> >>> > Then we received notification from monitoring that we collect >> about >> >> >>> > 750GB in >> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of >> >> >>> > disk >> >> >>> > size... And then evicting/flushing started... >> >> >>> > >> >> >>> > And issue with snapshots arrived >> >> >>> > >> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : >> >> >>> >> >> >> >>> >> Not sure what you mean by: >> >> >>> >> >> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled >> >> >>> >> with >> >> >>> >> data and evict/flush started... >> >> >>> >> -Sam >> >> >>> >> >> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor >> >> >>> >> wrote: >> >> >>> >> > No, when we start draining cache - bad pgs was in place... >> >> >>> >> > We have big rebalance (disk by disk - to change journal side >> on >> >> >>> >> > both >> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived >> scrub >> >> >>> >> > errors >> >> >>> >> > and 2 >> >> >>> >> > pgs inconsistent... >> >> >>> >> > >> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's >> stop >> >> >>> >> > to >> >> >>> >> > work in >> >> >>> >> > same moment, when cache layer fulfilled with data and >> evict/flush >> >> >>> >> > started... >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : >> >> >>> >> >> >> >> >>> >> >> So you started draining the cache pool before you saw either >> the >> >> >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, >> >> >>> >> >> writeback >> >> >>> >> >> mode was working correctly?) >> >> >>> >> >> -Sam >> >> >>> >> >> >> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor >> >> >>> >> >> wrote: >> >> >>> >> >> > Good joke ) >> >> >>> >> >> > >> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just : >> >> >>> >> >> >> >> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care >> about >> >> >>> >> >> >> :). 
>> >> >>> >> >> >> -Sam >> >> >>> >> >> >> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just >> >> >>> >> >> >> >> >> >>> >> >> >> wrote: >> >> >>> >> >> >> > What's supposed to happen is that the client >> transparently >> >> >>> >> >> >> > directs >> >> >>> >> >> >> > all >> >> >>> >> >> >> > requests to the cache pool rather than the cold pool >> when >> >> >>> >> >> >> > there >> >> >>> >> >> >> > is >> >> >>> >> >> >> > a >> >> >>> >> >> >> > cache pool. If the kernel is sending requests to the >> cold >> >> >>> >> >> >> > pool, >> >> >>> >> >> >> > that's probably where the bug is. Odd. It could also >> be a >> >> >>> >> >> >> > bug >> >> >>> >> >> >> > specific 'forward' mode either in the client or on the >> osd. >> >> >>> >> >> >> > Why >> >> >>> >> >> >> > did >> >> >>> >> >> >> > you have it in that mode? >> >> >>> >> >> >> > -Sam >> >> >>> >> >> >> > >> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor >> >> >>> >> >> >> > wrote: >> >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 >> pro >> >> >>> >> >> >> >> in >> >> >>> >> >> >> >> production, >> >> >>> >> >> >> >> and they don;t support ncq_trim... >> >> >>> >> >> >> >> >> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this >> in >> >> >>> >> >> >> >> libsata.c. >> >> >>> >> >> >> >> >> >> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we >> >> >>> >> >> >> >> p
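The "changed journal size on ceph.conf" that ceph-disk picks up is the [osd] setting below; the size is in MB, and 20480 is only an illustrative value, as is the device name:

    [osd]
    osd journal size = 20480

    # recreate the OSD with the new collocated journal:
    ceph-disk zap /dev/sdb
    ceph-disk prepare /dev/sdb
    ceph-disk activate /dev/sdb1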
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Will do, Sam! thank in advance for you help! 2015-08-21 2:28 GMT+03:00 Samuel Just : > Ok, create a ticket with a timeline and all of this information, I'll > try to look into it more tomorrow. > -Sam > > On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor > wrote: > > Exactly > > > > пятница, 21 августа 2015 г. пользователь Samuel Just написал: > > > >> And you adjusted the journals by removing the osd, recreating it with > >> a larger journal, and reinserting it? > >> -Sam > >> > >> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor > >> wrote: > >> > Right ( but also was rebalancing cycle 2 day before pgs corrupted) > >> > > >> > 2015-08-21 2:23 GMT+03:00 Samuel Just : > >> >> > >> >> Specifically, the snap behavior (we already know that the pgs went > >> >> inconsistent while the pool was in writeback mode, right?). > >> >> -Sam > >> >> > >> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just > wrote: > >> >> > Yeah, I'm trying to confirm that the issues did happen in writeback > >> >> > mode. > >> >> > -Sam > >> >> > > >> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor > >> >> > wrote: > >> >> >> Right. But issues started... > >> >> >> > >> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just : > >> >> >>> > >> >> >>> But that was still in writeback mode, right? > >> >> >>> -Sam > >> >> >>> > >> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor > >> >> >>> wrote: > >> >> >>> > WE haven't set values for max_bytes / max_objects.. and all > data > >> >> >>> > initially > >> >> >>> > writes only to cache layer and not flushed at all to cold > layer. > >> >> >>> > > >> >> >>> > Then we received notification from monitoring that we collect > >> >> >>> > about > >> >> >>> > 750GB in > >> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 > of > >> >> >>> > disk > >> >> >>> > size... And then evicting/flushing started... > >> >> >>> > > >> >> >>> > And issue with snapshots arrived > >> >> >>> > > >> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : > >> >> >>> >> > >> >> >>> >> Not sure what you mean by: > >> >> >>> >> > >> >> >>> >> but it's stop to work in same moment, when cache layer > fulfilled > >> >> >>> >> with > >> >> >>> >> data and evict/flush started... > >> >> >>> >> -Sam > >> >> >>> >> > >> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor > >> >> >>> >> wrote: > >> >> >>> >> > No, when we start draining cache - bad pgs was in place... > >> >> >>> >> > We have big rebalance (disk by disk - to change journal side > >> >> >>> >> > on > >> >> >>> >> > both > >> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived > >> >> >>> >> > scrub > >> >> >>> >> > errors > >> >> >>> >> > and 2 > >> >> >>> >> > pgs inconsistent... > >> >> >>> >> > > >> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's > >> >> >>> >> > stop > >> >> >>> >> > to > >> >> >>> >> > work in > >> >> >>> >> > same moment, when cache layer fulfilled with data and > >> >> >>> >> > evict/flush > >> >> >>> >> > started... > >> >> >>> >> > > >> >> >>> >> > > >> >> >>> >> > > >> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just : > >> >> >>> >> >> > >> >> >>> >> >> So you started draining the cache pool before you saw > either > >> >> >>> >> >> the > >> >> >>> >> >> inconsistent pgs or the anomalous snap behavior? (That is, > >> >> >>> >> >> writeback > >> >> >>> >> >> mode was working correctly?) 
> >> >> >>> >> >> -Sam > >> >> >>> >> >> > >> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor > >> >> >>> >> >> wrote: > >> >> >>> >> >> > Good joke ) > >> >> >>> >> >> > > >> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just >: > >> >> >>> >> >> >> > >> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care > >> >> >>> >> >> >> about > >> >> >>> >> >> >> :). > >> >> >>> >> >> >> -Sam > >> >> >>> >> >> >> > >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just > >> >> >>> >> >> >> > >> >> >>> >> >> >> wrote: > >> >> >>> >> >> >> > What's supposed to happen is that the client > >> >> >>> >> >> >> > transparently > >> >> >>> >> >> >> > directs > >> >> >>> >> >> >> > all > >> >> >>> >> >> >> > requests to the cache pool rather than the cold pool > >> >> >>> >> >> >> > when > >> >> >>> >> >> >> > there > >> >> >>> >> >> >> > is > >> >> >>> >> >> >> > a > >> >> >>> >> >> >> > cache pool. If the kernel is sending requests to the > >> >> >>> >> >> >> > cold > >> >> >>> >> >> >> > pool, > >> >> >>> >> >> >> > that's probably where the bug is. Odd. It could also > >> >> >>> >> >> >> > be a > >> >> >>> >> >> >> > bug > >> >> >>> >> >> >> > specific 'forward' mode either in the client or on the > >> >> >>> >> >> >> > osd. > >> >> >>> >> >> >> > Why > >> >> >>> >> >> >> > did > >> >> >>> >> >> >> > you have it in that mode? > >> >> >>> >> >> >> > -Sam > >> >> >>> >> >> >> > > >> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor > >> >> >>> >> >> >> > wrote: > >> >> >>> >> >> >> >> We used 4.x branch, a
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Attachment blocked, so post as text...

root@zzz:~# cat update_osd.sh
#!/bin/bash

ID=$1
echo "Process OSD# ${ID}"

DEV=`mount | grep "ceph-${ID} " | cut -d " " -f 1`
echo "OSD# ${ID} hosted on ${DEV::-1}"

TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6`
if [ "${TYPE_RAW}" == "Solid" ]
then
    TYPE="ssd"
elif [ "${TYPE_RAW}" == "7200" ]
then
    TYPE="platter"
fi

echo "OSD Type = ${TYPE}"

HOST=`hostname`
echo "Current node hostname: ${HOST}"

echo "Set noout option for CEPH cluster"
ceph osd set noout

echo "Marked OSD # ${ID} out"
ceph osd out ${ID}

echo "Remove OSD # ${ID} from CRUSHMAP"
ceph osd crush remove osd.${ID}

echo "Delete auth for OSD# ${ID}"
ceph auth del osd.${ID}

echo "Stop OSD# ${ID}"
stop ceph-osd id=${ID}

echo "Remove OSD # ${ID} from cluster"
ceph osd rm ${ID}

echo "Unmount OSD# ${ID}"
umount ${DEV}

echo "ZAP ${DEV::-1}"
ceph-disk zap ${DEV::-1}

echo "Create new OSD with ${DEV::-1}"
ceph-disk-prepare ${DEV::-1}

echo "Activate new OSD"
ceph-disk-activate ${DEV}

echo "Dump current CRUSHMAP"
ceph osd getcrushmap -o cm.old

echo "Decompile CRUSHMAP"
crushtool -d cm.old -o cm

echo "Place new OSD in proper place"
sed -i "s/device${ID}/osd.${ID}/" cm
LINE=`cat -n cm | sed -n "/${HOST}-${TYPE} {/,/}/p" | tail -n 1 | awk '{print $1}'`
sed -i "${LINE}iitem osd.${ID} weight 1.000" cm

echo "Modify ${HOST} weight into CRUSHMAP"
sed -i "s/item ${HOST}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight 1.000/" cm

echo "Compile new CRUSHMAP"
crushtool -c cm -o cm.new

echo "Inject new CRUSHMAP"
ceph osd setcrushmap -i cm.new

#echo "Clean..."
#rm -rf cm cm.new

echo "Unset noout option for CEPH cluster"
ceph osd unset noout

echo "OSD recreated... Waiting for rebalancing..."

2015-08-21 2:37 GMT+03:00 Voloshanenko Igor :
> As i we use journal collocation for journal now (because we want to
> utilize cache layer ((( ) i use ceph-disk to create new OSD (changed
> journal size on ceph.conf). I don;t prefer manual work))
>
> So create very simple script to update journal size
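The script above takes the OSD id as its only argument, for example (the id is a placeholder):

    bash update_osd.sh 3

One caveat worth noting: the ${DEV::-1} expansion only strips a single trailing character, so it maps /dev/sdc1 back to /dev/sdc but would not yield a valid device name for NVMe-style partitions such as /dev/nvme0n1p5.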
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
It would help greatly if, on a disposable cluster, you could reproduce the snapshot problem with debug osd = 20 debug filestore = 20 debug ms = 1 on all of the osds and attach the logs to the bug report. That should make it easier to work out what is going on. -Sam On Thu, Aug 20, 2015 at 4:40 PM, Voloshanenko Igor wrote: > Attachment blocked, so post as text... > > root@zzz:~# cat update_osd.sh > #!/bin/bash > > ID=$1 > echo "Process OSD# ${ID}" > > DEV=`mount | grep "ceph-${ID} " | cut -d " " -f 1` > echo "OSD# ${ID} hosted on ${DEV::-1}" > > TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6` > if [ "${TYPE_RAW}" == "Solid" ] > then > TYPE="ssd" > elif [ "${TYPE_RAW}" == "7200" ] > then > TYPE="platter" > fi > > echo "OSD Type = ${TYPE}" > > HOST=`hostname` > echo "Current node hostname: ${HOST}" > > echo "Set noout option for CEPH cluster" > ceph osd set noout > > echo "Marked OSD # ${ID} out" > [19/1857] > ceph osd out ${ID} > > echo "Remove OSD # ${ID} from CRUSHMAP" > ceph osd crush remove osd.${ID} > > echo "Delete auth for OSD# ${ID}" > ceph auth del osd.${ID} > > echo "Stop OSD# ${ID}" > stop ceph-osd id=${ID} > > echo "Remove OSD # ${ID} from cluster" > ceph osd rm ${ID} > > echo "Unmount OSD# ${ID}" > umount ${DEV} > > echo "ZAP ${DEV::-1}" > ceph-disk zap ${DEV::-1} > > echo "Create new OSD with ${DEV::-1}" > ceph-disk-prepare ${DEV::-1} > > echo "Activate new OSD" > ceph-disk-activate ${DEV} > > echo "Dump current CRUSHMAP" > ceph osd getcrushmap -o cm.old > > echo "Decompile CRUSHMAP" > crushtool -d cm.old -o cm > > echo "Place new OSD in proper place" > sed -i "s/device${ID}/osd.${ID}/" cm > LINE=`cat -n cm | sed -n "/${HOST}-${TYPE} {/,/}/p" | tail -n 1 | awk > '{print $1}'` > sed -i "${LINE}iitem osd.${ID} weight 1.000" cm > > echo "Modify ${HOST} weight into CRUSHMAP" > sed -i "s/item ${HOST}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight > 1.000/" cm > > echo "Compile new CRUSHMAP" > crushtool -c cm -o cm.new > > echo "Inject new CRUSHMAP" > ceph osd setcrushmap -i cm.new > > #echo "Clean..." > #rm -rf cm cm.new > > echo "Unset noout option for CEPH cluster" > ceph osd unset noout > > echo "OSD recreated... Waiting for rebalancing..." > > 2015-08-21 2:37 GMT+03:00 Voloshanenko Igor : >> >> As i we use journal collocation for journal now (because we want to >> utilize cache layer ((( ) i use ceph-disk to create new OSD (changed journal >> size on ceph.conf). I don;t prefer manual work)) >> >> So create very simple script to update journal size >> >> 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor : >>> >>> Exactly >>> >>> пятница, 21 августа 2015 г. пользователь Samuel Just написал: >>> And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor wrote: > Right ( but also was rebalancing cycle 2 day before pgs corrupted) > > 2015-08-21 2:23 GMT+03:00 Samuel Just : >> >> Specifically, the snap behavior (we already know that the pgs went >> inconsistent while the pool was in writeback mode, right?). >> -Sam >> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just >> wrote: >> > Yeah, I'm trying to confirm that the issues did happen in writeback >> > mode. >> > -Sam >> > >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor >> > wrote: >> >> Right. But issues started... >> >> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just : >> >>> >> >>> But that was still in writeback mode, right? 
>> >>> -Sam >> >>> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor >> >>> wrote: >> >>> > WE haven't set values for max_bytes / max_objects.. and all >> >>> > data >> >>> > initially >> >>> > writes only to cache layer and not flushed at all to cold >> >>> > layer. >> >>> > >> >>> > Then we received notification from monitoring that we collect >> >>> > about >> >>> > 750GB in >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 >> >>> > of >> >>> > disk >> >>> > size... And then evicting/flushing started... >> >>> > >> >>> > And issue with snapshots arrived >> >>> > >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just : >> >>> >> >> >>> >> Not sure what you mean by: >> >>> >> >> >>> >> but it's stop to work in same moment, when cache layer >> >>> >> fulfilled >> >>> >> with >> >>> >> data and evict/flush started... >> >>> >> -Sam >> >>> >> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor >> >>> >> wrote: >> >>> >> > No, when we start draining cache - bad pgs was in place... >> >>> >> > We have big rebalance (disk by disk - to change journal side >> >>> >> > on >> >>> >> > both >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - ar
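The requested debug levels can be set in ceph.conf on the OSD hosts and the daemons restarted, or injected at runtime; expect very large logs at these levels. A sketch of either form:

    [osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 1

    # or, without restarting:
    ceph tell osd.* injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'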
Re: [ceph-users] Ceph OSD nodes in XenServer VMs
Hi Jiri, On Thu, 20 Aug 2015 11:55:55 +1000 Jiri Kanicky wrote: > We are experimenting with an idea to run OSD nodes in XenServer VMs. > We believe this could provide better flexibility, backups for the > nodes etc. Could you expand on this? As written, it seems like a bad idea to me, just because you'd be adding complexity for no gain. Can you explain, for instance, why you think it would enable better flexibility, or why it would help with backups? What is it that you intend to back up? Backing up the OS on a storage node should never be necessary, since it should be recreatable from config management, and backing up data on the OSDs is best done on a per-pool basis because the requirements are going to differ by pool and not by OSD. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] PCIE-SSD OSD bottom performance issue
dear Loic:
I'm sorry to bother you, but I have a question about ceph.
I used a PCIe SSD as the OSD disk, but its performance is very poor. I have two hosts, each with one PCIe SSD, so I created two OSDs from the PCIe SSDs.

ID WEIGHT  TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.35999 root default
-2 0.17999     host tds_node03
 0 0.17999         osd.0            up  1.0          1.0
-3 0.17999     host tds_node04
 1 0.17999         osd.1            up  1.0          1.0

I created a pool and an rbd device. I ran an fio test with 8K randrw (70% read) on the rbd device, and the result is only 1W IOPS. I have tried many osd thread parameters, but with no effect. But when I tested 8K randrw (70%) on a single PCIe SSD directly, it reached 10W IOPS.
Is there any way to improve the PCIE-SSD OSD performance?

scott_tan...@yahoo.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PCIE-SSD OSD bottom performance issue
Hello, On Thu, 20 Aug 2015 15:47:46 +0800 scott_tan...@yahoo.com wrote: The reason that you're not getting any replies is because we're not psychic/telepathic/clairvoyant. Meaning that you're not giving us enough information by far. > dear ALL: > I used PCIE-SSD to OSD disk . But I found it very bottom > performance. I have two hosts, each host 1 PCIE-SSD,so i create two osd > by PCIE-SSD. > What PCIE-SDD? What hosts (HW, OS), network? What Ceph version, config changes? > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > -1 0.35999 root default > -2 0.17999 host tds_node03 > 0 0.17999 osd.0 up 1.0 1.0 > -30.17999host tds_node04 > 1 0.17999 osd.1 up 1.0 1.0 > > I create pool and rbd device. What kind of pool, any non-default options? Where did you mount/access that RBD device from, userspace, kernel? What file system, if any? > I use fio test 8K randrw(70%) in rbd device,the result is only 1W IOPS, Exact fio invocation parameters, output please. 1W IOPS is supposed to mean 1 write IOPS? Also for comparison purposes, the "standard" is to test with 4KB blocks for random access > I have tried many osd thread parameters, but not effect. Unless your HW, SSD has issues defaults should give a lot better results >But i tested 8K > randrw(70%) in single PCIE-SSD, it has 10W IOPS. > 10 write IOPS would still be abysmally slow. Single means running fio against the SSD directly? How does this compare to using the exact same setup but HDDs or normal SSDs? Christian -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
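To make results comparable it helps to post the full fio command line. A typical invocation for the 8K 70/30 random mix against a mapped RBD device might look like the sketch below; the device path, runtime and queue depths are placeholders rather than recommendations:

    fio --name=rbd-8k-randrw --filename=/dev/rbd0 --direct=1 \
        --ioengine=libaio --rw=randrw --rwmixread=70 --bs=8k \
        --iodepth=32 --numjobs=4 --runtime=60 --time_based \
        --group_reporting

fio also ships an rbd ioengine that talks to librbd directly, which takes the kernel client out of the picture if that is suspected.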
Re: [ceph-users] PCIE-SSD OSD bottom performance issue
my ceph.conf:

[global]
auth_service_required = cephx
osd_pool_default_size = 2
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.168.2.171
mon_initial_members = tds_node01
fsid = fef619c4-5f4a-4bf1-a787-6c4d17995ec4
keyvaluestore op threads = 4
osd op threads = 4
filestore op threads = 4
osd disk threads = 2
osd max write size = 180
osd agent max ops = 8
rbd readahead trigger requests = 20
rbd readahead max bytes = 1048576
rbd readahead disable after bytes = 104857600

[mon.ceph_node01]
host = ceph_node01
mon addr = 172.168.2.171:6789

[mon.ceph_node02]
host = ceph_node02
mon addr = 192.168.2.172:6789

[mon.ceph_node03]
host = ceph_node03
mon addr = 192.168.2.171:6789

[osd.0]
host = ceph_node03
deves = /dev/nvme0n1p5

[osd.1]
host = ceph_node04
deves = /dev/nvme0n1p5

Even with the thread parameters left at their defaults, the performance result is the same.

scott_tan...@yahoo.com
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just wrote:
> What's supposed to happen is that the client transparently directs all
> requests to the cache pool rather than the cold pool when there is a
> cache pool. If the kernel is sending requests to the cold pool,
> that's probably where the bug is. Odd. It could also be a bug
> specific to 'forward' mode, either in the client or on the osd. Why did
> you have it in that mode?

I think I reproduced this on today's master.

Setup, cache mode is writeback:

$ ./ceph osd pool create foo 12 12
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo' created
$ ./ceph osd pool create foo-hot 12 12
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo-hot' created
$ ./ceph osd tier add foo foo-hot
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo-hot' is now (or already was) a tier of 'foo'
$ ./ceph osd tier cache-mode foo-hot writeback
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'foo-hot' to writeback
$ ./ceph osd tier set-overlay foo foo-hot
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
overlay for 'foo' is now (or already was) 'foo-hot'

Create an image:

$ ./rbd create --size 10M --image-format 2 foo/bar
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mkfs.ext4 /mnt/bar
$ sudo umount /mnt

Create a snapshot, take md5sum:

$ ./rbd snap create foo/bar@snap
$ ./rbd export foo/bar /tmp/foo-1
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-1
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-1
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-1
$ md5sum /tmp/snap-1
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-1

Set the cache mode to forward and do a flush, hashes don't match - the
snap is empty - we bang on the hot tier and don't get redirected to the
cold tier, I suspect:

$ ./ceph osd tier cache-mode foo-hot forward
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'foo-hot' to forward
$ ./rados -p foo-hot cache-flush-evict-all
rbd_data.100a6b8b4567.0002
rbd_id.bar
rbd_directory
rbd_header.100a6b8b4567
bar.rbd
rbd_data.100a6b8b4567.0001
rbd_data.100a6b8b4567.
$ ./rados -p foo-hot cache-flush-evict-all
$ ./rbd export foo/bar /tmp/foo-2
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-2
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-2
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-2
$ md5sum /tmp/snap-2
f1c9645dbc14efddc7d8a322685f26eb  /tmp/snap-2
$ od /tmp/snap-2
000 00 00 00 00 00 00 00 00
*
5000

Disable the cache tier and we are back to normal:

$ ./ceph osd tier remove-overlay foo
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
there is now (or already was) no overlay for 'foo'
$ ./rbd export foo/bar /tmp/foo-3
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-3
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-3
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-3
$ md5sum /tmp/snap-3
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-3

I first reproduced it with the kernel client, rbd export was just to
take it out of the equation.

Also, Igor sort of raised a question in his second message: if, after
setting the cache mode to forward and doing a flush, I open an image
(not a snapshot, so may not be related to the above) for write (e.g.
with rbd-fuse), I get an rbd header object in the hot pool, even though
it's in forward mode:

$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mount /mnt/bar /media
$ sudo umount /media
$ sudo umount /mnt
$ ./rados -p foo-hot ls
rbd_header.100a6b8b4567
$ ./rados -p foo ls | grep rbd_header
rbd_header.100a6b8b4567

It's been a while since I looked into tiering, is that how it's
supposed to work? It looks like it happens because rbd_header op
replies don't redirect?

Thanks,

Ilya
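For anyone digging further into this, one way to check where the snapshot
data actually ends up after the flush is to list the clones of one of the
data objects in each tier. This is only a sketch; <object-suffix> is a
placeholder for one of the full object names from the cache-flush-evict-all
listing above:

$ ./rados -p foo listsnaps rbd_data.100a6b8b4567.<object-suffix>
$ ./rados -p foo-hot listsnaps rbd_data.100a6b8b4567.<object-suffix>

If the snapshot clone is present in the cold pool but reads of foo/bar@snap
still come back zero-filled while the overlay is in forward mode, that points
at the same missing redirect the md5 mismatch above suggests.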
[ceph-users] Rados: Undefined symbol error
Hello,

I cloned the master branch of Ceph and, after setting up the cluster, I got
this error when I tried to use the rados commands:

rados: symbol lookup error: rados: undefined symbol: _ZN5MutexC1ERKSsbbbP11CephContext

I saw a similar post here: http://tracker.ceph.com/issues/12563 but I am not
clear on the solution for this problem. I am not performing an upgrade here,
but the error seems to be similar. Could anybody shed more light on the issue
and how to solve it?

Thanks a lot!
Aakanksha
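The mangled name decodes to a Mutex constructor taking (std::string const&,
bool, bool, bool, CephContext*), so the rados binary is resolving against a
librados that does not export that constructor - usually a stale, previously
installed copy of the library shadowing the one from the new build. A quick
way to check, assuming default install paths (adjust to your system):

$ c++filt _ZN5MutexC1ERKSsbbbP11CephContext
$ ldd $(which rados) | grep librados     # which librados gets loaded at runtime
$ ldconfig -p | grep librados            # which copies the dynamic linker knows about

If an older copy (e.g. under /usr/lib) is being picked up, pointing the loader
at the library directory of your build tree (the path below is a placeholder),
or reinstalling so the binary and the library come from the same build, should
clear it:

$ LD_LIBRARY_PATH=/path/to/ceph-build/lib ./rados lspools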
Re: [ceph-users] RE: Question
Hi,

I've done that before, and when I tried to write a file into an RBD image it
froze. Besides resources, is there any other reason why combining MON and OSD
on the same host is not recommended?

Best wishes,
Mika

2015-08-18 15:52 GMT+08:00 Межов Игорь Александрович :
> Hi!
>
> You can run MONs on the same hosts, though it is not recommended. The MON
> daemon itself is not resource hungry - 1-2 cores and 2-4 GB RAM are enough
> in most small installs. But there are some pitfalls:
> - MONs use LevelDB as a backing store and make heavy use of direct writes
> to ensure DB consistency. So if a MON daemon coexists with OSDs not only on
> the same host, but on the same volume/disk/controller, it will severely
> reduce the disk IO available to the OSDs and thus greatly reduce overall
> performance. Moving the MON's data to a separate spindle, or better, a
> separate SSD, will keep MONs running fine alongside OSDs on the same host.
> - When the cluster is in a healthy state, MONs are not resource consuming,
> but when the cluster is in a "changing state" (adding/removing OSDs,
> backfilling, etc.) the CPU and memory usage of a MON can rise significantly.
>
> And yes, in a small cluster it is not always possible to get 3 separate
> hosts for MONs only.
>
> Megov Igor
> CIO, Yuterra
>
> --
> From: ceph-users on behalf of Luis Periquito
> Sent: 17 August 2015 17:09
> To: Kris Vaes
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Question
>
> Yes. The issue is resource sharing, as usual: the MONs will use disk I/O,
> memory and CPU. If the cluster is small (test?) then there's no problem in
> using the same disks. If the cluster starts to get bigger you may want to
> dedicate resources (e.g. the disk for the MONs isn't used by an OSD). If
> the cluster is big enough you may want to dedicate a node to being a MON.
>
> On Mon, Aug 17, 2015 at 2:56 PM, Kris Vaes wrote:
>
>> Hi,
>>
>> Maybe this seems like a strange question, but I could not find this info
>> in the docs, so I have the following question:
>>
>> For a Ceph cluster you need OSD daemons and monitor daemons.
>>
>> On a host you can run several OSD daemons (best one per drive, as the
>> docs say).
>>
>> But can you run the monitor daemon on the same host where you already
>> run some OSD daemons?
>>
>> Is this possible, and what are the implications of doing so?
>>
>> Kind Regards
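To make Igor's point about keeping the MON store off the OSD disks concrete,
here is a minimal sketch of pinning the MON data directory to its own SSD on
a combined MON/OSD host. The device name and mount options are assumptions,
and this is done before the MON on that host is created:

# /dev/sdx is a small SSD reserved for the MON store
$ sudo mkfs.xfs /dev/sdx
$ sudo mkdir -p /var/lib/ceph/mon
$ sudo mount /dev/sdx /var/lib/ceph/mon
$ echo '/dev/sdx /var/lib/ceph/mon xfs defaults,noatime 0 2' | sudo tee -a /etc/fstab

ceph.conf (this is already the default location, shown only to make it
explicit):

[mon]
mon data = /var/lib/ceph/mon/$cluster-$id

With that, the MON's LevelDB sync writes land on the dedicated SSD instead of
competing with the OSDs for IO on the same spindle or controller.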
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Exactly as in our case. Ilya, same for images on our side - the headers are
opened from the hot tier.

On Friday, 21 August 2015, Ilya Dryomov wrote:
> On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just wrote:
> > What's supposed to happen is that the client transparently directs all
> > requests to the cache pool rather than the cold pool when there is a
> > cache pool. If the kernel is sending requests to the cold pool,
> > that's probably where the bug is.
>
> I think I reproduced this on today's master.
>
> [...]
> Also, Igor sort of raised a question in his second message: if, after
> setting the cache mode to forward and doing a flush, I open an image
> (not a snapshot, so may not be related to the above) for write (e.g.
> with rbd-fuse), I get an rbd header object in the hot pool, even though
> it's in forward mode.
>
> It's been a while since I looked into tiering, is that how it's
> supposed to work? It looks like it happens because rbd_header op
> replies don't redirect?
>
> Thanks,
>
> Ilya