[ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread scott_tan...@yahoo.com
Dear all:
I used a PCIe SSD as an OSD disk, but I found its performance to be very poor.
I have two hosts, each with 1 PCIe SSD, so I created two OSDs backed by the PCIe SSDs.

ID WEIGHT  TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.35999 root default
-2 0.17999     host tds_node03
 0 0.17999         osd.0             up  1.0      1.0
-3 0.17999     host tds_node04
 1 0.17999         osd.1             up  1.0      1.0

I created a pool and an rbd device.
I used fio to test 8K randrw (70% read) on the rbd device, and the result is only
about 10,000 IOPS. I have tried many OSD thread parameters, but they had no effect.
But when I tested 8K randrw (70% read) on a single PCIe SSD directly, it reached about 100,000 IOPS.
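(For reference, an 8K randrw 70%-read fio job run directly against an RBD image
via librbd looks roughly like the following; the pool/image names, client name
and queue depth below are placeholders, not the exact job used here:)

fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
    --rw=randrw --rwmixread=70 --bs=8k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting --name=rbd-8k-randrw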

Is there any way to improve the PCIE-SSD  OSD performance?





scott_tan...@yahoo.com


Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Christian Balzer

Hello,

Of all the pertinent points made by Somnath, the one about pre-conditioning
would be pretty high on my list, especially if this slowness persists and
nothing else (scrubbing) is going on.

This might be "fixed" by doing an fstrim.

Additionally, the levelDBs per OSD are of course sync'ing heavily during
reconstruction, so that might not be the favorite workload for your type of
SSDs.

But ultimately situational awareness is very important, as in knowing "what" is
actually going on and slowing things down.
As usual my recommendations would be to use atop, iostat or similar on all
your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
maybe just one of them or something else entirely.
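(For example, on each OSD node - extended per-device utilization and latency,
refreshed every 2 seconds, or atop's interactive disk view; the interval is
arbitrary:)

iostat -xk 2
atop 2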

Christian

On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:

> Also, check if scrubbing started in the cluster or not. That may
> considerably slow down the cluster.
> 
> -Original Message-
> From: Somnath Roy 
> Sent: Wednesday, August 19, 2015 1:35 PM
> To: 'J-P Methot'; ceph-us...@ceph.com
> Subject: RE: [ceph-users] Bad performances in recovery
> 
> All the writes will go through the journal.
> It may happen your SSDs are not preconditioned well and after a lot of
> writes during recovery IOs are stabilized to lower number. This is quite
> common for SSDs if that is the case.
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: J-P Methot [mailto:jpmet...@gtcomm.net]
> Sent: Wednesday, August 19, 2015 1:03 PM
> To: Somnath Roy; ceph-us...@ceph.com
> Subject: Re: [ceph-users] Bad performances in recovery
> 
> Hi,
> 
> Thank you for the quick reply. However, we do have those exact settings
> for recovery and it still strongly affects client io. I have looked at
> various ceph logs and osd logs and nothing is out of the ordinary.
> Here's an idea though, please tell me if I am wrong.
> 
> We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
> explained several times on this mailing list, Samsung SSDs suck in ceph.
> They have horrible O_dsync speed and die easily, when used as journal.
> That's why we're using Intel ssds for journaling, so that we didn't end
> up putting 96 samsung SSDs in the trash.
> 
> In recovery though, what is the ceph behaviour? What kind of write does
> it do on the OSD SSDs? Does it write directly to the SSDs or through the
> journal?
> 
> Additionally, something else we notice: the ceph cluster is MUCH slower
> after recovery than before. Clearly there is a bottleneck somewhere and
> that bottleneck does not get cleared up after the recovery is done.
> 
> 
> On 2015-08-19 3:32 PM, Somnath Roy wrote:
> > If you are concerned about *client io performance* during recovery,
> > use these settings..
> > 
> > osd recovery max active = 1
> > osd max backfills = 1
> > osd recovery threads = 1
> > osd recovery op priority = 1
> > 
> > If you are concerned about *recovery performance*, you may want to
> > bump this up, but I doubt it will help much from default settings..
> > 
> > Thanks & Regards
> > Somnath
> > 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> > Of J-P Methot
> > Sent: Wednesday, August 19, 2015 12:17 PM
> > To: ceph-us...@ceph.com
> > Subject: [ceph-users] Bad performances in recovery
> > 
> > Hi,
> > 
> > Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
> > a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
> > The ceph version is hammer v0.94.1 . There is a performance overhead
> > because we're using SSDs (I've heard it gets better in infernalis, but
> > we're not upgrading just yet) but we can reach numbers that I would
> > consider "alright".
> > 
> > Now, the issue is, when the cluster goes into recovery it's very fast
> > at first, but then slows down to ridiculous levels as it moves
> > forward. You can go from 7% to 2% to recover in ten minutes, but it
> > may take 2 hours to recover the last 2%. While this happens, the
> > attached openstack setup becomes incredibly slow, even though there is
> > only a small fraction of objects still recovering (less than 1%). The
> > settings that may affect recovery speed are very low, as they are by
> > default, yet they still affect client io speed way more than it should.
> > 
> > Why would ceph recovery become so slow as it progress and affect
> > client io even though it's recovering at a snail's pace? And by a
> > snail's pace, I mean a few kb/second on 10gbps uplinks. --
> > == Jean-Philippe Méthot
> > Administrateur système / System administrator GloboTech Communications
> > Phone: 1-514-907-0050
> > Toll Free: 1-(888)-GTCOMM1
> > Fax: 1-(514)-907-0750
> > jpmet...@gtcomm.net
> > http://www.gtcomm.net
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > 
> > 

Re: [ceph-users] Ceph OSD nodes in XenServer VMs

2015-08-20 Thread Christian Balzer

Hello,

On Thu, 20 Aug 2015 11:55:55 +1000 Jiri Kanicky wrote:

> Hi all,
> 
> We are experimenting with an idea to run OSD nodes in XenServer VMs. We 
> believe this could provide better flexibility, backups for the nodes etc.
> 
> For example:
> Xenserver with 4 HDDs dedicated for Ceph.
> We would introduce 1 VM (OSD node) with raw/direct access to 4 HDDs or 2 
> VMs (2 OSD nodes) with 2 HDDs each.
> 
> Do you have any experience with this? Any thoughts on this? Good or bad 
> idea?
> 
My knee-jerk reaction would be definitely in the "bad idea" category.
Even with "raw" access I'd venture that it isn't as fast as actual bare metal,
and your network will be virtualized anyway.

What really puzzles/amuses me though is using the one VM platform that
doesn't support Ceph as the basis for providing Ceph. ^o^

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] ceph osd debug question / proposal

2015-08-20 Thread Jan Schermer
Just to clarify - you unmounted the filesystem with "umount -l"? That's almost 
never a good idea, and it puts the OSD in a very unusual situation where IO 
will actually work on the open files, but it can't open any new ones. I think 
this would be enough to confuse just about any piece of software.
Was the journal on the filesystem or on a separate partition/device?
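(For reference, a quick way to check whether an OSD data directory is actually
still a mount point, e.g. after a lazy unmount; the path assumes the default
layout and the OSD id from this thread:)

mountpoint /var/lib/ceph/osd/ceph-4
findmnt /var/lib/ceph/osd/ceph-4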

It's not the same as a R/O filesystem (I hit that once and no such havoc 
happened); in my experience the OSD traps and exits when something like that 
happens.

It would be interesting to know what would happen if you just did rm -rf 
/var/lib/ceph/osd/ceph-4/current/* - that could be an equivalent to umount -l, 
more or less :-)

Jan



> On 20 Aug 2015, at 08:01, Goncalo Borges  wrote:
> 
> Dear Ceph gurus...
> 
> Just wanted to report something that may be interesting to enhance... or 
> maybe I am not doing the right debugging procedure. 
> 
> 1. I am working with 0.92.2 and I am testing the cluster in several disaster 
> catastrophe scenarios.
> 
> 2. I have 32 OSDs distributed in 4 servers, meaning that I have 8 OSD per 
> server.
> 
> 3. I have deliberately unmounted  the filesystem of osd.4 but the daemon was 
> left on. I just wanted to understand how the system would react. This was 
> what happened:
> a. While there was no I/O, the system did not realize that the osd.4 
> filesystem was not mounted, and 'ceph -s' continued to report HEALTH_OK 
> for the system status. 
> 
> b. When I started to impose some heavy I/O, the system started to complain 
> of slow requests. Curiously, osd.4 never appears in the logs.
> # ceph -s
> cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>  health HEALTH_WARN
> 170 requests are blocked > 32 sec
>  monmap e1: 3 mons at 
> {rccephmon1=192.231.127.8:6789/0,rccephmon2=192.231.127.34:6789/0,rccephmon3=192.231.127.26:6789/0}
> election epoch 24, quorum 0,1,2 rccephmon1,rccephmon3,rccephmon2
>  mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
>  osdmap e1179: 32 osds: 32 up, 32 in
>   pgmap v907325: 2176 pgs, 2 pools, 4928 GB data, 1843 kobjects
> 14823 GB used, 74228 GB / 89051 GB avail
> 2174 active+clean
>2 active+clean+replay
> 
> # ceph -w
> (...)  
> 2015-08-19 17:44:55.161731 osd.1 [WRN] 88 slow requests, 8 included below; 
> oldest blocked for > 3156.325716 secs
> 2015-08-19 17:44:55.161940 osd.1 [WRN] slow request 1920.533342 seconds old, 
> received at 2015-08-19 17:12:54.628258: osd_op(client.44544.1:2266980 
> 100022a.6aec [write 524288~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161950 osd.1 [WRN] slow request 1920.511098 seconds old, 
> received at 2015-08-19 17:12:54.650502: osd_op(client.44544.1:2266988 
> 100022a.6aec [write 1048576~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161957 osd.1 [WRN] slow request 1920.510451 seconds old, 
> received at 2015-08-19 17:12:54.651149: osd_op(client.44544.1:2266996 
> 100022a.6aec [write 1572864~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161963 osd.1 [WRN] slow request 1920.488589 seconds old, 
> received at 2015-08-19 17:12:54.673011: osd_op(client.44544.1:2267004 
> 100022a.6aec [write 2097152~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161970 osd.1 [WRN] slow request 1920.482785 seconds old, 
> received at 2015-08-19 17:12:54.678815: osd_op(client.44544.1:2267012 
> 100022a.6aec [write 2621440~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> (...) 
> # grep "slow requests" /tmp/osd_failed.txt  | awk '{print $3}' | sort | uniq
> osd.1
> osd.11
> osd.17
> osd.23
> osd.24
> osd.26
> osd.27
> osd.31
> osd.7
> 
> c. None of the standard 'ceph osd' commands indicated that the problematic 
> OSD was osd.4. Only by looking at ceph-osd.4.log do we find the write error messages:
> 2015-08-19 16:52:17.552512 7f6f69973700  0 -- 10.100.1.167:6809/23763 >> 
> 10.100.1.169:6800/28352 pipe(0x175ca000 sd=169 :6809 s=0 pgs=0 cs=0 l=0 
> c=0x1f038000).accept connect_seq 180 vs existing 179 state standby
> 2015-08-19 16:52:17.566701 7f6f89d2a700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/100022a.0030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.567230 7f6f89d2a700  0 
> filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 
> 5.180_head/e6f81180/100022a.0030/head//5: (2) No such file or 
> directory
> 2015-08-19 16:52:17.567332 7f6f89d2a700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/100022a.0030/head//5 in index: (2) No such file or dire

[ceph-users] Testing CephFS

2015-08-20 Thread Simon Hallam
Hey all,

We are currently testing CephFS on a small (3 node) cluster.

The setup is currently:

Each server has 12 OSDs, 1 Monitor and 1 MDS running on it:
The servers are running: 0.94.2-0.el7
The clients are running: Ceph: 0.80.10-1.fc21, Kernel: 4.0.6-200.fc21.x86_64

ceph -s
cluster 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
 health HEALTH_OK
 monmap e1: 3 mons at 
{ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0}
election epoch 20, quorum 0,1,2 ceph1,ceph2,ceph3
 mdsmap e12: 1/1/1 up {0=ceph3=up:active}, 2 up:standby
 osdmap e389: 36 osds: 36 up, 36 in
  pgmap v19370: 8256 pgs, 3 pools, 51217 MB data, 14035 objects
95526 MB used, 196 TB / 196 TB avail
8256 active+clean

Our Ceph.conf is relatively simple at the moment:

cat /etc/ceph/ceph.conf
[global]
fsid = 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.15.0.1,10.15.0.2,10.15.0.3
mon_pg_warn_max_per_osd = 1000
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

When I pulled the plug on the master MDS last time (ceph1), it stopped all IO 
until I plugged it back in. I was under the assumption that the MDS would fail 
over to one of the other 2 MDSs and IO would continue?

Is there something I need to do to allow the MDSs to fail over to each other 
without too much interruption? Or is this because of the clients' ceph version?
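(For reference, a minimal sketch of the hammer-era standby-replay options; these
names come from the documentation of that period and are not taken from this
cluster's ceph.conf, so treat them as something to verify rather than a
definitive answer:)

[mds]
    mds standby replay = true
    mds standby for rank = 0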

Cheers,

Simon Hallam
Linux Support & Development Officer






Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Hi Samuel, we tried to fix it in a tricky way.

We checked all the affected rbd_data chunks from the OSD logs, then queried
rbd info to work out which rbd images contain the bad rbd_data. After that we
mapped each such rbd as rbd0, created an empty rbd, and dd'd everything from the
bad volume to the new one.
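(For reference only - a sketch of that copy workflow; the pool and image names
are placeholders, not the actual ones from this cluster:)

rbd map volumes/bad-image          # typically appears as /dev/rbd0
rbd create volumes/new-image --size 102400
rbd map volumes/new-image          # typically appears as /dev/rbd1
dd if=/dev/rbd0 of=/dev/rbd1 bs=4M
rbd unmap /dev/rbd1
rbd unmap /dev/rbd0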

But after that the scrub errors kept growing... It was 15 errors... now 35... We also
tried to out the OSD which was the lead (primary), but after rebalancing these 2 pgs
still have 35 scrub errors...

ceph osd getmap -o  - attached


2015-08-18 18:48 GMT+03:00 Samuel Just :

> Is the number of inconsistent objects growing?  Can you attach the
> whole ceph.log from the 6 hours before and after the snippet you
> linked above?  Are you using cache/tiering?  Can you attach the osdmap
> (ceph osd getmap -o )?
> -Sam
>
> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>  wrote:
> > ceph - 0.94.2
> > Its happen during rebalancing
> >
> > I thought too, that some OSD miss copy, but looks like all miss...
> > So any advice in which direction i need to go
> >
> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
> >>
> >> From a quick peek it looks like some of the OSDs are missing clones of
> >> objects. I'm not sure how that could happen and I'd expect the pg
> >> repair to handle that but if it's not there's probably something
> >> wrong; what version of Ceph are you running? Sam, is this something
> >> you've seen, a new bug, or some kind of config issue?
> >> -Greg
> >>
> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
> >>  wrote:
> >> > Hi all, at our production cluster, due high rebalancing ((( we have 2
> >> > pgs in
> >> > inconsistent state...
> >> >
> >> > root@temp:~# ceph health detail | grep inc
> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> >
> >> > From OSD logs, after recovery attempt:
> >> >
> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read
> i; do
> >> > ceph pg repair ${i} ; done
> >> > dumped all in format plain
> >> > instructing pg 2.490 on osd.56 to repair
> >> > instructing pg 2.c4 on osd.56 to repair
> >> >
> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected
> clone
> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
> clone
> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
> clone
> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
> clone
> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
> clone
> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2
> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
> clone
> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2
> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected
> clone
> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> > e1509490/rbd_data.1423897545e146.09a6/head//2 expected
> clone
> >> > 28809490/rbd_data.edea7460fe42b.01d9/141//2
> >> > /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765
> 7f94663b3700
> >> > -1
> >> > log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
> >> >
> >> > So, how i can solve "expected clone" situation by hand?
> >> > Thank in advance!
> >> >
> >> >
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> >
>


osdmap

Re: [ceph-users] Ceph File System ACL Support

2015-08-20 Thread Yan, Zheng
The code is at https://github.com/ceph/samba.git wip-acl. So far the
code does not handle default ACL (files created by samba do not
inherit parent directory's default ACL)

Regards
Yan, Zheng


On Tue, Aug 18, 2015 at 6:57 PM, Gregory Farnum  wrote:
> On Mon, Aug 17, 2015 at 4:12 AM, Yan, Zheng  wrote:
>> On Mon, Aug 17, 2015 at 9:38 AM, Eric Eastman
>>  wrote:
>>> Hi,
>>>
>>> I need to verify in Ceph v9.0.2 if the kernel version of Ceph file
>>> system supports ACLs and the libcephfs file system interface does not.
>>> I am trying to have SAMBA, version 4.3.0rc1, support Windows ACLs
>>> using "vfs objects = acl_xattr" with the SAMBA VFS Ceph file system
>>> interface "vfs objects = ceph" and my tests are failing. If I use a
>>> kernel mount of the same Ceph file system, it works.  Using the SAMBA
>>> Ceph VFS interface with logging set to 3 in my smb.conf files shows
>>> the following error when on my Windows AD server I try to "Disable
>>> inheritance" of the SAMBA exported directory uu/home:
>>>
>>> [2015/08/16 18:27:11.546307,  2]
>>> ../source3/smbd/posix_acls.c:3006(set_canon_ace_list)
>>>   set_canon_ace_list: sys_acl_set_file type file failed for file
>>> uu/home (Operation not supported).
>>>
>>> This works using the same Ceph file system kernel mounted. It also
>>> works with an XFS file system.
>>>
>>> Doing some Googling I found this entry on the SAMBA email list:
>>>
>>> https://lists.samba.org/archive/samba-technical/2015-March/106699.html
>>>
>>> It states: libcephfs does not support ACL yet, so this patch adds ACL
>>> callbacks that do nothing.
>>>
>>> If ACL support is not in libcephfs, is there plans to add it, as the
>>> SAMBA Ceph VFS interface without ACL support is severely limited in a
>>> multi-user Windows environment.
>>>
>>
>> libcephfs does not support ACL. I have an old patch that adds ACL
>> support to samba's vfs ceph module, but haven't tested it carefully.
>
> Are these published somewhere? Even if you don't have time to work on
> it somebody else might pick it up and finish things if it's available
> as a starting point. :)
> -Greg


Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread J-P Methot
Hi,

Just to update the mailing list, we ended up going back to the default
ceph.conf without any additional settings beyond what is mandatory. We are
now reaching speeds we never reached before, both in recovery and in
regular usage. There was definitely something we had set in the ceph.conf
that was bogging everything down.


On 2015-08-20 4:06 AM, Christian Balzer wrote:
> 
> Hello,
> 
> from all the pertinent points by Somnath, the one about pre-conditioning
> would be pretty high on my list, especially if this slowness persists and
> nothing else (scrub) is going on.
> 
> This might be "fixed" by doing a fstrim.
> 
> Additionally the levelDB's per OSD are of course sync'ing heavily during
> reconstruction, so that might not be the favorite thing for your type of
> SSDs.
> 
> But ultimately situational awareness is very important, as in "what" is
> actually going and slowing things down. 
> As usual my recommendations would be to use atop, iostat or similar on all
> your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
> maybe just one of them or something else entirely.
> 
> Christian
> 
> On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:
> 
>> Also, check if scrubbing started in the cluster or not. That may
>> considerably slow down the cluster.
>>
>> -Original Message-
>> From: Somnath Roy 
>> Sent: Wednesday, August 19, 2015 1:35 PM
>> To: 'J-P Methot'; ceph-us...@ceph.com
>> Subject: RE: [ceph-users] Bad performances in recovery
>>
>> All the writes will go through the journal.
>> It may happen your SSDs are not preconditioned well and after a lot of
>> writes during recovery IOs are stabilized to lower number. This is quite
>> common for SSDs if that is the case.
>>
>> Thanks & Regards
>> Somnath
>>
>> -Original Message-
>> From: J-P Methot [mailto:jpmet...@gtcomm.net]
>> Sent: Wednesday, August 19, 2015 1:03 PM
>> To: Somnath Roy; ceph-us...@ceph.com
>> Subject: Re: [ceph-users] Bad performances in recovery
>>
>> Hi,
>>
>> Thank you for the quick reply. However, we do have those exact settings
>> for recovery and it still strongly affects client io. I have looked at
>> various ceph logs and osd logs and nothing is out of the ordinary.
>> Here's an idea though, please tell me if I am wrong.
>>
>> We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
>> explained several times on this mailing list, Samsung SSDs suck in ceph.
>> They have horrible O_dsync speed and die easily, when used as journal.
>> That's why we're using Intel ssds for journaling, so that we didn't end
>> up putting 96 samsung SSDs in the trash.
>>
>> In recovery though, what is the ceph behaviour? What kind of write does
>> it do on the OSD SSDs? Does it write directly to the SSDs or through the
>> journal?
>>
>> Additionally, something else we notice: the ceph cluster is MUCH slower
>> after recovery than before. Clearly there is a bottleneck somewhere and
>> that bottleneck does not get cleared up after the recovery is done.
>>
>>
>> On 2015-08-19 3:32 PM, Somnath Roy wrote:
>>> If you are concerned about *client io performance* during recovery,
>>> use these settings..
>>>
>>> osd recovery max active = 1
>>> osd max backfills = 1
>>> osd recovery threads = 1
>>> osd recovery op priority = 1
>>>
>>> If you are concerned about *recovery performance*, you may want to
>>> bump this up, but I doubt it will help much from default settings..
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
>>> Of J-P Methot
>>> Sent: Wednesday, August 19, 2015 12:17 PM
>>> To: ceph-us...@ceph.com
>>> Subject: [ceph-users] Bad performances in recovery
>>>
>>> Hi,
>>>
>>> Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
>>> a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
>>> The ceph version is hammer v0.94.1 . There is a performance overhead
>>> because we're using SSDs (I've heard it gets better in infernalis, but
>>> we're not upgrading just yet) but we can reach numbers that I would
>>> consider "alright".
>>>
>>> Now, the issue is, when the cluster goes into recovery it's very fast
>>> at first, but then slows down to ridiculous levels as it moves
>>> forward. You can go from 7% to 2% to recover in ten minutes, but it
>>> may take 2 hours to recover the last 2%. While this happens, the
>>> attached openstack setup becomes incredibly slow, even though there is
>>> only a small fraction of objects still recovering (less than 1%). The
>>> settings that may affect recovery speed are very low, as they are by
>>> default, yet they still affect client io speed way more than it should.
>>>
>>> Why would ceph recovery become so slow as it progress and affect
>>> client io even though it's recovering at a snail's pace? And by a
>>> snail's pace, I mean a few kb/second on 10gbps uplinks. --
>>> == Jean-Philippe Méthot
>>> Administrateur système / System admi

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Alex Gorbachev
>
> Just to update the mailing list, we ended up going back to default
> ceph.conf without any additional settings than what is mandatory. We are
> now reaching speeds we never reached before, both in recovery and in
> regular usage. There was definitely something we set in the ceph.conf
> bogging everything down.

Could you please share the old and new ceph.conf, or the section that
was removed?

Best regards,
Alex

>
>
> On 2015-08-20 4:06 AM, Christian Balzer wrote:
>>
>> Hello,
>>
>> from all the pertinent points by Somnath, the one about pre-conditioning
>> would be pretty high on my list, especially if this slowness persists and
>> nothing else (scrub) is going on.
>>
>> This might be "fixed" by doing a fstrim.
>>
>> Additionally the levelDB's per OSD are of course sync'ing heavily during
>> reconstruction, so that might not be the favorite thing for your type of
>> SSDs.
>>
>> But ultimately situational awareness is very important, as in "what" is
>> actually going and slowing things down.
>> As usual my recommendations would be to use atop, iostat or similar on all
>> your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
>> maybe just one of them or something else entirely.
>>
>> Christian
>>
>> On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:
>>
>>> Also, check if scrubbing started in the cluster or not. That may
>>> considerably slow down the cluster.
>>>
>>> -Original Message-
>>> From: Somnath Roy
>>> Sent: Wednesday, August 19, 2015 1:35 PM
>>> To: 'J-P Methot'; ceph-us...@ceph.com
>>> Subject: RE: [ceph-users] Bad performances in recovery
>>>
>>> All the writes will go through the journal.
>>> It may happen your SSDs are not preconditioned well and after a lot of
>>> writes during recovery IOs are stabilized to lower number. This is quite
>>> common for SSDs if that is the case.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -Original Message-
>>> From: J-P Methot [mailto:jpmet...@gtcomm.net]
>>> Sent: Wednesday, August 19, 2015 1:03 PM
>>> To: Somnath Roy; ceph-us...@ceph.com
>>> Subject: Re: [ceph-users] Bad performances in recovery
>>>
>>> Hi,
>>>
>>> Thank you for the quick reply. However, we do have those exact settings
>>> for recovery and it still strongly affects client io. I have looked at
>>> various ceph logs and osd logs and nothing is out of the ordinary.
>>> Here's an idea though, please tell me if I am wrong.
>>>
>>> We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
>>> explained several times on this mailing list, Samsung SSDs suck in ceph.
>>> They have horrible O_dsync speed and die easily, when used as journal.
>>> That's why we're using Intel ssds for journaling, so that we didn't end
>>> up putting 96 samsung SSDs in the trash.
>>>
>>> In recovery though, what is the ceph behaviour? What kind of write does
>>> it do on the OSD SSDs? Does it write directly to the SSDs or through the
>>> journal?
>>>
>>> Additionally, something else we notice: the ceph cluster is MUCH slower
>>> after recovery than before. Clearly there is a bottleneck somewhere and
>>> that bottleneck does not get cleared up after the recovery is done.
>>>
>>>
>>> On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery,
 use these settings..

 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1

 If you are concerned about *recovery performance*, you may want to
 bump this up, but I doubt it will help much from default settings..

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of J-P Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery

 Hi,

 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
 a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
 The ceph version is hammer v0.94.1 . There is a performance overhead
 because we're using SSDs (I've heard it gets better in infernalis, but
 we're not upgrading just yet) but we can reach numbers that I would
 consider "alright".

 Now, the issue is, when the cluster goes into recovery it's very fast
 at first, but then slows down to ridiculous levels as it moves
 forward. You can go from 7% to 2% to recover in ten minutes, but it
 may take 2 hours to recover the last 2%. While this happens, the
 attached openstack setup becomes incredibly slow, even though there is
 only a small fraction of objects still recovering (less than 1%). The
 settings that may affect recovery speed are very low, as they are by
 default, yet they still affect client io speed way more than it should.

 Why would ceph recovery become so slow as it progress and affect
 cl

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Jan Schermer
Are you sure it was because of configuration changes?
Maybe it was restarting the OSDs that fixed it?
We often hit an issue with backfill_toofull where the recovery/backfill 
processes get stuck until we restart the daemons (sometimes setting 
recovery_max_active helps as well). It still shows recovery of a few objects now 
and then (a few KB/s) and then stops completely.
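(For reference, bumping that on a live cluster without restarting the daemons
can be done with injectargs; the value here is arbitrary:)

ceph tell osd.* injectargs '--osd_recovery_max_active 5'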

Jan

> On 20 Aug 2015, at 17:43, Alex Gorbachev  wrote:
> 
>> 
>> Just to update the mailing list, we ended up going back to default
>> ceph.conf without any additional settings than what is mandatory. We are
>> now reaching speeds we never reached before, both in recovery and in
>> regular usage. There was definitely something we set in the ceph.conf
>> bogging everything down.
> 
> Could you please share the old and new ceph.conf, or the section that
> was removed?
> 
> Best regards,
> Alex
> 
>> 
>> 
>> On 2015-08-20 4:06 AM, Christian Balzer wrote:
>>> 
>>> Hello,
>>> 
>>> from all the pertinent points by Somnath, the one about pre-conditioning
>>> would be pretty high on my list, especially if this slowness persists and
>>> nothing else (scrub) is going on.
>>> 
>>> This might be "fixed" by doing a fstrim.
>>> 
>>> Additionally the levelDB's per OSD are of course sync'ing heavily during
>>> reconstruction, so that might not be the favorite thing for your type of
>>> SSDs.
>>> 
>>> But ultimately situational awareness is very important, as in "what" is
>>> actually going and slowing things down.
>>> As usual my recommendations would be to use atop, iostat or similar on all
>>> your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
>>> maybe just one of them or something else entirely.
>>> 
>>> Christian
>>> 
>>> On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:
>>> 
 Also, check if scrubbing started in the cluster or not. That may
 considerably slow down the cluster.
 
 -Original Message-
 From: Somnath Roy
 Sent: Wednesday, August 19, 2015 1:35 PM
 To: 'J-P Methot'; ceph-us...@ceph.com
 Subject: RE: [ceph-users] Bad performances in recovery
 
 All the writes will go through the journal.
 It may happen your SSDs are not preconditioned well and after a lot of
 writes during recovery IOs are stabilized to lower number. This is quite
 common for SSDs if that is the case.
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: J-P Methot [mailto:jpmet...@gtcomm.net]
 Sent: Wednesday, August 19, 2015 1:03 PM
 To: Somnath Roy; ceph-us...@ceph.com
 Subject: Re: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Thank you for the quick reply. However, we do have those exact settings
 for recovery and it still strongly affects client io. I have looked at
 various ceph logs and osd logs and nothing is out of the ordinary.
 Here's an idea though, please tell me if I am wrong.
 
 We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
 explained several times on this mailing list, Samsung SSDs suck in ceph.
 They have horrible O_dsync speed and die easily, when used as journal.
 That's why we're using Intel ssds for journaling, so that we didn't end
 up putting 96 samsung SSDs in the trash.
 
 In recovery though, what is the ceph behaviour? What kind of write does
 it do on the OSD SSDs? Does it write directly to the SSDs or through the
 journal?
 
 Additionally, something else we notice: the ceph cluster is MUCH slower
 after recovery than before. Clearly there is a bottleneck somewhere and
 that bottleneck does not get cleared up after the recovery is done.
 
 
 On 2015-08-19 3:32 PM, Somnath Roy wrote:
> If you are concerned about *client io performance* during recovery,
> use these settings..
> 
> osd recovery max active = 1
> osd max backfills = 1
> osd recovery threads = 1
> osd recovery op priority = 1
> 
> If you are concerned about *recovery performance*, you may want to
> bump this up, but I doubt it will help much from default settings..
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of J-P Methot
> Sent: Wednesday, August 19, 2015 12:17 PM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Bad performances in recovery
> 
> Hi,
> 
> Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
> a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
> The ceph version is hammer v0.94.1 . There is a performance overhead
> because we're using SSDs (I've heard it gets better in infernalis, but
> we're not upgrading just yet) but we can reach numbers that I would
> consider "alright".
> 
> Now, the issue is, when the cluster goes into recovery it's very fast
> at fir

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Ok, you appear to be using a replicated cache tier in front of a
replicated base tier.  Please scrub both inconsistent pgs and post the
ceph.log from before you started the scrub until after.  Also,
what command are you using to take snapshots?
-Sam
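(For reference, kicking off those scrubs by hand, using the two pg ids quoted
further down in this thread:)

ceph pg deep-scrub 2.490
ceph pg deep-scrub 2.c4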

On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
 wrote:
> Hi Samuel, we try to fix it in trick way.
>
> we check all rbd_data chunks from logs (OSD) which are affected, then query
> rbd info to compare which rbd consist bad rbd_data, after that we mount this
> rbd as rbd0, create empty rbd, and DD all info from bad volume to new one.
>
> But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos
> try to out OSD which was lead, but after rebalancing this 2 pgs still have
> 35 scrub errors...
>
> ceph osd getmap -o  - attached
>
>
> 2015-08-18 18:48 GMT+03:00 Samuel Just :
>>
>> Is the number of inconsistent objects growing?  Can you attach the
>> whole ceph.log from the 6 hours before and after the snippet you
>> linked above?  Are you using cache/tiering?  Can you attach the osdmap
>> (ceph osd getmap -o )?
>> -Sam
>>
>> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>>  wrote:
>> > ceph - 0.94.2
>> > Its happen during rebalancing
>> >
>> > I thought too, that some OSD miss copy, but looks like all miss...
>> > So any advice in which direction i need to go
>> >
>> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
>> >>
>> >> From a quick peek it looks like some of the OSDs are missing clones of
>> >> objects. I'm not sure how that could happen and I'd expect the pg
>> >> repair to handle that but if it's not there's probably something
>> >> wrong; what version of Ceph are you running? Sam, is this something
>> >> you've seen, a new bug, or some kind of config issue?
>> >> -Greg
>> >>
>> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Hi all, at our production cluster, due high rebalancing ((( we have 2
>> >> > pgs in
>> >> > inconsistent state...
>> >> >
>> >> > root@temp:~# ceph health detail | grep inc
>> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >
>> >> > From OSD logs, after recovery attempt:
>> >> >
>> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i;
>> >> > do
>> >> > ceph pg repair ${i} ; done
>> >> > dumped all in format plain
>> >> > instructing pg 2.490 on osd.56 to repair
>> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >
>> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected
>> >> > clone
>> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
>> >> > clone
>> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
>> >> > clone
>> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
>> >> > clone
>> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
>> >> > clone
>> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
>> >> > clone
>> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected
>> >> > clone
>> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
>> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432
>> >> > 7f94663b3700
>> >> > -1
>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> > e1509490/rbd_data.1423897545e146.09a6/head//2 expected
>> >> > clone
>> >> > 28809490/rbd_d

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Also, was there at any point a power failure/power cycle event,
perhaps on osd 56?
-Sam

On Thu, Aug 20, 2015 at 9:23 AM, Samuel Just  wrote:
> Ok, you appear to be using a replicated cache tier in front of a
> replicated base tier.  Please scrub both inconsistent pgs and post the
> ceph.log from before when you started the scrub until after.  Also,
> what command are you using to take snapshots?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>  wrote:
>> Hi Samuel, we try to fix it in trick way.
>>
>> we check all rbd_data chunks from logs (OSD) which are affected, then query
>> rbd info to compare which rbd consist bad rbd_data, after that we mount this
>> rbd as rbd0, create empty rbd, and DD all info from bad volume to new one.
>>
>> But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos
>> try to out OSD which was lead, but after rebalancing this 2 pgs still have
>> 35 scrub errors...
>>
>> ceph osd getmap -o  - attached
>>
>>
>> 2015-08-18 18:48 GMT+03:00 Samuel Just :
>>>
>>> Is the number of inconsistent objects growing?  Can you attach the
>>> whole ceph.log from the 6 hours before and after the snippet you
>>> linked above?  Are you using cache/tiering?  Can you attach the osdmap
>>> (ceph osd getmap -o )?
>>> -Sam
>>>
>>> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>>>  wrote:
>>> > ceph - 0.94.2
>>> > Its happen during rebalancing
>>> >
>>> > I thought too, that some OSD miss copy, but looks like all miss...
>>> > So any advice in which direction i need to go
>>> >
>>> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
>>> >>
>>> >> From a quick peek it looks like some of the OSDs are missing clones of
>>> >> objects. I'm not sure how that could happen and I'd expect the pg
>>> >> repair to handle that but if it's not there's probably something
>>> >> wrong; what version of Ceph are you running? Sam, is this something
>>> >> you've seen, a new bug, or some kind of config issue?
>>> >> -Greg
>>> >>
>>> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>>> >>  wrote:
>>> >> > Hi all, at our production cluster, due high rebalancing ((( we have 2
>>> >> > pgs in
>>> >> > inconsistent state...
>>> >> >
>>> >> > root@temp:~# ceph health detail | grep inc
>>> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>>> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>>> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>>> >> >
>>> >> > From OSD logs, after recovery attempt:
>>> >> >
>>> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i;
>>> >> > do
>>> >> > ceph pg repair ${i} ; done
>>> >> > dumped all in format plain
>>> >> > instructing pg 2.490 on osd.56 to repair
>>> >> > instructing pg 2.c4 on osd.56 to repair
>>> >> >
>>> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected
>>> >> > clone
>>> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
>>> >> > clone
>>> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
>>> >> > clone
>>> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
>>> >> > clone
>>> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
>>> >> > clone
>>> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
>>> >> > clone
>>> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
>>> >> > 7f94663b3700
>>> >> > -1
>>> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>>> >> > 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected
>>> >> > clone
>>> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
>>> >

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Samuel, we turned off the cache layer a few hours ago...
I will post the ceph.log in a few minutes.

For the snapshots - we found the issue, it was connected with the cache tier..

2015-08-20 19:23 GMT+03:00 Samuel Just :

> Ok, you appear to be using a replicated cache tier in front of a
> replicated base tier.  Please scrub both inconsistent pgs and post the
> ceph.log from before when you started the scrub until after.  Also,
> what command are you using to take snapshots?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>  wrote:
> > Hi Samuel, we try to fix it in trick way.
> >
> > we check all rbd_data chunks from logs (OSD) which are affected, then
> query
> > rbd info to compare which rbd consist bad rbd_data, after that we mount
> this
> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new
> one.
> >
> > But after that - scrub errors growing... Was 15 errors.. .Now 35... We
> laos
> > try to out OSD which was lead, but after rebalancing this 2 pgs still
> have
> > 35 scrub errors...
> >
> > ceph osd getmap -o  - attached
> >
> >
> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
> >>
> >> Is the number of inconsistent objects growing?  Can you attach the
> >> whole ceph.log from the 6 hours before and after the snippet you
> >> linked above?  Are you using cache/tiering?  Can you attach the osdmap
> >> (ceph osd getmap -o )?
> >> -Sam
> >>
> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
> >>  wrote:
> >> > ceph - 0.94.2
> >> > Its happen during rebalancing
> >> >
> >> > I thought too, that some OSD miss copy, but looks like all miss...
> >> > So any advice in which direction i need to go
> >> >
> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
> >> >>
> >> >> From a quick peek it looks like some of the OSDs are missing clones
> of
> >> >> objects. I'm not sure how that could happen and I'd expect the pg
> >> >> repair to handle that but if it's not there's probably something
> >> >> wrong; what version of Ceph are you running? Sam, is this something
> >> >> you've seen, a new bug, or some kind of config issue?
> >> >> -Greg
> >> >>
> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
> >> >>  wrote:
> >> >> > Hi all, at our production cluster, due high rebalancing ((( we
> have 2
> >> >> > pgs in
> >> >> > inconsistent state...
> >> >> >
> >> >> > root@temp:~# ceph health detail | grep inc
> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> >> >
> >> >> > From OSD logs, after recovery attempt:
> >> >> >
> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while
> read i;
> >> >> > do
> >> >> > ceph pg repair ${i} ; done
> >> >> > dumped all in format plain
> >> >> > instructing pg 2.490 on osd.56 to repair
> >> >> > instructing pg 2.c4 on osd.56 to repair
> >> >> >
> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected
> >> >> > clone
> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
> >> >> > clone
> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
> >> >> > clone
> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
> >> >> > clone
> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
> >> >> > clone
> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2
> >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
> >> >> > clone
> >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/141//2
> >> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
> >> >> > 7f94663b3700
> >> >> > -1
> >> >> > log_channel(cluster) log [ERR] : deep-scrub

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
What was the issue?
-Sam

On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
 wrote:
> Samuel, we turned off cache layer few hours ago...
> I will post ceph.log in few minutes
>
> For snap - we found issue, was connected with cache tier..
>
> 2015-08-20 19:23 GMT+03:00 Samuel Just :
>>
>> Ok, you appear to be using a replicated cache tier in front of a
>> replicated base tier.  Please scrub both inconsistent pgs and post the
>> ceph.log from before when you started the scrub until after.  Also,
>> what command are you using to take snapshots?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>>  wrote:
>> > Hi Samuel, we try to fix it in trick way.
>> >
>> > we check all rbd_data chunks from logs (OSD) which are affected, then
>> > query
>> > rbd info to compare which rbd consist bad rbd_data, after that we mount
>> > this
>> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new
>> > one.
>> >
>> > But after that - scrub errors growing... Was 15 errors.. .Now 35... We
>> > laos
>> > try to out OSD which was lead, but after rebalancing this 2 pgs still
>> > have
>> > 35 scrub errors...
>> >
>> > ceph osd getmap -o  - attached
>> >
>> >
>> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
>> >>
>> >> Is the number of inconsistent objects growing?  Can you attach the
>> >> whole ceph.log from the 6 hours before and after the snippet you
>> >> linked above?  Are you using cache/tiering?  Can you attach the osdmap
>> >> (ceph osd getmap -o )?
>> >> -Sam
>> >>
>> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > ceph - 0.94.2
>> >> > Its happen during rebalancing
>> >> >
>> >> > I thought too, that some OSD miss copy, but looks like all miss...
>> >> > So any advice in which direction i need to go
>> >> >
>> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
>> >> >>
>> >> >> From a quick peek it looks like some of the OSDs are missing clones
>> >> >> of
>> >> >> objects. I'm not sure how that could happen and I'd expect the pg
>> >> >> repair to handle that but if it's not there's probably something
>> >> >> wrong; what version of Ceph are you running? Sam, is this something
>> >> >> you've seen, a new bug, or some kind of config issue?
>> >> >> -Greg
>> >> >>
>> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Hi all, at our production cluster, due high rebalancing ((( we
>> >> >> > have 2
>> >> >> > pgs in
>> >> >> > inconsistent state...
>> >> >> >
>> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >> >
>> >> >> > From OSD logs, after recovery attempt:
>> >> >> >
>> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read
>> >> >> > i;
>> >> >> > do
>> >> >> > ceph pg repair ${i} ; done
>> >> >> > dumped all in format plain
>> >> >> > instructing pg 2.490 on osd.56 to repair
>> >> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >> >
>> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
>> >> >> > 7f94663b3700
>> >> >> > -1
>> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2 expected
>> >> >> > clone
>> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
>> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
>> >> >> > 7f94663b3700
>> >> >> > -1
>> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
>> >> >> > clone
>> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
>> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
>> >> >> > 7f94663b3700
>> >> >> > -1
>> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
>> >> >> > clone
>> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
>> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
>> >> >> > 7f94663b3700
>> >> >> > -1
>> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
>> >> >> > clone
>> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
>> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
>> >> >> > 7f94663b3700
>> >> >> > -1
>> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> > 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
>> >> >> > clone
>> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/141//2
>> >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
>> >> >> > 7f94663b3700
>> >> >> > -1
>> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> > c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
The issue is that in forward mode fstrim doesn't work properly, and when we take a
snapshot the data is not properly updated in the cache layer, so the client (ceph) sees a
damaged snap, as the headers are requested from the cache layer.
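(For context, a sketch of the commands involved in switching a cache tier to
forward mode and then draining/removing it; the pool names are placeholders,
not the ones from this cluster:)

ceph osd tier cache-mode cache-pool forward
rados -p cache-pool cache-flush-evict-all
ceph osd tier remove-overlay base-pool
ceph osd tier remove base-pool cache-pool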

2015-08-20 19:53 GMT+03:00 Samuel Just :

> What was the issue?
> -Sam
>
> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>  wrote:
> > Samuel, we turned off cache layer few hours ago...
> > I will post ceph.log in few minutes
> >
> > For snap - we found issue, was connected with cache tier..
> >
> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
> >>
> >> Ok, you appear to be using a replicated cache tier in front of a
> >> replicated base tier.  Please scrub both inconsistent pgs and post the
> >> ceph.log from before when you started the scrub until after.  Also,
> >> what command are you using to take snapshots?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
> >>  wrote:
> >> > Hi Samuel, we try to fix it in trick way.
> >> >
> >> > we check all rbd_data chunks from logs (OSD) which are affected, then
> >> > query
> >> > rbd info to compare which rbd consist bad rbd_data, after that we
> mount
> >> > this
> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new
> >> > one.
> >> >
> >> > But after that - scrub errors growing... Was 15 errors.. .Now 35... We
> >> > laos
> >> > try to out OSD which was lead, but after rebalancing this 2 pgs still
> >> > have
> >> > 35 scrub errors...
> >> >
> >> > ceph osd getmap -o  - attached
> >> >
> >> >
> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
> >> >>
> >> >> Is the number of inconsistent objects growing?  Can you attach the
> >> >> whole ceph.log from the 6 hours before and after the snippet you
> >> >> linked above?  Are you using cache/tiering?  Can you attach the
> osdmap
> >> >> (ceph osd getmap -o )?
> >> >> -Sam
> >> >>
> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
> >> >>  wrote:
> >> >> > ceph - 0.94.2
> >> >> > Its happen during rebalancing
> >> >> >
> >> >> > I thought too, that some OSD miss copy, but looks like all miss...
> >> >> > So any advice in which direction i need to go
> >> >> >
> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
> >> >> >>
> >> >> >> From a quick peek it looks like some of the OSDs are missing
> clones
> >> >> >> of
> >> >> >> objects. I'm not sure how that could happen and I'd expect the pg
> >> >> >> repair to handle that but if it's not there's probably something
> >> >> >> wrong; what version of Ceph are you running? Sam, is this
> something
> >> >> >> you've seen, a new bug, or some kind of config issue?
> >> >> >> -Greg
> >> >> >>
> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
> >> >> >>  wrote:
> >> >> >> > Hi all, at our production cluster, due high rebalancing ((( we
> >> >> >> > have 2
> >> >> >> > pgs in
> >> >> >> > inconsistent state...
> >> >> >> >
> >> >> >> > root@temp:~# ceph health detail | grep inc
> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> >> >> >
> >> >> >> > From OSD logs, after recovery attempt:
> >> >> >> >
> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while
> read
> >> >> >> > i;
> >> >> >> > do
> >> >> >> > ceph pg repair ${i} ; done
> >> >> >> > dumped all in format plain
> >> >> >> > instructing pg 2.490 on osd.56 to repair
> >> >> >> > instructing pg 2.c4 on osd.56 to repair
> >> >> >> >
> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
> >> >> >> > 7f94663b3700
> >> >> >> > -1
> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2
> expected
> >> >> >> > clone
> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
> >> >> >> > 7f94663b3700
> >> >> >> > -1
> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2
> expected
> >> >> >> > clone
> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
> >> >> >> > 7f94663b3700
> >> >> >> > -1
> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2
> expected
> >> >> >> > clone
> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
> >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
> >> >> >> > 7f94663b3700
> >> >> >> > -1
> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> >> > bac19490/rbd_data.1238e82ae8944a.032e/head//2
> expected
> >> >> >> > clone
> >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
> >> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
> >> >> >> > 7f9466

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Is there a bug for this in the tracker?
-Sam

On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
 wrote:
> Issue, that in forward mode, fstrim doesn't work proper, and when we take
> snapshot - data not proper update in cache layer, and client (ceph) see
> damaged snap.. As headers requested from cache layer.
>
> 2015-08-20 19:53 GMT+03:00 Samuel Just :
>>
>> What was the issue?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>>  wrote:
>> > Samuel, we turned off cache layer few hours ago...
>> > I will post ceph.log in few minutes
>> >
>> > For snap - we found issue, was connected with cache tier..
>> >
>> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
>> >>
>> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> replicated base tier.  Please scrub both inconsistent pgs and post the
>> >> ceph.log from before when you started the scrub until after.  Also,
>> >> what command are you using to take snapshots?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Hi Samuel, we try to fix it in trick way.
>> >> >
>> >> > we check all rbd_data chunks from logs (OSD) which are affected, then
>> >> > query
>> >> > rbd info to compare which rbd consist bad rbd_data, after that we
>> >> > mount
>> >> > this
>> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to new
>> >> > one.
>> >> >
>> >> > But after that - scrub errors growing... Was 15 errors.. .Now 35...
>> >> > We
>> >> > laos
>> >> > try to out OSD which was lead, but after rebalancing this 2 pgs still
>> >> > have
>> >> > 35 scrub errors...
>> >> >
>> >> > ceph osd getmap -o  - attached
>> >> >
>> >> >
>> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> Is the number of inconsistent objects growing?  Can you attach the
>> >> >> whole ceph.log from the 6 hours before and after the snippet you
>> >> >> linked above?  Are you using cache/tiering?  Can you attach the
>> >> >> osdmap
>> >> >> (ceph osd getmap -o )?
>> >> >> -Sam
>> >> >>
>> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > ceph - 0.94.2
>> >> >> > Its happen during rebalancing
>> >> >> >
>> >> >> > I thought too, that some OSD miss copy, but looks like all miss...
>> >> >> > So any advice in which direction i need to go
>> >> >> >
>> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
>> >> >> >>
>> >> >> >> From a quick peek it looks like some of the OSDs are missing
>> >> >> >> clones
>> >> >> >> of
>> >> >> >> objects. I'm not sure how that could happen and I'd expect the pg
>> >> >> >> repair to handle that but if it's not there's probably something
>> >> >> >> wrong; what version of Ceph are you running? Sam, is this
>> >> >> >> something
>> >> >> >> you've seen, a new bug, or some kind of config issue?
>> >> >> >> -Greg
>> >> >> >>
>> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >> >>  wrote:
>> >> >> >> > Hi all, at our production cluster, due high rebalancing ((( we
>> >> >> >> > have 2
>> >> >> >> > pgs in
>> >> >> >> > inconsistent state...
>> >> >> >> >
>> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >> >> >
>> >> >> >> > From OSD logs, after recovery attempt:
>> >> >> >> >
>> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while
>> >> >> >> > read
>> >> >> >> > i;
>> >> >> >> > do
>> >> >> >> > ceph pg repair ${i} ; done
>> >> >> >> > dumped all in format plain
>> >> >> >> > instructing pg 2.490 on osd.56 to repair
>> >> >> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >> >> >
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
>> >> >> >> > 7f94663b3700
>> >> >> >> > -1
>> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2
>> >> >> >> > expected
>> >> >> >> > clone
>> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
>> >> >> >> > 7f94663b3700
>> >> >> >> > -1
>> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2
>> >> >> >> > expected
>> >> >> >> > clone
>> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
>> >> >> >> > 7f94663b3700
>> >> >> >> > -1
>> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> >> > a9b39490/rbd_data.12483d3ba0794b.37b3/head//2
>> >> >> >> > expected
>> >> >> >> > clone
>> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
>> >> >> >> > 7f94663b3700
>> >> >> >> > -1
>> >> >> >>

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Not yet. I will create one.
But according to the mailing lists and the Inktank docs, it's expected behaviour when
the cache tier is enabled.

2015-08-20 19:56 GMT+03:00 Samuel Just :

> Is there a bug for this in the tracker?
> -Sam
>
> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>  wrote:
> > Issue, that in forward mode, fstrim doesn't work proper, and when we take
> > snapshot - data not proper update in cache layer, and client (ceph) see
> > damaged snap.. As headers requested from cache layer.
> >
> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
> >>
> >> What was the issue?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
> >>  wrote:
> >> > Samuel, we turned off cache layer few hours ago...
> >> > I will post ceph.log in few minutes
> >> >
> >> > For snap - we found issue, was connected with cache tier..
> >> >
> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
> >> >>
> >> >> Ok, you appear to be using a replicated cache tier in front of a
> >> >> replicated base tier.  Please scrub both inconsistent pgs and post
> the
> >> >> ceph.log from before when you started the scrub until after.  Also,
> >> >> what command are you using to take snapshots?
> >> >> -Sam
> >> >>
> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
> >> >>  wrote:
> >> >> > Hi Samuel, we try to fix it in trick way.
> >> >> >
> >> >> > we check all rbd_data chunks from logs (OSD) which are affected,
> then
> >> >> > query
> >> >> > rbd info to compare which rbd consist bad rbd_data, after that we
> >> >> > mount
> >> >> > this
> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to
> new
> >> >> > one.
> >> >> >
> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now 35...
> >> >> > We
> >> >> > laos
> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
> still
> >> >> > have
> >> >> > 35 scrub errors...
> >> >> >
> >> >> > ceph osd getmap -o  - attached
> >> >> >
> >> >> >
> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
> >> >> >>
> >> >> >> Is the number of inconsistent objects growing?  Can you attach the
> >> >> >> whole ceph.log from the 6 hours before and after the snippet you
> >> >> >> linked above?  Are you using cache/tiering?  Can you attach the
> >> >> >> osdmap
> >> >> >> (ceph osd getmap -o )?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
> >> >> >>  wrote:
> >> >> >> > ceph - 0.94.2
> >> >> >> > Its happen during rebalancing
> >> >> >> >
> >> >> >> > I thought too, that some OSD miss copy, but looks like all
> miss...
> >> >> >> > So any advice in which direction i need to go
> >> >> >> >
> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
> >> >> >> >>
> >> >> >> >> From a quick peek it looks like some of the OSDs are missing
> >> >> >> >> clones
> >> >> >> >> of
> >> >> >> >> objects. I'm not sure how that could happen and I'd expect the
> pg
> >> >> >> >> repair to handle that but if it's not there's probably
> something
> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this
> >> >> >> >> something
> >> >> >> >> you've seen, a new bug, or some kind of config issue?
> >> >> >> >> -Greg
> >> >> >> >>
> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
> >> >> >> >>  wrote:
> >> >> >> >> > Hi all, at our production cluster, due high rebalancing (((
> we
> >> >> >> >> > have 2
> >> >> >> >> > pgs in
> >> >> >> >> > inconsistent state...
> >> >> >> >> >
> >> >> >> >> > root@temp:~# ceph health detail | grep inc
> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> >> >> >> >
> >> >> >> >> > From OSD logs, after recovery attempt:
> >> >> >> >> >
> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 |
> while
> >> >> >> >> > read
> >> >> >> >> > i;
> >> >> >> >> > do
> >> >> >> >> > ceph pg repair ${i} ; done
> >> >> >> >> > dumped all in format plain
> >> >> >> >> > instructing pg 2.490 on osd.56 to repair
> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair
> >> >> >> >> >
> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
> >> >> >> >> > 7f94663b3700
> >> >> >> >> > -1
> >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2
> >> >> >> >> > expected
> >> >> >> >> > clone
> >> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
> >> >> >> >> > 7f94663b3700
> >> >> >> >> > -1
> >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
> >> >> >> >> > fee49490/rbd_data.12483d3ba0794b.522f/head//2
> >> >> >> >> > expected
> >> >> >> >> > clone
> >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/141//2
> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
> >> >> >>

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Which docs?
-Sam

On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
 wrote:
> Not yet. I will create.
> But according to mail lists and Inktank docs - it's expected behaviour when
> cache enable
>
> 2015-08-20 19:56 GMT+03:00 Samuel Just :
>>
>> Is there a bug for this in the tracker?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>>  wrote:
>> > Issue, that in forward mode, fstrim doesn't work proper, and when we
>> > take
>> > snapshot - data not proper update in cache layer, and client (ceph) see
>> > damaged snap.. As headers requested from cache layer.
>> >
>> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
>> >>
>> >> What was the issue?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Samuel, we turned off cache layer few hours ago...
>> >> > I will post ceph.log in few minutes
>> >> >
>> >> > For snap - we found issue, was connected with cache tier..
>> >> >
>> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> >> replicated base tier.  Please scrub both inconsistent pgs and post
>> >> >> the
>> >> >> ceph.log from before when you started the scrub until after.  Also,
>> >> >> what command are you using to take snapshots?
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Hi Samuel, we try to fix it in trick way.
>> >> >> >
>> >> >> > we check all rbd_data chunks from logs (OSD) which are affected,
>> >> >> > then
>> >> >> > query
>> >> >> > rbd info to compare which rbd consist bad rbd_data, after that we
>> >> >> > mount
>> >> >> > this
>> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume to
>> >> >> > new
>> >> >> > one.
>> >> >> >
>> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now
>> >> >> > 35...
>> >> >> > We
>> >> >> > laos
>> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
>> >> >> > still
>> >> >> > have
>> >> >> > 35 scrub errors...
>> >> >> >
>> >> >> > ceph osd getmap -o  - attached
>> >> >> >
>> >> >> >
>> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
>> >> >> >>
>> >> >> >> Is the number of inconsistent objects growing?  Can you attach
>> >> >> >> the
>> >> >> >> whole ceph.log from the 6 hours before and after the snippet you
>> >> >> >> linked above?  Are you using cache/tiering?  Can you attach the
>> >> >> >> osdmap
>> >> >> >> (ceph osd getmap -o )?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >> >> >>  wrote:
>> >> >> >> > ceph - 0.94.2
>> >> >> >> > Its happen during rebalancing
>> >> >> >> >
>> >> >> >> > I thought too, that some OSD miss copy, but looks like all
>> >> >> >> > miss...
>> >> >> >> > So any advice in which direction i need to go
>> >> >> >> >
>> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum :
>> >> >> >> >>
>> >> >> >> >> From a quick peek it looks like some of the OSDs are missing
>> >> >> >> >> clones
>> >> >> >> >> of
>> >> >> >> >> objects. I'm not sure how that could happen and I'd expect the
>> >> >> >> >> pg
>> >> >> >> >> repair to handle that but if it's not there's probably
>> >> >> >> >> something
>> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this
>> >> >> >> >> something
>> >> >> >> >> you've seen, a new bug, or some kind of config issue?
>> >> >> >> >> -Greg
>> >> >> >> >>
>> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >> >> >>  wrote:
>> >> >> >> >> > Hi all, at our production cluster, due high rebalancing (((
>> >> >> >> >> > we
>> >> >> >> >> > have 2
>> >> >> >> >> > pgs in
>> >> >> >> >> > inconsistent state...
>> >> >> >> >> >
>> >> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >> >> >> >
>> >> >> >> >> > From OSD logs, after recovery attempt:
>> >> >> >> >> >
>> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 |
>> >> >> >> >> > while
>> >> >> >> >> > read
>> >> >> >> >> > i;
>> >> >> >> >> > do
>> >> >> >> >> > ceph pg repair ${i} ; done
>> >> >> >> >> > dumped all in format plain
>> >> >> >> >> > instructing pg 2.490 on osd.56 to repair
>> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >> >> >> >
>> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
>> >> >> >> >> > 7f94663b3700
>> >> >> >> >> > -1
>> >> >> >> >> > log_channel(cluster) log [ERR] : deep-scrub 2.490
>> >> >> >> >> > f5759490/rbd_data.1631755377d7e.04da/head//2
>> >> >> >> >> > expected
>> >> >> >> >> > clone
>> >> >> >> >> > 90c59490/rbd_data.eb486436f2beb.7a65/141//2
>> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
>> >> >> >> >> > 7f94663b3700
>> >> >> >> >> > -1
>> >> >> >> >

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Inktank:
https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

Mail-list:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html

2015-08-20 20:06 GMT+03:00 Samuel Just :

> Which docs?
> -Sam
>
> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>  wrote:
> > Not yet. I will create.
> > But according to mail lists and Inktank docs - it's expected behaviour
> when
> > cache enable
> >
> > 2015-08-20 19:56 GMT+03:00 Samuel Just :
> >>
> >> Is there a bug for this in the tracker?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
> >>  wrote:
> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we
> >> > take
> >> > snapshot - data not proper update in cache layer, and client (ceph)
> see
> >> > damaged snap.. As headers requested from cache layer.
> >> >
> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
> >> >>
> >> >> What was the issue?
> >> >> -Sam
> >> >>
> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
> >> >>  wrote:
> >> >> > Samuel, we turned off cache layer few hours ago...
> >> >> > I will post ceph.log in few minutes
> >> >> >
> >> >> > For snap - we found issue, was connected with cache tier..
> >> >> >
> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
> >> >> >>
> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and post
> >> >> >> the
> >> >> >> ceph.log from before when you started the scrub until after.
> Also,
> >> >> >> what command are you using to take snapshots?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
> >> >> >>  wrote:
> >> >> >> > Hi Samuel, we try to fix it in trick way.
> >> >> >> >
> >> >> >> > we check all rbd_data chunks from logs (OSD) which are affected,
> >> >> >> > then
> >> >> >> > query
> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that
> we
> >> >> >> > mount
> >> >> >> > this
> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume
> to
> >> >> >> > new
> >> >> >> > one.
> >> >> >> >
> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now
> >> >> >> > 35...
> >> >> >> > We
> >> >> >> > laos
> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
> >> >> >> > still
> >> >> >> > have
> >> >> >> > 35 scrub errors...
> >> >> >> >
> >> >> >> > ceph osd getmap -o  - attached
> >> >> >> >
> >> >> >> >
> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
> >> >> >> >>
> >> >> >> >> Is the number of inconsistent objects growing?  Can you attach
> >> >> >> >> the
> >> >> >> >> whole ceph.log from the 6 hours before and after the snippet
> you
> >> >> >> >> linked above?  Are you using cache/tiering?  Can you attach the
> >> >> >> >> osdmap
> >> >> >> >> (ceph osd getmap -o )?
> >> >> >> >> -Sam
> >> >> >> >>
> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
> >> >> >> >>  wrote:
> >> >> >> >> > ceph - 0.94.2
> >> >> >> >> > Its happen during rebalancing
> >> >> >> >> >
> >> >> >> >> > I thought too, that some OSD miss copy, but looks like all
> >> >> >> >> > miss...
> >> >> >> >> > So any advice in which direction i need to go
> >> >> >> >> >
> >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum <
> gfar...@redhat.com>:
> >> >> >> >> >>
> >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing
> >> >> >> >> >> clones
> >> >> >> >> >> of
> >> >> >> >> >> objects. I'm not sure how that could happen and I'd expect
> the
> >> >> >> >> >> pg
> >> >> >> >> >> repair to handle that but if it's not there's probably
> >> >> >> >> >> something
> >> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this
> >> >> >> >> >> something
> >> >> >> >> >> you've seen, a new bug, or some kind of config issue?
> >> >> >> >> >> -Greg
> >> >> >> >> >>
> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
> >> >> >> >> >>  wrote:
> >> >> >> >> >> > Hi all, at our production cluster, due high rebalancing
> (((
> >> >> >> >> >> > we
> >> >> >> >> >> > have 2
> >> >> >> >> >> > pgs in
> >> >> >> >> >> > inconsistent state...
> >> >> >> >> >> >
> >> >> >> >> >> > root@temp:~# ceph health detail | grep inc
> >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> >> >> >> >> >
> >> >> >> >> >> > From OSD logs, after recovery attempt:
> >> >> >> >> >> >
> >> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 |
> >> >> >> >> >> > while
> >> >> >> >> >> > read
> >> >> >> >> >> > i;
> >> >> >> >> >> > do
> >> >> >> >> >> > ceph pg repair ${i} ; done
> >> >> >> >> >> > dumped all in format plain
> >> >> >> >> >> > instructing pg 2.490 on osd.56 to repair
> >> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair
> >> >> >> >> >> >
> >> >> >> >> >> 

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Andrija Panic
Guys,

I'm Igor's colleague, working a bit on Ceph together with Igor.

This is a production cluster, and we are becoming more desperate as time
goes by.

I'm not sure if this is the appropriate place to seek commercial support, but
anyhow, here goes...

If anyone feels like it and has some experience with this particular kind of
PG troubleshooting, we are also ready to seek commercial support to solve our
issue; company or individual, it doesn't matter.


Thanks,
Andrija

On 20 August 2015 at 19:07, Voloshanenko Igor 
wrote:

> Inktank:
>
> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>
> Mail-list:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
>
> 2015-08-20 20:06 GMT+03:00 Samuel Just :
>
>> Which docs?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>>  wrote:
>> > Not yet. I will create.
>> > But according to mail lists and Inktank docs - it's expected behaviour
>> when
>> > cache enable
>> >
>> > 2015-08-20 19:56 GMT+03:00 Samuel Just :
>> >>
>> >> Is there a bug for this in the tracker?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we
>> >> > take
>> >> > snapshot - data not proper update in cache layer, and client (ceph)
>> see
>> >> > damaged snap.. As headers requested from cache layer.
>> >> >
>> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> What was the issue?
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Samuel, we turned off cache layer few hours ago...
>> >> >> > I will post ceph.log in few minutes
>> >> >> >
>> >> >> > For snap - we found issue, was connected with cache tier..
>> >> >> >
>> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
>> >> >> >>
>> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and
>> post
>> >> >> >> the
>> >> >> >> ceph.log from before when you started the scrub until after.
>> Also,
>> >> >> >> what command are you using to take snapshots?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >> >> >>  wrote:
>> >> >> >> > Hi Samuel, we try to fix it in trick way.
>> >> >> >> >
>> >> >> >> > we check all rbd_data chunks from logs (OSD) which are
>> affected,
>> >> >> >> > then
>> >> >> >> > query
>> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that
>> we
>> >> >> >> > mount
>> >> >> >> > this
>> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume
>> to
>> >> >> >> > new
>> >> >> >> > one.
>> >> >> >> >
>> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now
>> >> >> >> > 35...
>> >> >> >> > We
>> >> >> >> > laos
>> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
>> >> >> >> > still
>> >> >> >> > have
>> >> >> >> > 35 scrub errors...
>> >> >> >> >
>> >> >> >> > ceph osd getmap -o  - attached
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
>> >> >> >> >>
>> >> >> >> >> Is the number of inconsistent objects growing?  Can you attach
>> >> >> >> >> the
>> >> >> >> >> whole ceph.log from the 6 hours before and after the snippet
>> you
>> >> >> >> >> linked above?  Are you using cache/tiering?  Can you attach
>> the
>> >> >> >> >> osdmap
>> >> >> >> >> (ceph osd getmap -o )?
>> >> >> >> >> -Sam
>> >> >> >> >>
>> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >> >> >> >>  wrote:
>> >> >> >> >> > ceph - 0.94.2
>> >> >> >> >> > Its happen during rebalancing
>> >> >> >> >> >
>> >> >> >> >> > I thought too, that some OSD miss copy, but looks like all
>> >> >> >> >> > miss...
>> >> >> >> >> > So any advice in which direction i need to go
>> >> >> >> >> >
>> >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum <
>> gfar...@redhat.com>:
>> >> >> >> >> >>
>> >> >> >> >> >> From a quick peek it looks like some of the OSDs are
>> missing
>> >> >> >> >> >> clones
>> >> >> >> >> >> of
>> >> >> >> >> >> objects. I'm not sure how that could happen and I'd expect
>> the
>> >> >> >> >> >> pg
>> >> >> >> >> >> repair to handle that but if it's not there's probably
>> >> >> >> >> >> something
>> >> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this
>> >> >> >> >> >> something
>> >> >> >> >> >> you've seen, a new bug, or some kind of config issue?
>> >> >> >> >> >> -Greg
>> >> >> >> >> >>
>> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >> >> >> >>  wrote:
>> >> >> >> >> >> > Hi all, at our production cluster, due high rebalancing
>> (((
>> >> >> >> >> >> > we
>> >> >> >> >> >> > have 2
>> >> >> >> >> >> > pgs in
>> >> >> >> >> >> > inconsistent state...
>> >> >> >> >> >> >
>> >> >> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18

Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Nick Fisk




> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jacek Jarosiewicz
> Sent: 20 August 2015 07:31
> To: Nick Fisk ; ceph-us...@ceph.com
> Subject: Re: [ceph-users] requests are blocked - problem
> 
> On 08/19/2015 03:41 PM, Nick Fisk wrote:
> > Although you may get some benefit from tweaking parameters, I suspect
> you are nearer the performance ceiling for the current implementation of
> the tiering code. Could you post all the variables you set for the tiering
> including target_max_bytes and the dirty/full ratios.
> >
> 
> sure, all the parameters set are like this:
> 
> hit_set_type bloom
> hit_set_count 1
> hit_set_period 3600
> target_max_bytes 65498264640
> target_max_objects 100
> cache_target_full_ratio 0.95
> cache_min_flush_age 600
> cache_min_evict_age 1800
> cache_target_dirty_ratio 0.75

That pretty much looks ok to me; the only thing I can suggest is maybe to lower
the full_ratio a bit. The full ratio is based on the percentage across the
whole pool, but the actual eviction occurs at a percentage of a PG level. I
think this may mean that in certain cases a PG may block whilst it evicts, even
though it appears the pool hasn't reached the full target.
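
For example, something like this (the cache pool name is just a placeholder for
yours) would drop it from 0.95 to 0.8 and leave a bit more headroom per PG:

ceph osd pool set cache-pool cache_target_full_ratio 0.8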

> 
> 
> > Since you are doing maildirs, which will have lots of small files, you might
> also want to try making the object size of the RBD smaller. This will mean 
> less
> data is needed to be shifted on each promotion/flush.
> >
> 
> I'll try that - thanks!
> 
> J
> 
> --
> Jacek Jarosiewicz
> Administrator Systemów Informatycznych
> 
> 
> SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie ul. Senatorska 13/15, 00-075
> Warszawa Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy
> Krajowego Rejestru Sądowego, nr KRS 029537; kapitał zakładowy
> 42.756.000 zł
> NIP: 957-05-49-503
> Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa
> 
> 
> SUPERMEDIA ->   http://www.supermedia.pl
> dostep do internetu - hosting - kolokacja - lacza - telefonia
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Email lgx...@nxtzas.com trying to subscribe to tracker.ceph.com

2015-08-20 Thread Dan Mick
Someone using the email address

lgx...@nxtzas.com

is trying to subscribe to the Ceph Redmine tracker, but neither redmine nor I 
can use that email address; it bounces with 

: Host or domain name not found. Name service error for
name=nxtzas.com type=: Host not found

If this is you, please email me privately and we'll get you fixed up.


-- 
Dan Mick Red Hat, Inc. Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Ah, this is kind of silly.  I think you don't have 37 errors, but 2
errors.  pg 2.490 object
3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
snap 141.  If you look at the objects after that in the log:

2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
[ERR] repair 2.490
68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
[ERR] repair 2.490
ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2

The clone from the second line matches the head object from the
previous line, and they have the same clone id.  I *think* that the
first error is real, and the subsequent ones are just scrub being
dumb.  Same deal with pg 2.c4.  I just opened
http://tracker.ceph.com/issues/12738.

The original problem is that
3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
missing a clone.  Not sure how that happened, my money is on a
cache/tiering evict racing with a snap trim.  If you have any logging
or relevant information from when that happened, you should open a
bug.  The 'snapdir' in the two object names indicates that the head
object has actually been deleted (which makes sense if you moved the
image to a new image and deleted the old one) and is only being kept
around since there are live snapshots.  I suggest you leave the
snapshots for those images alone for the time being -- removing them
might cause the osd to crash trying to clean up the weird on-disk
state.  Other than the leaked space from those two image snapshots and
the annoying spurious scrub errors, I think no actual corruption is
going on though.  I created a tracker ticket for a feature that would
let ceph-objectstore-tool remove the spurious clone from the
head/snapdir metadata.
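
On the logging point above, even a simple grep of the osd logs for those two
block name prefixes around the time the eviction would have happened should be
worth attaching to the bug, e.g.:

grep -E 'eb5f22eb141f2|e846e25a70bf7' /var/log/ceph/ceph-osd.*.log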

Am I right that you haven't actually seen any osd crashes or user
visible corruption (except possibly on snapshots of those two images)?
-Sam

On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
 wrote:
> Inktank:
> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>
> Mail-list:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
>
> 2015-08-20 20:06 GMT+03:00 Samuel Just :
>>
>> Which docs?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>>  wrote:
>> > Not yet. I will create.
>> > But according to mail lists and Inktank docs - it's expected behaviour
>> > when
>> > cache enable
>> >
>> > 2015-08-20 19:56 GMT+03:00 Samuel Just :
>> >>
>> >> Is there a bug for this in the tracker?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we
>> >> > take
>> >> > snapshot - data not proper update in cache layer, and client (ceph)
>> >> > see
>> >> > damaged snap.. As headers requested from cache layer.
>> >> >
>> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> What was the issue?
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Samuel, we turned off cache layer few hours ago...
>> >> >> > I will post ceph.log in few minutes
>> >> >> >
>> >> >> > For snap - we found issue, was connected with cache tier..
>> >> >> >
>> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
>> >> >> >>
>> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and
>> >> >> >> post
>> >> >> >> the
>> >> >> >> ceph.log from before when you started the scrub until after.
>> >> >> >> Also,
>> >> >> >> what command are you using to take snapshots?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >> >> >>  wrote:
>> >> >> >> > Hi Samuel, we try to fix it in trick way.
>> >> >> >> >
>> >> >> >> > we check all rbd_data chunks from logs (OSD) which are
>> >> >> >> > affected,
>> >> >> >> > then
>> >> >> >> > query
>> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that
>> >> >> >> > we
>> >> >> >> > mount
>> >> >> >> > this
>> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume
>> >> >> >> > to
>> >> >> >> > new
>> >> >> >> > one.
>> >> >> >> >
>> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now
>> >> >> >> > 35...
>> >> >> >> > We
>> >> >> >> > laos
>> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
>> >> >> >> > still
>> >> >> >> > have
>> >> >> >> > 35 scrub errors...
>> >> >> >> >
>> >> >> >> > ceph osd getmap -o  - attached
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
>> >> >> >> >>
>> >> >> >> >> Is the number of inconsistent objects growing?  Can y

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
The feature bug for the tool is http://tracker.ceph.com/issues/12740.
-Sam

On Thu, Aug 20, 2015 at 2:52 PM, Samuel Just  wrote:
> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
> errors.  pg 2.490 object
> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
> snap 141.  If you look at the objects after that in the log:
>
> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
> [ERR] repair 2.490
> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
> [ERR] repair 2.490
> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
>
> The clone from the second line matches the head object from the
> previous line, and they have the same clone id.  I *think* that the
> first error is real, and the subsequent ones are just scrub being
> dumb.  Same deal with pg 2.c4.  I just opened
> http://tracker.ceph.com/issues/12738.
>
> The original problem is that
> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
> missing a clone.  Not sure how that happened, my money is on a
> cache/tiering evict racing with a snap trim.  If you have any logging
> or relevant information from when that happened, you should open a
> bug.  The 'snapdir' in the two object names indicates that the head
> object has actually been deleted (which makes sense if you moved the
> image to a new image and deleted the old one) and is only being kept
> around since there are live snapshots.  I suggest you leave the
> snapshots for those images alone for the time being -- removing them
> might cause the osd to crash trying to clean up the wierd on disk
> state.  Other than the leaked space from those two image snapshots and
> the annoying spurious scrub errors, I think no actual corruption is
> going on though.  I created a tracker ticket for a feature that would
> let ceph-objectstore-tool remove the spurious clone from the
> head/snapdir metadata.
>
> Am I right that you haven't actually seen any osd crashes or user
> visible corruption (except possibly on snapshots of those two images)?
> -Sam
>
> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
>  wrote:
>> Inktank:
>> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>>
>> Mail-list:
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
>>
>> 2015-08-20 20:06 GMT+03:00 Samuel Just :
>>>
>>> Which docs?
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>>>  wrote:
>>> > Not yet. I will create.
>>> > But according to mail lists and Inktank docs - it's expected behaviour
>>> > when
>>> > cache enable
>>> >
>>> > 2015-08-20 19:56 GMT+03:00 Samuel Just :
>>> >>
>>> >> Is there a bug for this in the tracker?
>>> >> -Sam
>>> >>
>>> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>>> >>  wrote:
>>> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we
>>> >> > take
>>> >> > snapshot - data not proper update in cache layer, and client (ceph)
>>> >> > see
>>> >> > damaged snap.. As headers requested from cache layer.
>>> >> >
>>> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
>>> >> >>
>>> >> >> What was the issue?
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>>> >> >>  wrote:
>>> >> >> > Samuel, we turned off cache layer few hours ago...
>>> >> >> > I will post ceph.log in few minutes
>>> >> >> >
>>> >> >> > For snap - we found issue, was connected with cache tier..
>>> >> >> >
>>> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
>>> >> >> >>
>>> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
>>> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and
>>> >> >> >> post
>>> >> >> >> the
>>> >> >> >> ceph.log from before when you started the scrub until after.
>>> >> >> >> Also,
>>> >> >> >> what command are you using to take snapshots?
>>> >> >> >> -Sam
>>> >> >> >>
>>> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>>> >> >> >>  wrote:
>>> >> >> >> > Hi Samuel, we try to fix it in trick way.
>>> >> >> >> >
>>> >> >> >> > we check all rbd_data chunks from logs (OSD) which are
>>> >> >> >> > affected,
>>> >> >> >> > then
>>> >> >> >> > query
>>> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that
>>> >> >> >> > we
>>> >> >> >> > mount
>>> >> >> >> > this
>>> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume
>>> >> >> >> > to
>>> >> >> >> > new
>>> >> >> >> > one.
>>> >> >> >> >
>>> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now
>>> >> >> >> > 35...
>>> >> >> >> > We
>>> >> >> >> > laos
>>> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
>>> >> >>

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Thank you, Sam!
I also noticed these linked errors during the scrub...

Now it all looks reasonable!

So we will wait for the bug to be closed.

Do you need any help with it?

I mean I can help with coding/testing/etc...

2015-08-21 0:52 GMT+03:00 Samuel Just :

> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
> errors.  pg 2.490 object
> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
> snap 141.  If you look at the objects after that in the log:
>
> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
> [ERR] repair 2.490
> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
> [ERR] repair 2.490
> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
>
> The clone from the second line matches the head object from the
> previous line, and they have the same clone id.  I *think* that the
> first error is real, and the subsequent ones are just scrub being
> dumb.  Same deal with pg 2.c4.  I just opened
> http://tracker.ceph.com/issues/12738.
>
> The original problem is that
> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
> missing a clone.  Not sure how that happened, my money is on a
> cache/tiering evict racing with a snap trim.  If you have any logging
> or relevant information from when that happened, you should open a
> bug.  The 'snapdir' in the two object names indicates that the head
> object has actually been deleted (which makes sense if you moved the
> image to a new image and deleted the old one) and is only being kept
> around since there are live snapshots.  I suggest you leave the
> snapshots for those images alone for the time being -- removing them
> might cause the osd to crash trying to clean up the wierd on disk
> state.  Other than the leaked space from those two image snapshots and
> the annoying spurious scrub errors, I think no actual corruption is
> going on though.  I created a tracker ticket for a feature that would
> let ceph-objectstore-tool remove the spurious clone from the
> head/snapdir metadata.
>
> Am I right that you haven't actually seen any osd crashes or user
> visible corruption (except possibly on snapshots of those two images)?
> -Sam
>
> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
>  wrote:
> > Inktank:
> >
> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
> >
> > Mail-list:
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
> >
> > 2015-08-20 20:06 GMT+03:00 Samuel Just :
> >>
> >> Which docs?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
> >>  wrote:
> >> > Not yet. I will create.
> >> > But according to mail lists and Inktank docs - it's expected behaviour
> >> > when
> >> > cache enable
> >> >
> >> > 2015-08-20 19:56 GMT+03:00 Samuel Just :
> >> >>
> >> >> Is there a bug for this in the tracker?
> >> >> -Sam
> >> >>
> >> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
> >> >>  wrote:
> >> >> > Issue, that in forward mode, fstrim doesn't work proper, and when
> we
> >> >> > take
> >> >> > snapshot - data not proper update in cache layer, and client (ceph)
> >> >> > see
> >> >> > damaged snap.. As headers requested from cache layer.
> >> >> >
> >> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
> >> >> >>
> >> >> >> What was the issue?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
> >> >> >>  wrote:
> >> >> >> > Samuel, we turned off cache layer few hours ago...
> >> >> >> > I will post ceph.log in few minutes
> >> >> >> >
> >> >> >> > For snap - we found issue, was connected with cache tier..
> >> >> >> >
> >> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
> >> >> >> >>
> >> >> >> >> Ok, you appear to be using a replicated cache tier in front of
> a
> >> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and
> >> >> >> >> post
> >> >> >> >> the
> >> >> >> >> ceph.log from before when you started the scrub until after.
> >> >> >> >> Also,
> >> >> >> >> what command are you using to take snapshots?
> >> >> >> >> -Sam
> >> >> >> >>
> >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
> >> >> >> >>  wrote:
> >> >> >> >> > Hi Samuel, we try to fix it in trick way.
> >> >> >> >> >
> >> >> >> >> > we check all rbd_data chunks from logs (OSD) which are
> >> >> >> >> > affected,
> >> >> >> >> > then
> >> >> >> >> > query
> >> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after
> that
> >> >> >> >> > we
> >> >> >> >> > mount
> >> >> >> >> > this
> >> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad
> volume
> >> >> >> >> > to
> >> >> >> >> > new
> >> >> >> >> > one.
> >> >> >> >> >
> >> >> >> >> > But

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Actually, now that I think about it, you probably didn't remove the
images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
other images (that's why the scrub errors went down briefly, those
objects -- which were fine -- went away).  You might want to export
and reimport those two images into new images, but leave the old ones
alone until you can clean up the on disk state (image and snapshots)
and clear the scrub errors.  You probably don't want to read the
snapshots for those images either.  Everything else is, I think,
harmless.
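
A minimal sketch of that export/re-import (pool and image names are
placeholders; substitute whichever images those two rbd_data prefixes belong
to):

rbd export cold-storage/affected-image /tmp/affected-image.raw
rbd import /tmp/affected-image.raw cold-storage/affected-image-copy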

The ceph-objectstore-tool feature would probably not be too hard,
actually.  Each head/snapdir image has two attrs (possibly stored in
leveldb -- that's why you want to modify the ceph-objectstore-tool and
use its interfaces rather than mucking about with the files directly)
'_' and 'snapset' which contain encoded representations of
object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
SnapSet has a set of clones and related metadata -- you want to read
the SnapSet attr off disk and commit a transaction writing out a new
version with that clone removed.  I'd start by cloning the repo,
starting a vstart cluster locally, and reproducing the issue.  Next,
get familiar with using ceph-objectstore-tool on the osds in that
vstart cluster.  A good first change would be creating a
ceph-objectstore-tool op that lets you dump json for the object_info_t
and SnapSet (both types have format() methods which make that easy) on
an object to stdout so you can confirm what's actually there.  oftc
#ceph-devel or the ceph-devel mailing list would be the right place to
ask questions.
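
Getting started would look roughly like this (paths and the object spec are
from memory, so double check against vstart's generated ceph.conf and the
tool's --help; the osd has to be stopped before ceph-objectstore-tool touches
its store):

cd ceph/src && MON=1 OSD=3 ./vstart.sh -n -x -d
# reproduce the bad snapset in the vstart cluster, stop one osd, then:
ceph-objectstore-tool --data-path dev/osd0 --journal-path dev/osd0/journal \
  --op list | grep rbd_data
ceph-objectstore-tool --data-path dev/osd0 --journal-path dev/osd0/journal \
  '<object-from-list>' list-attrs
ceph-objectstore-tool --data-path dev/osd0 --journal-path dev/osd0/journal \
  '<object-from-list>' get-attr snapset > snapset.bin
# snapset.bin is the encoded SnapSet; the new op would decode it and dump
# json instead of leaving you with the raw bytes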

Otherwise, it'll probably get done in the next few weeks.
-Sam

On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
 wrote:
> thank you Sam!
> I also noticed this linked errors during scrub...
>
> Now all lools like reasonable!
>
> So we will wait for bug to be closed.
>
> do you need any help on it?
>
> I mean i can help with coding/testing/etc...
>
> 2015-08-21 0:52 GMT+03:00 Samuel Just :
>>
>> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
>> errors.  pg 2.490 object
>> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
>> snap 141.  If you look at the objects after that in the log:
>>
>> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
>> [ERR] repair 2.490
>> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
>> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
>> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
>> [ERR] repair 2.490
>> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
>> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
>>
>> The clone from the second line matches the head object from the
>> previous line, and they have the same clone id.  I *think* that the
>> first error is real, and the subsequent ones are just scrub being
>> dumb.  Same deal with pg 2.c4.  I just opened
>> http://tracker.ceph.com/issues/12738.
>>
>> The original problem is that
>> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
>> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
>> missing a clone.  Not sure how that happened, my money is on a
>> cache/tiering evict racing with a snap trim.  If you have any logging
>> or relevant information from when that happened, you should open a
>> bug.  The 'snapdir' in the two object names indicates that the head
>> object has actually been deleted (which makes sense if you moved the
>> image to a new image and deleted the old one) and is only being kept
>> around since there are live snapshots.  I suggest you leave the
>> snapshots for those images alone for the time being -- removing them
>> might cause the osd to crash trying to clean up the wierd on disk
>> state.  Other than the leaked space from those two image snapshots and
>> the annoying spurious scrub errors, I think no actual corruption is
>> going on though.  I created a tracker ticket for a feature that would
>> let ceph-objectstore-tool remove the spurious clone from the
>> head/snapdir metadata.
>>
>> Am I right that you haven't actually seen any osd crashes or user
>> visible corruption (except possibly on snapshots of those two images)?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
>>  wrote:
>> > Inktank:
>> >
>> > https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>> >
>> > Mail-list:
>> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
>> >
>> > 2015-08-20 20:06 GMT+03:00 Samuel Just :
>> >>
>> >> Which docs?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Not yet. I will create.
>> >> > But according to mail lists and Inktank docs - it's expected
>> >> > behaviour
>> >> > when
>> >> > cache enable
>> >> >
>> >> >

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Andrija Panic
This was related to the caching layer, which doesn't support snapshotting
per the docs... for the sake of closing the thread.

On 17 August 2015 at 21:15, Voloshanenko Igor 
wrote:

> Hi all, can you please help me with unexplained situation...
>
> All snapshot inside ceph broken...
>
> So, as example, we have VM template, as rbd inside ceph.
> We can map it and mount to check that all ok with it
>
> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> /dev/rbd0
> root@test:~# parted /dev/rbd0 print
> Model: Unknown (unknown)
> Disk /dev/rbd0: 10.7GB
> Sector size (logical/physical): 512B/512B
> Partition Table: msdos
>
> Number  Start   End SizeType File system  Flags
>  1  1049kB  525MB   524MB   primary  ext4 boot
>  2  525MB   10.7GB  10.2GB  primary   lvm
>
> Than i want to create snap, so i do:
> root@test:~# rbd snap create
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>
> And now i want to map it:
>
> root@test:~# rbd map
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> /dev/rbd1
> root@test:~# parted /dev/rbd1 print
> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>  /dev/rbd1 has been opened read-only.
> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>  /dev/rbd1 has been opened read-only.
> Error: /dev/rbd1: unrecognised disk label
>
> Even md5 different...
> root@ix-s2:~# md5sum /dev/rbd0
> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> root@ix-s2:~# md5sum /dev/rbd1
> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>
>
> Ok, now i protect snap and create clone... but same thing...
> md5 for clone same as for snap,,
>
> root@test:~# rbd unmap /dev/rbd1
> root@test:~# rbd snap protect
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> root@test:~# rbd clone
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> cold-storage/test-image
> root@test:~# rbd map cold-storage/test-image
> /dev/rbd1
> root@test:~# md5sum /dev/rbd1
> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>
>  but it's broken...
> root@test:~# parted /dev/rbd1 print
> Error: /dev/rbd1: unrecognised disk label
>
>
> =
>
> tech details:
>
> root@test:~# ceph -v
> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>
> We have 2 inconstistent pgs, but all images not placed on this pgs...
>
> root@test:~# ceph health detail
> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> 18 scrub errors
>
> 
>
> root@test:~# ceph osd map cold-storage
> 0e23c701-401d-4465-b9b4-c02939d57bb5
> osdmap e16770 pool 'cold-storage' (2) object
> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
> ([37,15,14], p37) acting ([37,15,14], p37)
> root@test:~# ceph osd map cold-storage
> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> osdmap e16770 pool 'cold-storage' (2) object
> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) ->
> up ([12,23,17], p12) acting ([12,23,17], p12)
> root@test:~# ceph osd map cold-storage
> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> osdmap e16770 pool 'cold-storage' (2) object
> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
> (2.2a9) -> up ([12,44,23], p12) acting ([12,44,23], p12)
>
>
> Also we use cache layer, which in current moment - in forward mode...
>
> Can you please help me with this.. As my brain stop to understand what is
> going on...
>
> Thank in advance!
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Sam, I tried to work out which rbd images contain these chunks.. but no luck. No
rbd image block name prefixes start with these...
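
For reference, this is the kind of check I ran (pool name is ours, the two
prefixes are from the scrub log), and it matched nothing for those prefixes:

for img in $(rbd ls cold-storage); do
    echo -n "$img: "
    rbd info cold-storage/$img | grep block_name_prefix
done | grep -E 'eb5f22eb141f2|e846e25a70bf7'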

Actually, now that I think about it, you probably didn't remove the
> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2




2015-08-21 1:36 GMT+03:00 Samuel Just :

> Actually, now that I think about it, you probably didn't remove the
> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
> other images (that's why the scrub errors went down briefly, those
> objects -- which were fine -- went away).  You might want to export
> and reimport those two images into new images, but leave the old ones
> alone until you can clean up the on disk state (image and snapshots)
> and clear the scrub errors.  You probably don't want to read the
> snapshots for those images either.  Everything else is, I think,
> harmless.
>
> The ceph-objectstore-tool feature would probably not be too hard,
> actually.  Each head/snapdir image has two attrs (possibly stored in
> leveldb -- that's why you want to modify the ceph-objectstore-tool and
> use its interfaces rather than mucking about with the files directly)
> '_' and 'snapset' which contain encoded representations of
> object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
> SnapSet has a set of clones and related metadata -- you want to read
> the SnapSet attr off disk and commit a transaction writing out a new
> version with that clone removed.  I'd start by cloning the repo,
> starting a vstart cluster locally, and reproducing the issue.  Next,
> get familiar with using ceph-objectstore-tool on the osds in that
> vstart cluster.  A good first change would be creating a
> ceph-objectstore-tool op that lets you dump json for the object_info_t
> and SnapSet (both types have format() methods which make that easy) on
> an object to stdout so you can confirm what's actually there.  oftc
> #ceph-devel or the ceph-devel mailing list would be the right place to
> ask questions.
>
> Otherwise, it'll probably get done in the next few weeks.
> -Sam
>
> On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
>  wrote:
> > thank you Sam!
> > I also noticed this linked errors during scrub...
> >
> > Now all lools like reasonable!
> >
> > So we will wait for bug to be closed.
> >
> > do you need any help on it?
> >
> > I mean i can help with coding/testing/etc...
> >
> > 2015-08-21 0:52 GMT+03:00 Samuel Just :
> >>
> >> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
> >> errors.  pg 2.490 object
> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
> >> snap 141.  If you look at the objects after that in the log:
> >>
> >> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
> >> [ERR] repair 2.490
> >> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
> >> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
> >> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
> >> [ERR] repair 2.490
> >> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
> >> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
> >>
> >> The clone from the second line matches the head object from the
> >> previous line, and they have the same clone id.  I *think* that the
> >> first error is real, and the subsequent ones are just scrub being
> >> dumb.  Same deal with pg 2.c4.  I just opened
> >> http://tracker.ceph.com/issues/12738.
> >>
> >> The original problem is that
> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
> >> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
> >> missing a clone.  Not sure how that happened, my money is on a
> >> cache/tiering evict racing with a snap trim.  If you have any logging
> >> or relevant information from when that happened, you should open a
> >> bug.  The 'snapdir' in the two object names indicates that the head
> >> object has actually been deleted (which makes sense if you moved the
> >> image to a new image and deleted the old one) and is only being kept
> >> around since there are live snapshots.  I suggest you leave the
> >> snapshots for those images alone for the time being -- removing them
> >> might cause the osd to crash trying to clean up the wierd on disk
> >> state.  Other than the leaked space from those two image snapshots and
> >> the annoying spurious scrub errors, I think no actual corruption is
> >> going on though.  I created a tracker ticket for a feature that would
> >> let ceph-objectstore-tool remove the spurious clone from the
> >> head/snapdir metadata.
> >>
> >> Am I right that you haven't actually seen any osd crashes or user
> >> visible corruption (except possibly on snapshots of those two images)?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
> >> 

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Interesting.  How often do you delete an image?  I'm wondering if
whatever this is happened when you deleted these two images.
-Sam

On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor
 wrote:
> Sam, i try to understand which rbd contain this chunks.. but no luck. No rbd
> images block names started with this...
>
>> Actually, now that I think about it, you probably didn't remove the
>> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
>> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2
>
>
>
>
> 2015-08-21 1:36 GMT+03:00 Samuel Just :
>>
>> Actually, now that I think about it, you probably didn't remove the
>> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
>> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
>> other images (that's why the scrub errors went down briefly, those
>> objects -- which were fine -- went away).  You might want to export
>> and reimport those two images into new images, but leave the old ones
>> alone until you can clean up the on disk state (image and snapshots)
>> and clear the scrub errors.  You probably don't want to read the
>> snapshots for those images either.  Everything else is, I think,
>> harmless.
>>
>> The ceph-objectstore-tool feature would probably not be too hard,
>> actually.  Each head/snapdir image has two attrs (possibly stored in
>> leveldb -- that's why you want to modify the ceph-objectstore-tool and
>> use its interfaces rather than mucking about with the files directly)
>> '_' and 'snapset' which contain encoded representations of
>> object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
>> SnapSet has a set of clones and related metadata -- you want to read
>> the SnapSet attr off disk and commit a transaction writing out a new
>> version with that clone removed.  I'd start by cloning the repo,
>> starting a vstart cluster locally, and reproducing the issue.  Next,
>> get familiar with using ceph-objectstore-tool on the osds in that
>> vstart cluster.  A good first change would be creating a
>> ceph-objectstore-tool op that lets you dump json for the object_info_t
>> and SnapSet (both types have format() methods which make that easy) on
>> an object to stdout so you can confirm what's actually there.  oftc
>> #ceph-devel or the ceph-devel mailing list would be the right place to
>> ask questions.
>>
>> Otherwise, it'll probably get done in the next few weeks.
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
>>  wrote:
>> > thank you Sam!
>> > I also noticed this linked errors during scrub...
>> >
>> > Now all lools like reasonable!
>> >
>> > So we will wait for bug to be closed.
>> >
>> > do you need any help on it?
>> >
>> > I mean i can help with coding/testing/etc...
>> >
>> > 2015-08-21 0:52 GMT+03:00 Samuel Just :
>> >>
>> >> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
>> >> errors.  pg 2.490 object
>> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
>> >> snap 141.  If you look at the objects after that in the log:
>> >>
>> >> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
>> >> [ERR] repair 2.490
>> >> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
>> >> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
>> >> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
>> >> [ERR] repair 2.490
>> >> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
>> >> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
>> >>
>> >> The clone from the second line matches the head object from the
>> >> previous line, and they have the same clone id.  I *think* that the
>> >> first error is real, and the subsequent ones are just scrub being
>> >> dumb.  Same deal with pg 2.c4.  I just opened
>> >> http://tracker.ceph.com/issues/12738.
>> >>
>> >> The original problem is that
>> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
>> >> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
>> >> missing a clone.  Not sure how that happened, my money is on a
>> >> cache/tiering evict racing with a snap trim.  If you have any logging
>> >> or relevant information from when that happened, you should open a
>> >> bug.  The 'snapdir' in the two object names indicates that the head
>> >> object has actually been deleted (which makes sense if you moved the
>> >> image to a new image and deleted the old one) and is only being kept
>> >> around since there are live snapshots.  I suggest you leave the
>> >> snapshots for those images alone for the time being -- removing them
>> >> might cause the osd to crash trying to clean up the wierd on disk
>> >> state.  Other than the leaked space from those two image snapshots and
>> >> the annoying spurious scrub errors, I think no actual corruption is
>> >> going on though.  I created a tracker ticket for a feature that would
>> >> let ceph-objectstore-tool

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Image? One?

We started deleting images only to fix this (export/import); before that, 1-4
times per day (whenever a VM was destroyed)...



2015-08-21 1:44 GMT+03:00 Samuel Just :

> Interesting.  How often do you delete an image?  I'm wondering if
> whatever this is happened when you deleted these two images.
> -Sam
>
> On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor
>  wrote:
> > Sam, i try to understand which rbd contain this chunks.. but no luck. No
> rbd
> > images block names started with this...
> >
> >> Actually, now that I think about it, you probably didn't remove the
> >> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
> >> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2
> >
> >
> >
> >
> > 2015-08-21 1:36 GMT+03:00 Samuel Just :
> >>
> >> Actually, now that I think about it, you probably didn't remove the
> >> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
> >> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
> >> other images (that's why the scrub errors went down briefly, those
> >> objects -- which were fine -- went away).  You might want to export
> >> and reimport those two images into new images, but leave the old ones
> >> alone until you can clean up the on disk state (image and snapshots)
> >> and clear the scrub errors.  You probably don't want to read the
> >> snapshots for those images either.  Everything else is, I think,
> >> harmless.
> >>
> >> The ceph-objectstore-tool feature would probably not be too hard,
> >> actually.  Each head/snapdir image has two attrs (possibly stored in
> >> leveldb -- that's why you want to modify the ceph-objectstore-tool and
> >> use its interfaces rather than mucking about with the files directly)
> >> '_' and 'snapset' which contain encoded representations of
> >> object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
> >> SnapSet has a set of clones and related metadata -- you want to read
> >> the SnapSet attr off disk and commit a transaction writing out a new
> >> version with that clone removed.  I'd start by cloning the repo,
> >> starting a vstart cluster locally, and reproducing the issue.  Next,
> >> get familiar with using ceph-objectstore-tool on the osds in that
> >> vstart cluster.  A good first change would be creating a
> >> ceph-objectstore-tool op that lets you dump json for the object_info_t
> >> and SnapSet (both types have format() methods which make that easy) on
> >> an object to stdout so you can confirm what's actually there.  oftc
> >> #ceph-devel or the ceph-devel mailing list would be the right place to
> >> ask questions.
> >>
> >> Otherwise, it'll probably get done in the next few weeks.
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
> >>  wrote:
> >> > thank you Sam!
> >> > I also noticed this linked errors during scrub...
> >> >
> >> > Now all lools like reasonable!
> >> >
> >> > So we will wait for bug to be closed.
> >> >
> >> > do you need any help on it?
> >> >
> >> > I mean i can help with coding/testing/etc...
> >> >
> >> > 2015-08-21 0:52 GMT+03:00 Samuel Just :
> >> >>
> >> >> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
> >> >> errors.  pg 2.490 object
> >> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is
> missing
> >> >> snap 141.  If you look at the objects after that in the log:
> >> >>
> >> >> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 :
> cluster
> >> >> [ERR] repair 2.490
> >> >> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
> >> >> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
> >> >> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 :
> cluster
> >> >> [ERR] repair 2.490
> >> >> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
> >> >> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
> >> >>
> >> >> The clone from the second line matches the head object from the
> >> >> previous line, and they have the same clone id.  I *think* that the
> >> >> first error is real, and the subsequent ones are just scrub being
> >> >> dumb.  Same deal with pg 2.c4.  I just opened
> >> >> http://tracker.ceph.com/issues/12738.
> >> >>
> >> >> The original problem is that
> >> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
> >> >> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
> >> >> missing a clone.  Not sure how that happened, my money is on a
> >> >> cache/tiering evict racing with a snap trim.  If you have any logging
> >> >> or relevant information from when that happened, you should open a
> >> >> bug.  The 'snapdir' in the two object names indicates that the head
> >> >> object has actually been deleted (which makes sense if you moved the
> >> >> image to a new image and deleted the old one) and is only being kept
> >> >> around since there are live snapshots.  I suggest you leave the
> >> >> snapshots for those images alone f

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Ok, so images are regularly removed.  In that case, these two objects
probably are left over from previously removed images.  Once
ceph-objectstore-tool can dump the SnapSet from those two objects, you
will probably find that those two snapdir objects each have only one
bogus clone, in which case you'll probably just remove the images.
-Sam
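
Until that feature exists, a rough sketch of what such a dump could look like by hand -- FileStore only, OSD stopped, and only while the attr is still an xattr rather than spilled into leveldb (which is exactly why going through ceph-objectstore-tool is the safer route). The PG path and object filename below are placeholders, and this assumes the installed ceph-dencoder knows the SnapSet type:

# on the primary OSD for the pg, with the OSD stopped
cd /var/lib/ceph/osd/ceph-56/current/2.490_head/        # placeholder path
getfattr -n user.ceph.snapset --only-values <snapdir-object-file> > /tmp/snapset.bin
ceph-dencoder type SnapSet import /tmp/snapset.bin decode dump_json
# the JSON should list the clones, including the bogus clone id 141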

On Thu, Aug 20, 2015 at 3:45 PM, Voloshanenko Igor
 wrote:
> Image? One?
>
> We start deleting images only to fix thsi (export/import)m before - 1-4
> times per day (when VM destroyed)...
>
>
>
> 2015-08-21 1:44 GMT+03:00 Samuel Just :
>>
>> Interesting.  How often do you delete an image?  I'm wondering if
>> whatever this is happened when you deleted these two images.
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor
>>  wrote:
>> > Sam, i try to understand which rbd contain this chunks.. but no luck. No
>> > rbd
>> > images block names started with this...
>> >
>> >> Actually, now that I think about it, you probably didn't remove the
>> >> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
>> >> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2
>> >
>> >
>> >
>> >
>> > 2015-08-21 1:36 GMT+03:00 Samuel Just :
>> >>
>> >> Actually, now that I think about it, you probably didn't remove the
>> >> images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
>> >> and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
>> >> other images (that's why the scrub errors went down briefly, those
>> >> objects -- which were fine -- went away).  You might want to export
>> >> and reimport those two images into new images, but leave the old ones
>> >> alone until you can clean up the on disk state (image and snapshots)
>> >> and clear the scrub errors.  You probably don't want to read the
>> >> snapshots for those images either.  Everything else is, I think,
>> >> harmless.
>> >>
>> >> The ceph-objectstore-tool feature would probably not be too hard,
>> >> actually.  Each head/snapdir image has two attrs (possibly stored in
>> >> leveldb -- that's why you want to modify the ceph-objectstore-tool and
>> >> use its interfaces rather than mucking about with the files directly)
>> >> '_' and 'snapset' which contain encoded representations of
>> >> object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
>> >> SnapSet has a set of clones and related metadata -- you want to read
>> >> the SnapSet attr off disk and commit a transaction writing out a new
>> >> version with that clone removed.  I'd start by cloning the repo,
>> >> starting a vstart cluster locally, and reproducing the issue.  Next,
>> >> get familiar with using ceph-objectstore-tool on the osds in that
>> >> vstart cluster.  A good first change would be creating a
>> >> ceph-objectstore-tool op that lets you dump json for the object_info_t
>> >> and SnapSet (both types have format() methods which make that easy) on
>> >> an object to stdout so you can confirm what's actually there.  oftc
>> >> #ceph-devel or the ceph-devel mailing list would be the right place to
>> >> ask questions.
>> >>
>> >> Otherwise, it'll probably get done in the next few weeks.
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
>> >>  wrote:
>> >> > thank you Sam!
>> >> > I also noticed this linked errors during scrub...
>> >> >
>> >> > Now all lools like reasonable!
>> >> >
>> >> > So we will wait for bug to be closed.
>> >> >
>> >> > do you need any help on it?
>> >> >
>> >> > I mean i can help with coding/testing/etc...
>> >> >
>> >> > 2015-08-21 0:52 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
>> >> >> errors.  pg 2.490 object
>> >> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is
>> >> >> missing
>> >> >> snap 141.  If you look at the objects after that in the log:
>> >> >>
>> >> >> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 :
>> >> >> cluster
>> >> >> [ERR] repair 2.490
>> >> >> 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
>> >> >> clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
>> >> >> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 :
>> >> >> cluster
>> >> >> [ERR] repair 2.490
>> >> >> ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
>> >> >> clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
>> >> >>
>> >> >> The clone from the second line matches the head object from the
>> >> >> previous line, and they have the same clone id.  I *think* that the
>> >> >> first error is real, and the subsequent ones are just scrub being
>> >> >> dumb.  Same deal with pg 2.c4.  I just opened
>> >> >> http://tracker.ceph.com/issues/12738.
>> >> >>
>> >> >> The original problem is that
>> >> >> 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
>> >> >> 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
>> >> >> missing a clone.  Not sure how

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Snapshotting with cache/tiering *is* supposed to work.  Can you open a bug?
-Sam
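
A minimal reproduction recipe that could go into such a bug report -- a sketch only; the pool names, sizes and image name here are made up, not the production ones:

ceph osd pool create cold 64
ceph osd pool create hot 64
ceph osd tier add cold hot
ceph osd tier cache-mode hot writeback
ceph osd tier set-overlay cold hot
rbd create cold/test --size 1024 --image-format 2
rbd map cold/test                      # write a filesystem to it, note the md5, unmap
ceph osd tier cache-mode hot forward   # the mode the production cluster was in
rbd snap create cold/test@snap1
rbd map cold/test@snap1                # md5sum should match the parent; in the broken case it does not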

On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic  wrote:
> This was related to the caching layer, which doesnt support snapshooting per
> docs...for sake of closing the thread.
>
> On 17 August 2015 at 21:15, Voloshanenko Igor 
> wrote:
>>
>> Hi all, can you please help me with unexplained situation...
>>
>> All snapshot inside ceph broken...
>>
>> So, as example, we have VM template, as rbd inside ceph.
>> We can map it and mount to check that all ok with it
>>
>> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>> /dev/rbd0
>> root@test:~# parted /dev/rbd0 print
>> Model: Unknown (unknown)
>> Disk /dev/rbd0: 10.7GB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: msdos
>>
>> Number  Start   End SizeType File system  Flags
>>  1  1049kB  525MB   524MB   primary  ext4 boot
>>  2  525MB   10.7GB  10.2GB  primary   lvm
>>
>> Than i want to create snap, so i do:
>> root@test:~# rbd snap create
>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>
>> And now i want to map it:
>>
>> root@test:~# rbd map
>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> /dev/rbd1
>> root@test:~# parted /dev/rbd1 print
>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>> /dev/rbd1 has been opened read-only.
>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>> /dev/rbd1 has been opened read-only.
>> Error: /dev/rbd1: unrecognised disk label
>>
>> Even md5 different...
>> root@ix-s2:~# md5sum /dev/rbd0
>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>> root@ix-s2:~# md5sum /dev/rbd1
>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>
>>
>> Ok, now i protect snap and create clone... but same thing...
>> md5 for clone same as for snap,,
>>
>> root@test:~# rbd unmap /dev/rbd1
>> root@test:~# rbd snap protect
>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> root@test:~# rbd clone
>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> cold-storage/test-image
>> root@test:~# rbd map cold-storage/test-image
>> /dev/rbd1
>> root@test:~# md5sum /dev/rbd1
>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>
>>  but it's broken...
>> root@test:~# parted /dev/rbd1 print
>> Error: /dev/rbd1: unrecognised disk label
>>
>>
>> =
>>
>> tech details:
>>
>> root@test:~# ceph -v
>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>
>> We have 2 inconstistent pgs, but all images not placed on this pgs...
>>
>> root@test:~# ceph health detail
>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> 18 scrub errors
>>
>> 
>>
>> root@test:~# ceph osd map cold-storage
>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>> osdmap e16770 pool 'cold-storage' (2) object
>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
>> ([37,15,14], p37) acting ([37,15,14], p37)
>> root@test:~# ceph osd map cold-storage
>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>> osdmap e16770 pool 'cold-storage' (2) object
>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up
>> ([12,23,17], p12) acting ([12,23,17], p12)
>> root@test:~# ceph osd map cold-storage
>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
>> osdmap e16770 pool 'cold-storage' (2) object
>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9)
>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
>>
>>
>> Also we use cache layer, which in current moment - in forward mode...
>>
>> Can you please help me with this.. As my brain stop to understand what is
>> going on...
>>
>> Thank in advance!
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, can you include the kernel version?
-Sam

On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
> Snapshotting with cache/tiering *is* supposed to work.  Can you open a bug?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic  
> wrote:
>> This was related to the caching layer, which doesnt support snapshooting per
>> docs...for sake of closing the thread.
>>
>> On 17 August 2015 at 21:15, Voloshanenko Igor 
>> wrote:
>>>
>>> Hi all, can you please help me with unexplained situation...
>>>
>>> All snapshot inside ceph broken...
>>>
>>> So, as example, we have VM template, as rbd inside ceph.
>>> We can map it and mount to check that all ok with it
>>>
>>> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>>> /dev/rbd0
>>> root@test:~# parted /dev/rbd0 print
>>> Model: Unknown (unknown)
>>> Disk /dev/rbd0: 10.7GB
>>> Sector size (logical/physical): 512B/512B
>>> Partition Table: msdos
>>>
>>> Number  Start   End SizeType File system  Flags
>>>  1  1049kB  525MB   524MB   primary  ext4 boot
>>>  2  525MB   10.7GB  10.2GB  primary   lvm
>>>
>>> Than i want to create snap, so i do:
>>> root@test:~# rbd snap create
>>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>>
>>> And now i want to map it:
>>>
>>> root@test:~# rbd map
>>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> /dev/rbd1
>>> root@test:~# parted /dev/rbd1 print
>>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>>> /dev/rbd1 has been opened read-only.
>>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>>> /dev/rbd1 has been opened read-only.
>>> Error: /dev/rbd1: unrecognised disk label
>>>
>>> Even md5 different...
>>> root@ix-s2:~# md5sum /dev/rbd0
>>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>>> root@ix-s2:~# md5sum /dev/rbd1
>>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>>
>>>
>>> Ok, now i protect snap and create clone... but same thing...
>>> md5 for clone same as for snap,,
>>>
>>> root@test:~# rbd unmap /dev/rbd1
>>> root@test:~# rbd snap protect
>>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> root@test:~# rbd clone
>>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> cold-storage/test-image
>>> root@test:~# rbd map cold-storage/test-image
>>> /dev/rbd1
>>> root@test:~# md5sum /dev/rbd1
>>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>>
>>>  but it's broken...
>>> root@test:~# parted /dev/rbd1 print
>>> Error: /dev/rbd1: unrecognised disk label
>>>
>>>
>>> =
>>>
>>> tech details:
>>>
>>> root@test:~# ceph -v
>>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>>
>>> We have 2 inconstistent pgs, but all images not placed on this pgs...
>>>
>>> root@test:~# ceph health detail
>>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>>> 18 scrub errors
>>>
>>> 
>>>
>>> root@test:~# ceph osd map cold-storage
>>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>>> osdmap e16770 pool 'cold-storage' (2) object
>>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
>>> ([37,15,14], p37) acting ([37,15,14], p37)
>>> root@test:~# ceph osd map cold-storage
>>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>>> osdmap e16770 pool 'cold-storage' (2) object
>>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up
>>> ([12,23,17], p12) acting ([12,23,17], p12)
>>> root@test:~# ceph osd map cold-storage
>>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
>>> osdmap e16770 pool 'cold-storage' (2) object
>>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9)
>>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
>>>
>>>
>>> Also we use cache layer, which in current moment - in forward mode...
>>>
>>> Can you please help me with this.. As my brain stop to understand what is
>>> going on...
>>>
>>> Thank in advance!
>>>
>>>
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Yes, will do.

What we see: when the cache tier is in forward mode and I run
rbd snap create, it uses the rbd_header object from the hot tier, not the
cold tier, and these two headers are not in sync.
The header also can't be evicted from hot-storage, as it's locked by KVM (Qemu). If I
break the lock and evict the header, everything starts to work...
But that's unacceptable for production - breaking the lock under a running VM (((
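
A sketch of how this shows up from the rados side -- hot-storage stands in for whatever the cache pool is actually called, and rbd_header.<id> is a placeholder for the image id (the suffix of its block_name_prefix):

rados -p cold-storage ls | grep rbd_header.<id>       # header object in the cold tier
rados -p hot-storage ls | grep rbd_header.<id>        # stale copy pinned in the hot tier
rados -p hot-storage listwatchers rbd_header.<id>     # the qemu/librbd watch that blocks eviction
rados -p hot-storage cache-flush rbd_header.<id>
rados -p hot-storage cache-evict rbd_header.<id>      # refuses (EBUSY) while the watch is alive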

2015-08-21 1:51 GMT+03:00 Samuel Just :

> Snapshotting with cache/tiering *is* supposed to work.  Can you open a bug?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic 
> wrote:
> > This was related to the caching layer, which doesnt support snapshooting
> per
> > docs...for sake of closing the thread.
> >
> > On 17 August 2015 at 21:15, Voloshanenko Igor <
> igor.voloshane...@gmail.com>
> > wrote:
> >>
> >> Hi all, can you please help me with unexplained situation...
> >>
> >> All snapshot inside ceph broken...
> >>
> >> So, as example, we have VM template, as rbd inside ceph.
> >> We can map it and mount to check that all ok with it
> >>
> >> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >> /dev/rbd0
> >> root@test:~# parted /dev/rbd0 print
> >> Model: Unknown (unknown)
> >> Disk /dev/rbd0: 10.7GB
> >> Sector size (logical/physical): 512B/512B
> >> Partition Table: msdos
> >>
> >> Number  Start   End SizeType File system  Flags
> >>  1  1049kB  525MB   524MB   primary  ext4 boot
> >>  2  525MB   10.7GB  10.2GB  primary   lvm
> >>
> >> Than i want to create snap, so i do:
> >> root@test:~# rbd snap create
> >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>
> >> And now i want to map it:
> >>
> >> root@test:~# rbd map
> >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> /dev/rbd1
> >> root@test:~# parted /dev/rbd1 print
> >> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >> /dev/rbd1 has been opened read-only.
> >> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >> /dev/rbd1 has been opened read-only.
> >> Error: /dev/rbd1: unrecognised disk label
> >>
> >> Even md5 different...
> >> root@ix-s2:~# md5sum /dev/rbd0
> >> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >> root@ix-s2:~# md5sum /dev/rbd1
> >> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>
> >>
> >> Ok, now i protect snap and create clone... but same thing...
> >> md5 for clone same as for snap,,
> >>
> >> root@test:~# rbd unmap /dev/rbd1
> >> root@test:~# rbd snap protect
> >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> root@test:~# rbd clone
> >> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> cold-storage/test-image
> >> root@test:~# rbd map cold-storage/test-image
> >> /dev/rbd1
> >> root@test:~# md5sum /dev/rbd1
> >> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>
> >>  but it's broken...
> >> root@test:~# parted /dev/rbd1 print
> >> Error: /dev/rbd1: unrecognised disk label
> >>
> >>
> >> =
> >>
> >> tech details:
> >>
> >> root@test:~# ceph -v
> >> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
> >>
> >> We have 2 inconstistent pgs, but all images not placed on this pgs...
> >>
> >> root@test:~# ceph health detail
> >> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> 18 scrub errors
> >>
> >> 
> >>
> >> root@test:~# ceph osd map cold-storage
> >> 0e23c701-401d-4465-b9b4-c02939d57bb5
> >> osdmap e16770 pool 'cold-storage' (2) object
> >> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
> >> ([37,15,14], p37) acting ([37,15,14], p37)
> >> root@test:~# ceph osd map cold-storage
> >> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> >> osdmap e16770 pool 'cold-storage' (2) object
> >> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
> -> up
> >> ([12,23,17], p12) acting ([12,23,17], p12)
> >> root@test:~# ceph osd map cold-storage
> >> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> >> osdmap e16770 pool 'cold-storage' (2) object
> >> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
> (2.2a9)
> >> -> up ([12,44,23], p12) acting ([12,44,23], p12)
> >>
> >>
> >> Also we use cache layer, which in current moment - in forward mode...
> >>
> >> Can you please help me with this.. As my brain stop to understand what
> is
> >> going on...
> >>
> >> Thank in advance!
> >>
> >>
> >>
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
> >
> > --
> >
> > Andrija Panić
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
root@test:~# uname -a
Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

2015-08-21 1:54 GMT+03:00 Samuel Just :

> Also, can you include the kernel version?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
> > Snapshotting with cache/tiering *is* supposed to work.  Can you open a
> bug?
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic 
> wrote:
> >> This was related to the caching layer, which doesnt support
> snapshooting per
> >> docs...for sake of closing the thread.
> >>
> >> On 17 August 2015 at 21:15, Voloshanenko Igor <
> igor.voloshane...@gmail.com>
> >> wrote:
> >>>
> >>> Hi all, can you please help me with unexplained situation...
> >>>
> >>> All snapshot inside ceph broken...
> >>>
> >>> So, as example, we have VM template, as rbd inside ceph.
> >>> We can map it and mount to check that all ok with it
> >>>
> >>> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> /dev/rbd0
> >>> root@test:~# parted /dev/rbd0 print
> >>> Model: Unknown (unknown)
> >>> Disk /dev/rbd0: 10.7GB
> >>> Sector size (logical/physical): 512B/512B
> >>> Partition Table: msdos
> >>>
> >>> Number  Start   End SizeType File system  Flags
> >>>  1  1049kB  525MB   524MB   primary  ext4 boot
> >>>  2  525MB   10.7GB  10.2GB  primary   lvm
> >>>
> >>> Than i want to create snap, so i do:
> >>> root@test:~# rbd snap create
> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>>
> >>> And now i want to map it:
> >>>
> >>> root@test:~# rbd map
> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> /dev/rbd1
> >>> root@test:~# parted /dev/rbd1 print
> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >>> /dev/rbd1 has been opened read-only.
> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >>> /dev/rbd1 has been opened read-only.
> >>> Error: /dev/rbd1: unrecognised disk label
> >>>
> >>> Even md5 different...
> >>> root@ix-s2:~# md5sum /dev/rbd0
> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >>> root@ix-s2:~# md5sum /dev/rbd1
> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>>
> >>>
> >>> Ok, now i protect snap and create clone... but same thing...
> >>> md5 for clone same as for snap,,
> >>>
> >>> root@test:~# rbd unmap /dev/rbd1
> >>> root@test:~# rbd snap protect
> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> root@test:~# rbd clone
> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> cold-storage/test-image
> >>> root@test:~# rbd map cold-storage/test-image
> >>> /dev/rbd1
> >>> root@test:~# md5sum /dev/rbd1
> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>>
> >>>  but it's broken...
> >>> root@test:~# parted /dev/rbd1 print
> >>> Error: /dev/rbd1: unrecognised disk label
> >>>
> >>>
> >>> =
> >>>
> >>> tech details:
> >>>
> >>> root@test:~# ceph -v
> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
> >>>
> >>> We have 2 inconstistent pgs, but all images not placed on this pgs...
> >>>
> >>> root@test:~# ceph health detail
> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >>> 18 scrub errors
> >>>
> >>> 
> >>>
> >>> root@test:~# ceph osd map cold-storage
> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
> >>> ([37,15,14], p37) acting ([37,15,14], p37)
> >>> root@test:~# ceph osd map cold-storage
> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
> -> up
> >>> ([12,23,17], p12) acting ([12,23,17], p12)
> >>> root@test:~# ceph osd map cold-storage
> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
> (2.2a9)
> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
> >>>
> >>>
> >>> Also we use cache layer, which in current moment - in forward mode...
> >>>
> >>> Can you please help me with this.. As my brain stop to understand what
> is
> >>> going on...
> >>>
> >>> Thank in advance!
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Andrija Panić
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Hmm, that might actually be client side.  Can you attempt to reproduce
with rbd-fuse (different client side implementation from the kernel)?
-Sam
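
A rough sketch of that check with rbd-fuse, reusing the pool and image names from the earlier test (the mountpoint is arbitrary):

mkdir -p /mnt/rbdfuse
rbd-fuse -p cold-storage /mnt/rbdfuse
md5sum /mnt/rbdfuse/0e23c701-401d-4465-b9b4-c02939d57bb5    # the parent image
md5sum /mnt/rbdfuse/test-image                              # the clone that looked broken via krbd
fusermount -u /mnt/rbdfuse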

On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor
 wrote:
> root@test:~# uname -a
> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>>
>> Also, can you include the kernel version?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
>> > Snapshotting with cache/tiering *is* supposed to work.  Can you open a
>> > bug?
>> > -Sam
>> >
>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic 
>> > wrote:
>> >> This was related to the caching layer, which doesnt support
>> >> snapshooting per
>> >> docs...for sake of closing the thread.
>> >>
>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
>> >> 
>> >> wrote:
>> >>>
>> >>> Hi all, can you please help me with unexplained situation...
>> >>>
>> >>> All snapshot inside ceph broken...
>> >>>
>> >>> So, as example, we have VM template, as rbd inside ceph.
>> >>> We can map it and mount to check that all ok with it
>> >>>
>> >>> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>> >>> /dev/rbd0
>> >>> root@test:~# parted /dev/rbd0 print
>> >>> Model: Unknown (unknown)
>> >>> Disk /dev/rbd0: 10.7GB
>> >>> Sector size (logical/physical): 512B/512B
>> >>> Partition Table: msdos
>> >>>
>> >>> Number  Start   End SizeType File system  Flags
>> >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>> >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>> >>>
>> >>> Than i want to create snap, so i do:
>> >>> root@test:~# rbd snap create
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>>
>> >>> And now i want to map it:
>> >>>
>> >>> root@test:~# rbd map
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>> /dev/rbd1
>> >>> root@test:~# parted /dev/rbd1 print
>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>> >>> /dev/rbd1 has been opened read-only.
>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>> >>> /dev/rbd1 has been opened read-only.
>> >>> Error: /dev/rbd1: unrecognised disk label
>> >>>
>> >>> Even md5 different...
>> >>> root@ix-s2:~# md5sum /dev/rbd0
>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>> >>> root@ix-s2:~# md5sum /dev/rbd1
>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >>>
>> >>>
>> >>> Ok, now i protect snap and create clone... but same thing...
>> >>> md5 for clone same as for snap,,
>> >>>
>> >>> root@test:~# rbd unmap /dev/rbd1
>> >>> root@test:~# rbd snap protect
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>> root@test:~# rbd clone
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>> cold-storage/test-image
>> >>> root@test:~# rbd map cold-storage/test-image
>> >>> /dev/rbd1
>> >>> root@test:~# md5sum /dev/rbd1
>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >>>
>> >>>  but it's broken...
>> >>> root@test:~# parted /dev/rbd1 print
>> >>> Error: /dev/rbd1: unrecognised disk label
>> >>>
>> >>>
>> >>> =
>> >>>
>> >>> tech details:
>> >>>
>> >>> root@test:~# ceph -v
>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>> >>>
>> >>> We have 2 inconstistent pgs, but all images not placed on this pgs...
>> >>>
>> >>> root@test:~# ceph health detail
>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >>> 18 scrub errors
>> >>>
>> >>> 
>> >>>
>> >>> root@test:~# ceph osd map cold-storage
>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
>> >>> ([37,15,14], p37) acting ([37,15,14], p37)
>> >>> root@test:~# ceph osd map cold-storage
>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
>> >>> -> up
>> >>> ([12,23,17], p12) acting ([12,23,17], p12)
>> >>> root@test:~# ceph osd map cold-storage
>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
>> >>> (2.2a9)
>> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
>> >>>
>> >>>
>> >>> Also we use cache layer, which in current moment - in forward mode...
>> >>>
>> >>> Can you please help me with this.. As my brain stop to understand what
>> >>> is
>> >>> going on...
>> >>>
>> >>> Thank in advance!
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> ___
>> >>> ceph-users mailing list
>> >>> ceph-users@lists.ceph.com
>> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We used the 4.x branch because we have "very good" Samsung 850 Pro drives in
production, and they don't support NCQ TRIM...

And 4.x is the first branch that includes the exception for this in libata-core.c.

Sure, we could backport that one line to the 3.x branch, but we prefer not to go
deeper if a package for the newer kernel already exists.
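
A quick way to double-check that on a node -- sdX is a placeholder for one of the 850 Pro devices, and the grep assumes a checked-out 4.x kernel source tree:

hdparm -I /dev/sdX | egrep -i "model|trim"      # confirms the drive model and its TRIM capabilities
grep -n -i samsung drivers/ata/libata-core.c    # shows the NO_NCQ_TRIM blacklist entry referred to above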

2015-08-21 1:56 GMT+03:00 Voloshanenko Igor :

> root@test:~# uname -a
> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>
>> Also, can you include the kernel version?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
>> > Snapshotting with cache/tiering *is* supposed to work.  Can you open a
>> bug?
>> > -Sam
>> >
>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic 
>> wrote:
>> >> This was related to the caching layer, which doesnt support
>> snapshooting per
>> >> docs...for sake of closing the thread.
>> >>
>> >> On 17 August 2015 at 21:15, Voloshanenko Igor <
>> igor.voloshane...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi all, can you please help me with unexplained situation...
>> >>>
>> >>> All snapshot inside ceph broken...
>> >>>
>> >>> So, as example, we have VM template, as rbd inside ceph.
>> >>> We can map it and mount to check that all ok with it
>> >>>
>> >>> root@test:~# rbd map
>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>> >>> /dev/rbd0
>> >>> root@test:~# parted /dev/rbd0 print
>> >>> Model: Unknown (unknown)
>> >>> Disk /dev/rbd0: 10.7GB
>> >>> Sector size (logical/physical): 512B/512B
>> >>> Partition Table: msdos
>> >>>
>> >>> Number  Start   End SizeType File system  Flags
>> >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>> >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>> >>>
>> >>> Than i want to create snap, so i do:
>> >>> root@test:~# rbd snap create
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>>
>> >>> And now i want to map it:
>> >>>
>> >>> root@test:~# rbd map
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>> /dev/rbd1
>> >>> root@test:~# parted /dev/rbd1 print
>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>> >>> /dev/rbd1 has been opened read-only.
>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>> >>> /dev/rbd1 has been opened read-only.
>> >>> Error: /dev/rbd1: unrecognised disk label
>> >>>
>> >>> Even md5 different...
>> >>> root@ix-s2:~# md5sum /dev/rbd0
>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>> >>> root@ix-s2:~# md5sum /dev/rbd1
>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >>>
>> >>>
>> >>> Ok, now i protect snap and create clone... but same thing...
>> >>> md5 for clone same as for snap,,
>> >>>
>> >>> root@test:~# rbd unmap /dev/rbd1
>> >>> root@test:~# rbd snap protect
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>> root@test:~# rbd clone
>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>> cold-storage/test-image
>> >>> root@test:~# rbd map cold-storage/test-image
>> >>> /dev/rbd1
>> >>> root@test:~# md5sum /dev/rbd1
>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >>>
>> >>>  but it's broken...
>> >>> root@test:~# parted /dev/rbd1 print
>> >>> Error: /dev/rbd1: unrecognised disk label
>> >>>
>> >>>
>> >>> =
>> >>>
>> >>> tech details:
>> >>>
>> >>> root@test:~# ceph -v
>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>> >>>
>> >>> We have 2 inconstistent pgs, but all images not placed on this pgs...
>> >>>
>> >>> root@test:~# ceph health detail
>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >>> 18 scrub errors
>> >>>
>> >>> 
>> >>>
>> >>> root@test:~# ceph osd map cold-storage
>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
>> >>> ([37,15,14], p37) acting ([37,15,14], p37)
>> >>> root@test:~# ceph osd map cold-storage
>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
>> -> up
>> >>> ([12,23,17], p12) acting ([12,23,17], p12)
>> >>> root@test:~# ceph osd map cold-storage
>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
>> (2.2a9)
>> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
>> >>>
>> >>>
>> >>> Also we use cache layer, which in current moment - in forward mode...
>> >>>
>> >>> Can you please help me with this.. As my brain stop to understand
>> what is
>> >>> going on...
>> >>>
>> >>> Thank in advance!
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> ___

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I already killed the cache layer, but will try to reproduce this in the lab

2015-08-21 1:58 GMT+03:00 Samuel Just :

> Hmm, that might actually be client side.  Can you attempt to reproduce
> with rbd-fuse (different client side implementation from the kernel)?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor
>  wrote:
> > root@test:~# uname -a
> > Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
> UTC
> > 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >>
> >> Also, can you include the kernel version?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
> >> > Snapshotting with cache/tiering *is* supposed to work.  Can you open a
> >> > bug?
> >> > -Sam
> >> >
> >> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic <
> andrija.pa...@gmail.com>
> >> > wrote:
> >> >> This was related to the caching layer, which doesnt support
> >> >> snapshooting per
> >> >> docs...for sake of closing the thread.
> >> >>
> >> >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >> >> 
> >> >> wrote:
> >> >>>
> >> >>> Hi all, can you please help me with unexplained situation...
> >> >>>
> >> >>> All snapshot inside ceph broken...
> >> >>>
> >> >>> So, as example, we have VM template, as rbd inside ceph.
> >> >>> We can map it and mount to check that all ok with it
> >> >>>
> >> >>> root@test:~# rbd map
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >> >>> /dev/rbd0
> >> >>> root@test:~# parted /dev/rbd0 print
> >> >>> Model: Unknown (unknown)
> >> >>> Disk /dev/rbd0: 10.7GB
> >> >>> Sector size (logical/physical): 512B/512B
> >> >>> Partition Table: msdos
> >> >>>
> >> >>> Number  Start   End SizeType File system  Flags
> >> >>>  1  1049kB  525MB   524MB   primary  ext4 boot
> >> >>>  2  525MB   10.7GB  10.2GB  primary   lvm
> >> >>>
> >> >>> Than i want to create snap, so i do:
> >> >>> root@test:~# rbd snap create
> >> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> >>>
> >> >>> And now i want to map it:
> >> >>>
> >> >>> root@test:~# rbd map
> >> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> >>> /dev/rbd1
> >> >>> root@test:~# parted /dev/rbd1 print
> >> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> system).
> >> >>> /dev/rbd1 has been opened read-only.
> >> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> system).
> >> >>> /dev/rbd1 has been opened read-only.
> >> >>> Error: /dev/rbd1: unrecognised disk label
> >> >>>
> >> >>> Even md5 different...
> >> >>> root@ix-s2:~# md5sum /dev/rbd0
> >> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >> >>> root@ix-s2:~# md5sum /dev/rbd1
> >> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >> >>>
> >> >>>
> >> >>> Ok, now i protect snap and create clone... but same thing...
> >> >>> md5 for clone same as for snap,,
> >> >>>
> >> >>> root@test:~# rbd unmap /dev/rbd1
> >> >>> root@test:~# rbd snap protect
> >> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> >>> root@test:~# rbd clone
> >> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> >>> cold-storage/test-image
> >> >>> root@test:~# rbd map cold-storage/test-image
> >> >>> /dev/rbd1
> >> >>> root@test:~# md5sum /dev/rbd1
> >> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >> >>>
> >> >>>  but it's broken...
> >> >>> root@test:~# parted /dev/rbd1 print
> >> >>> Error: /dev/rbd1: unrecognised disk label
> >> >>>
> >> >>>
> >> >>> =
> >> >>>
> >> >>> tech details:
> >> >>>
> >> >>> root@test:~# ceph -v
> >> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
> >> >>>
> >> >>> We have 2 inconstistent pgs, but all images not placed on this
> pgs...
> >> >>>
> >> >>> root@test:~# ceph health detail
> >> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >> >>> 18 scrub errors
> >> >>>
> >> >>> 
> >> >>>
> >> >>> root@test:~# ceph osd map cold-storage
> >> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
> >> >>> osdmap e16770 pool 'cold-storage' (2) object
> >> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) ->
> up
> >> >>> ([37,15,14], p37) acting ([37,15,14], p37)
> >> >>> root@test:~# ceph osd map cold-storage
> >> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> >> >>> osdmap e16770 pool 'cold-storage' (2) object
> >> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3
> (2.4a3)
> >> >>> -> up
> >> >>> ([12,23,17], p12) acting ([12,23,17], p12)
> >> >>> root@test:~# ceph osd map cold-storage
> >> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> >> >>> osdmap e16770 pool 'cold-storage' (2) object
> >> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
> >> >>> (2.2a9)
> >> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
> >> >>>
> >> >>>
> >> >>> Also we use cache la

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
What's supposed to happen is that the client transparently directs all
requests to the cache pool rather than the cold pool when there is a
cache pool.  If the kernel is sending requests to the cold pool,
that's probably where the bug is.  Odd.  It could also be a bug
specific to 'forward' mode, either in the client or on the osd.  Why did
you have it in that mode?
-Sam
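
For the bug report it is worth capturing how the tiers are wired and which mode the cache pool is in -- a sketch, with hot-storage standing in for the actual cache pool name:

ceph osd dump | egrep "cold-storage|hot-storage"   # pool lines show tier_of, read_tier/write_tier and cache_mode
ceph osd tier cache-mode hot-storage writeback     # if forward mode is the suspect, put the tier back into writeback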

On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
 wrote:
> We used 4.x branch, as we have "very good" Samsung 850 pro in production,
> and they don;t support ncq_trim...
>
> And 4,x first branch which include exceptions for this in libsata.c.
>
> sure we can backport this 1 line to 3.x branch, but we prefer no to go
> deeper if packege for new kernel exist.
>
> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor :
>>
>> root@test:~# uname -a
>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>
>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>>>
>>> Also, can you include the kernel version?
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
>>> > Snapshotting with cache/tiering *is* supposed to work.  Can you open a
>>> > bug?
>>> > -Sam
>>> >
>>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>>> >  wrote:
>>> >> This was related to the caching layer, which doesnt support
>>> >> snapshooting per
>>> >> docs...for sake of closing the thread.
>>> >>
>>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
>>> >> 
>>> >> wrote:
>>> >>>
>>> >>> Hi all, can you please help me with unexplained situation...
>>> >>>
>>> >>> All snapshot inside ceph broken...
>>> >>>
>>> >>> So, as example, we have VM template, as rbd inside ceph.
>>> >>> We can map it and mount to check that all ok with it
>>> >>>
>>> >>> root@test:~# rbd map
>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>>> >>> /dev/rbd0
>>> >>> root@test:~# parted /dev/rbd0 print
>>> >>> Model: Unknown (unknown)
>>> >>> Disk /dev/rbd0: 10.7GB
>>> >>> Sector size (logical/physical): 512B/512B
>>> >>> Partition Table: msdos
>>> >>>
>>> >>> Number  Start   End SizeType File system  Flags
>>> >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>>> >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>>> >>>
>>> >>> Than i want to create snap, so i do:
>>> >>> root@test:~# rbd snap create
>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >>>
>>> >>> And now i want to map it:
>>> >>>
>>> >>> root@test:~# rbd map
>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >>> /dev/rbd1
>>> >>> root@test:~# parted /dev/rbd1 print
>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>>> >>> /dev/rbd1 has been opened read-only.
>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>>> >>> /dev/rbd1 has been opened read-only.
>>> >>> Error: /dev/rbd1: unrecognised disk label
>>> >>>
>>> >>> Even md5 different...
>>> >>> root@ix-s2:~# md5sum /dev/rbd0
>>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>>> >>> root@ix-s2:~# md5sum /dev/rbd1
>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>> >>>
>>> >>>
>>> >>> Ok, now i protect snap and create clone... but same thing...
>>> >>> md5 for clone same as for snap,,
>>> >>>
>>> >>> root@test:~# rbd unmap /dev/rbd1
>>> >>> root@test:~# rbd snap protect
>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >>> root@test:~# rbd clone
>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >>> cold-storage/test-image
>>> >>> root@test:~# rbd map cold-storage/test-image
>>> >>> /dev/rbd1
>>> >>> root@test:~# md5sum /dev/rbd1
>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>> >>>
>>> >>>  but it's broken...
>>> >>> root@test:~# parted /dev/rbd1 print
>>> >>> Error: /dev/rbd1: unrecognised disk label
>>> >>>
>>> >>>
>>> >>> =
>>> >>>
>>> >>> tech details:
>>> >>>
>>> >>> root@test:~# ceph -v
>>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>> >>>
>>> >>> We have 2 inconstistent pgs, but all images not placed on this pgs...
>>> >>>
>>> >>> root@test:~# ceph health detail
>>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>>> >>> 18 scrub errors
>>> >>>
>>> >>> 
>>> >>>
>>> >>> root@test:~# ceph osd map cold-storage
>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
>>> >>> ([37,15,14], p37) acting ([37,15,14], p37)
>>> >>> root@test:~# ceph osd map cold-storage
>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
>>> >>> -> up
>>> >>> ([12,23,17], p12) acting ([12,23,17], p12)
>>> >>> root@test:~# ceph osd map cold-

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Certainly, don't reproduce this with a cluster you care about :).
-Sam

On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
> What's supposed to happen is that the client transparently directs all
> requests to the cache pool rather than the cold pool when there is a
> cache pool.  If the kernel is sending requests to the cold pool,
> that's probably where the bug is.  Odd.  It could also be a bug
> specific 'forward' mode either in the client or on the osd.  Why did
> you have it in that mode?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>  wrote:
>> We used 4.x branch, as we have "very good" Samsung 850 pro in production,
>> and they don;t support ncq_trim...
>>
>> And 4,x first branch which include exceptions for this in libsata.c.
>>
>> sure we can backport this 1 line to 3.x branch, but we prefer no to go
>> deeper if packege for new kernel exist.
>>
>> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor :
>>>
>>> root@test:~# uname -a
>>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
>>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> 2015-08-21 1:54 GMT+03:00 Samuel Just :

 Also, can you include the kernel version?
 -Sam

 On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
 > Snapshotting with cache/tiering *is* supposed to work.  Can you open a
 > bug?
 > -Sam
 >
 > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
 >  wrote:
 >> This was related to the caching layer, which doesnt support
 >> snapshooting per
 >> docs...for sake of closing the thread.
 >>
 >> On 17 August 2015 at 21:15, Voloshanenko Igor
 >> 
 >> wrote:
 >>>
 >>> Hi all, can you please help me with unexplained situation...
 >>>
 >>> All snapshot inside ceph broken...
 >>>
 >>> So, as example, we have VM template, as rbd inside ceph.
 >>> We can map it and mount to check that all ok with it
 >>>
 >>> root@test:~# rbd map
 >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
 >>> /dev/rbd0
 >>> root@test:~# parted /dev/rbd0 print
 >>> Model: Unknown (unknown)
 >>> Disk /dev/rbd0: 10.7GB
 >>> Sector size (logical/physical): 512B/512B
 >>> Partition Table: msdos
 >>>
 >>> Number  Start   End SizeType File system  Flags
 >>>  1  1049kB  525MB   524MB   primary  ext4 boot
 >>>  2  525MB   10.7GB  10.2GB  primary   lvm
 >>>
 >>> Than i want to create snap, so i do:
 >>> root@test:~# rbd snap create
 >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 >>>
 >>> And now i want to map it:
 >>>
 >>> root@test:~# rbd map
 >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 >>> /dev/rbd1
 >>> root@test:~# parted /dev/rbd1 print
 >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
 >>> /dev/rbd1 has been opened read-only.
 >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
 >>> /dev/rbd1 has been opened read-only.
 >>> Error: /dev/rbd1: unrecognised disk label
 >>>
 >>> Even md5 different...
 >>> root@ix-s2:~# md5sum /dev/rbd0
 >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
 >>> root@ix-s2:~# md5sum /dev/rbd1
 >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 >>>
 >>>
 >>> Ok, now i protect snap and create clone... but same thing...
 >>> md5 for clone same as for snap,,
 >>>
 >>> root@test:~# rbd unmap /dev/rbd1
 >>> root@test:~# rbd snap protect
 >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 >>> root@test:~# rbd clone
 >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 >>> cold-storage/test-image
 >>> root@test:~# rbd map cold-storage/test-image
 >>> /dev/rbd1
 >>> root@test:~# md5sum /dev/rbd1
 >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 >>>
 >>>  but it's broken...
 >>> root@test:~# parted /dev/rbd1 print
 >>> Error: /dev/rbd1: unrecognised disk label
 >>>
 >>>
 >>> =
 >>>
 >>> tech details:
 >>>
 >>> root@test:~# ceph -v
 >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 >>>
 >>> We have 2 inconstistent pgs, but all images not placed on this pgs...
 >>>
 >>> root@test:~# ceph health detail
 >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
 >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
 >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
 >>> 18 scrub errors
 >>>
 >>> 
 >>>
 >>> root@test:~# ceph osd map cold-storage
 >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
 >>> osdmap e16770 pool 'cold-storage' (2) object
 >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
 >>> ([37,15,14], p37) acting ([37,15,14], p37)
 >>> root@test:~# ceph osd map cold-storage
 >>> 0e23c701-401

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We switched to forward mode as a step toward switching the cache layer off.
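(For context, the usual hammer-era sequence for backing a writeback tier out
of the data path is roughly the commands below; the pool names hot-pool and
cold-pool are placeholders, not our real ones.)

  ceph osd tier cache-mode hot-pool forward
  rados -p hot-pool cache-flush-evict-all
  ceph osd tier remove-overlay cold-pool
  ceph osd tier remove cold-pool hot-pool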

Right now we have "samsung 850 pro" in cache layer (10 ssd, 2 per nodes)
and they show 2MB for 4K blocks... 250 IOPS... intead of 18-20K for intel
S3500 240G which we choose as replacement..

So with such good disks - cache layer - very big bottleneck for us...
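(Numbers like the ones above usually come from a direct sync-write test, i.e.
the journal-style workload. A rough way to reproduce such a measurement is an
fio run along these lines; it is illustrative only, /dev/sdX is a placeholder,
and it writes to the raw device, so only run it against an empty disk.)

  fio --name=journal-test --filename=/dev/sdX --ioengine=libaio \
      --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 \
      --runtime=60 --time_based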

2015-08-21 2:02 GMT+03:00 Samuel Just :

> What's supposed to happen is that the client transparently directs all
> requests to the cache pool rather than the cold pool when there is a
> cache pool.  If the kernel is sending requests to the cold pool,
> that's probably where the bug is.  Odd.  It could also be a bug
> specific 'forward' mode either in the client or on the osd.  Why did
> you have it in that mode?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>  wrote:
> > We used 4.x branch, as we have "very good" Samsung 850 pro in production,
> > and they don;t support ncq_trim...
> >
> > And 4,x first branch which include exceptions for this in libsata.c.
> >
> > sure we can backport this 1 line to 3.x branch, but we prefer no to go
> > deeper if packege for new kernel exist.
> >
> > 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor  >:
> >>
> >> root@test:~# uname -a
> >> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
> UTC
> >> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >>>
> >>> Also, can you include the kernel version?
> >>> -Sam
> >>>
> >>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  wrote:
> >>> > Snapshotting with cache/tiering *is* supposed to work.  Can you open
> a
> >>> > bug?
> >>> > -Sam
> >>> >
> >>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >>> >  wrote:
> >>> >> This was related to the caching layer, which doesnt support
> >>> >> snapshooting per
> >>> >> docs...for sake of closing the thread.
> >>> >>
> >>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >>> >> 
> >>> >> wrote:
> >>> >>>
> >>> >>> Hi all, can you please help me with unexplained situation...
> >>> >>>
> >>> >>> All snapshot inside ceph broken...
> >>> >>>
> >>> >>> So, as example, we have VM template, as rbd inside ceph.
> >>> >>> We can map it and mount to check that all ok with it
> >>> >>>
> >>> >>> root@test:~# rbd map
> >>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> >>> /dev/rbd0
> >>> >>> root@test:~# parted /dev/rbd0 print
> >>> >>> Model: Unknown (unknown)
> >>> >>> Disk /dev/rbd0: 10.7GB
> >>> >>> Sector size (logical/physical): 512B/512B
> >>> >>> Partition Table: msdos
> >>> >>>
> >>> >>> Number  Start   End SizeType File system  Flags
> >>> >>>  1  1049kB  525MB   524MB   primary  ext4 boot
> >>> >>>  2  525MB   10.7GB  10.2GB  primary   lvm
> >>> >>>
> >>> >>> Than i want to create snap, so i do:
> >>> >>> root@test:~# rbd snap create
> >>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >>>
> >>> >>> And now i want to map it:
> >>> >>>
> >>> >>> root@test:~# rbd map
> >>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >>> /dev/rbd1
> >>> >>> root@test:~# parted /dev/rbd1 print
> >>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> system).
> >>> >>> /dev/rbd1 has been opened read-only.
> >>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> system).
> >>> >>> /dev/rbd1 has been opened read-only.
> >>> >>> Error: /dev/rbd1: unrecognised disk label
> >>> >>>
> >>> >>> Even md5 different...
> >>> >>> root@ix-s2:~# md5sum /dev/rbd0
> >>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >>> >>> root@ix-s2:~# md5sum /dev/rbd1
> >>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>> >>>
> >>> >>>
> >>> >>> Ok, now i protect snap and create clone... but same thing...
> >>> >>> md5 for clone same as for snap,,
> >>> >>>
> >>> >>> root@test:~# rbd unmap /dev/rbd1
> >>> >>> root@test:~# rbd snap protect
> >>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >>> root@test:~# rbd clone
> >>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >>> cold-storage/test-image
> >>> >>> root@test:~# rbd map cold-storage/test-image
> >>> >>> /dev/rbd1
> >>> >>> root@test:~# md5sum /dev/rbd1
> >>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>> >>>
> >>> >>>  but it's broken...
> >>> >>> root@test:~# parted /dev/rbd1 print
> >>> >>> Error: /dev/rbd1: unrecognised disk label
> >>> >>>
> >>> >>>
> >>> >>> =
> >>> >>>
> >>> >>> tech details:
> >>> >>>
> >>> >>> root@test:~# ceph -v
> >>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
> >>> >>>
> >>> >>> We have 2 inconstistent pgs, but all images not placed on this
> pgs...
> >>> >>>
> >>> >>> root@test:~# ceph health detail
> >>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >>> >>> 18 scrub errors
> >>> >>>
> >>> >>> =

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Good joke )

2015-08-21 2:06 GMT+03:00 Samuel Just :

> Certainly, don't reproduce this with a cluster you care about :).
> -Sam
>
> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
> > What's supposed to happen is that the client transparently directs all
> > requests to the cache pool rather than the cold pool when there is a
> > cache pool.  If the kernel is sending requests to the cold pool,
> > that's probably where the bug is.  Odd.  It could also be a bug
> > specific 'forward' mode either in the client or on the osd.  Why did
> > you have it in that mode?
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >  wrote:
> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
> production,
> >> and they don;t support ncq_trim...
> >>
> >> And 4,x first branch which include exceptions for this in libsata.c.
> >>
> >> sure we can backport this 1 line to 3.x branch, but we prefer no to go
> >> deeper if packege for new kernel exist.
> >>
> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor <
> igor.voloshane...@gmail.com>:
> >>>
> >>> root@test:~# uname -a
> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
> UTC
> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >>>
> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> 
>  Also, can you include the kernel version?
>  -Sam
> 
>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
> wrote:
>  > Snapshotting with cache/tiering *is* supposed to work.  Can you
> open a
>  > bug?
>  > -Sam
>  >
>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>  >  wrote:
>  >> This was related to the caching layer, which doesnt support
>  >> snapshooting per
>  >> docs...for sake of closing the thread.
>  >>
>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>  >> 
>  >> wrote:
>  >>>
>  >>> Hi all, can you please help me with unexplained situation...
>  >>>
>  >>> All snapshot inside ceph broken...
>  >>>
>  >>> So, as example, we have VM template, as rbd inside ceph.
>  >>> We can map it and mount to check that all ok with it
>  >>>
>  >>> root@test:~# rbd map
>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>  >>> /dev/rbd0
>  >>> root@test:~# parted /dev/rbd0 print
>  >>> Model: Unknown (unknown)
>  >>> Disk /dev/rbd0: 10.7GB
>  >>> Sector size (logical/physical): 512B/512B
>  >>> Partition Table: msdos
>  >>>
>  >>> Number  Start   End SizeType File system  Flags
>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>  >>>
>  >>> Than i want to create snap, so i do:
>  >>> root@test:~# rbd snap create
>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>  >>>
>  >>> And now i want to map it:
>  >>>
>  >>> root@test:~# rbd map
>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>  >>> /dev/rbd1
>  >>> root@test:~# parted /dev/rbd1 print
>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> system).
>  >>> /dev/rbd1 has been opened read-only.
>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> system).
>  >>> /dev/rbd1 has been opened read-only.
>  >>> Error: /dev/rbd1: unrecognised disk label
>  >>>
>  >>> Even md5 different...
>  >>> root@ix-s2:~# md5sum /dev/rbd0
>  >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>  >>> root@ix-s2:~# md5sum /dev/rbd1
>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>  >>>
>  >>>
>  >>> Ok, now i protect snap and create clone... but same thing...
>  >>> md5 for clone same as for snap,,
>  >>>
>  >>> root@test:~# rbd unmap /dev/rbd1
>  >>> root@test:~# rbd snap protect
>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>  >>> root@test:~# rbd clone
>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>  >>> cold-storage/test-image
>  >>> root@test:~# rbd map cold-storage/test-image
>  >>> /dev/rbd1
>  >>> root@test:~# md5sum /dev/rbd1
>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>  >>>
>  >>>  but it's broken...
>  >>> root@test:~# parted /dev/rbd1 print
>  >>> Error: /dev/rbd1: unrecognised disk label
>  >>>
>  >>>
>  >>> =
>  >>>
>  >>> tech details:
>  >>>
>  >>> root@test:~# ceph -v
>  >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>  >>>
>  >>> We have 2 inconstistent pgs, but all images not placed on this
> pgs...
>  >>>
>  >>> root@test:~# ceph health detail
>  >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>  >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>  >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>  >>> 18 scrub errors
>  >>>
>  >>> ===

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
So you started draining the cache pool before you saw either the
inconsistent pgs or the anomalous snap behavior?  (That is, writeback
mode was working correctly?)
-Sam

On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
 wrote:
> Good joke )
>
> 2015-08-21 2:06 GMT+03:00 Samuel Just :
>>
>> Certainly, don't reproduce this with a cluster you care about :).
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
>> > What's supposed to happen is that the client transparently directs all
>> > requests to the cache pool rather than the cold pool when there is a
>> > cache pool.  If the kernel is sending requests to the cold pool,
>> > that's probably where the bug is.  Odd.  It could also be a bug
>> > specific 'forward' mode either in the client or on the osd.  Why did
>> > you have it in that mode?
>> > -Sam
>> >
>> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >  wrote:
>> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>> >> production,
>> >> and they don;t support ncq_trim...
>> >>
>> >> And 4,x first branch which include exceptions for this in libsata.c.
>> >>
>> >> sure we can backport this 1 line to 3.x branch, but we prefer no to go
>> >> deeper if packege for new kernel exist.
>> >>
>> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >> :
>> >>>
>> >>> root@test:~# uname -a
>> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
>> >>> UTC
>> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >>>
>> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>> 
>>  Also, can you include the kernel version?
>>  -Sam
>> 
>>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
>>  wrote:
>>  > Snapshotting with cache/tiering *is* supposed to work.  Can you
>>  > open a
>>  > bug?
>>  > -Sam
>>  >
>>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>>  >  wrote:
>>  >> This was related to the caching layer, which doesnt support
>>  >> snapshooting per
>>  >> docs...for sake of closing the thread.
>>  >>
>>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>>  >> 
>>  >> wrote:
>>  >>>
>>  >>> Hi all, can you please help me with unexplained situation...
>>  >>>
>>  >>> All snapshot inside ceph broken...
>>  >>>
>>  >>> So, as example, we have VM template, as rbd inside ceph.
>>  >>> We can map it and mount to check that all ok with it
>>  >>>
>>  >>> root@test:~# rbd map
>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>>  >>> /dev/rbd0
>>  >>> root@test:~# parted /dev/rbd0 print
>>  >>> Model: Unknown (unknown)
>>  >>> Disk /dev/rbd0: 10.7GB
>>  >>> Sector size (logical/physical): 512B/512B
>>  >>> Partition Table: msdos
>>  >>>
>>  >>> Number  Start   End SizeType File system  Flags
>>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>>  >>>
>>  >>> Than i want to create snap, so i do:
>>  >>> root@test:~# rbd snap create
>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>  >>>
>>  >>> And now i want to map it:
>>  >>>
>>  >>> root@test:~# rbd map
>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>  >>> /dev/rbd1
>>  >>> root@test:~# parted /dev/rbd1 print
>>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>  >>> system).
>>  >>> /dev/rbd1 has been opened read-only.
>>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>  >>> system).
>>  >>> /dev/rbd1 has been opened read-only.
>>  >>> Error: /dev/rbd1: unrecognised disk label
>>  >>>
>>  >>> Even md5 different...
>>  >>> root@ix-s2:~# md5sum /dev/rbd0
>>  >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>>  >>> root@ix-s2:~# md5sum /dev/rbd1
>>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>  >>>
>>  >>>
>>  >>> Ok, now i protect snap and create clone... but same thing...
>>  >>> md5 for clone same as for snap,,
>>  >>>
>>  >>> root@test:~# rbd unmap /dev/rbd1
>>  >>> root@test:~# rbd snap protect
>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>  >>> root@test:~# rbd clone
>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>  >>> cold-storage/test-image
>>  >>> root@test:~# rbd map cold-storage/test-image
>>  >>> /dev/rbd1
>>  >>> root@test:~# md5sum /dev/rbd1
>>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>  >>>
>>  >>>  but it's broken...
>>  >>> root@test:~# parted /dev/rbd1 print
>>  >>> Error: /dev/rbd1: unrecognised disk label
>>  >>>
>>  >>>
>>  >>> =
>>  >>>
>>  >>> tech details:
>>  >>>
>>  >>> root@test:~# ceph -v
>>  >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>  >>>
>>  >>> We have 

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Created a ticket to improve our testing here -- this appears to be a hole.

http://tracker.ceph.com/issues/12742
-Sam

On Thu, Aug 20, 2015 at 4:09 PM, Samuel Just  wrote:
> So you started draining the cache pool before you saw either the
> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
> mode was working correctly?)
> -Sam
>
> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>  wrote:
>> Good joke )
>>
>> 2015-08-21 2:06 GMT+03:00 Samuel Just :
>>>
>>> Certainly, don't reproduce this with a cluster you care about :).
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
>>> > What's supposed to happen is that the client transparently directs all
>>> > requests to the cache pool rather than the cold pool when there is a
>>> > cache pool.  If the kernel is sending requests to the cold pool,
>>> > that's probably where the bug is.  Odd.  It could also be a bug
>>> > specific 'forward' mode either in the client or on the osd.  Why did
>>> > you have it in that mode?
>>> > -Sam
>>> >
>>> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>>> >  wrote:
>>> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>>> >> production,
>>> >> and they don;t support ncq_trim...
>>> >>
>>> >> And 4,x first branch which include exceptions for this in libsata.c.
>>> >>
>>> >> sure we can backport this 1 line to 3.x branch, but we prefer no to go
>>> >> deeper if packege for new kernel exist.
>>> >>
>>> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>>> >> :
>>> >>>
>>> >>> root@test:~# uname -a
>>> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
>>> >>> UTC
>>> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>> >>>
>>> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>>> 
>>>  Also, can you include the kernel version?
>>>  -Sam
>>> 
>>>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
>>>  wrote:
>>>  > Snapshotting with cache/tiering *is* supposed to work.  Can you
>>>  > open a
>>>  > bug?
>>>  > -Sam
>>>  >
>>>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>>>  >  wrote:
>>>  >> This was related to the caching layer, which doesnt support
>>>  >> snapshooting per
>>>  >> docs...for sake of closing the thread.
>>>  >>
>>>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>>>  >> 
>>>  >> wrote:
>>>  >>>
>>>  >>> Hi all, can you please help me with unexplained situation...
>>>  >>>
>>>  >>> All snapshot inside ceph broken...
>>>  >>>
>>>  >>> So, as example, we have VM template, as rbd inside ceph.
>>>  >>> We can map it and mount to check that all ok with it
>>>  >>>
>>>  >>> root@test:~# rbd map
>>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>>>  >>> /dev/rbd0
>>>  >>> root@test:~# parted /dev/rbd0 print
>>>  >>> Model: Unknown (unknown)
>>>  >>> Disk /dev/rbd0: 10.7GB
>>>  >>> Sector size (logical/physical): 512B/512B
>>>  >>> Partition Table: msdos
>>>  >>>
>>>  >>> Number  Start   End SizeType File system  Flags
>>>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>>>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>>>  >>>
>>>  >>> Than i want to create snap, so i do:
>>>  >>> root@test:~# rbd snap create
>>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>>  >>>
>>>  >>> And now i want to map it:
>>>  >>>
>>>  >>> root@test:~# rbd map
>>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>>  >>> /dev/rbd1
>>>  >>> root@test:~# parted /dev/rbd1 print
>>>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>>  >>> system).
>>>  >>> /dev/rbd1 has been opened read-only.
>>>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>>  >>> system).
>>>  >>> /dev/rbd1 has been opened read-only.
>>>  >>> Error: /dev/rbd1: unrecognised disk label
>>>  >>>
>>>  >>> Even md5 different...
>>>  >>> root@ix-s2:~# md5sum /dev/rbd0
>>>  >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>>>  >>> root@ix-s2:~# md5sum /dev/rbd1
>>>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>>  >>>
>>>  >>>
>>>  >>> Ok, now i protect snap and create clone... but same thing...
>>>  >>> md5 for clone same as for snap,,
>>>  >>>
>>>  >>> root@test:~# rbd unmap /dev/rbd1
>>>  >>> root@test:~# rbd snap protect
>>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>>  >>> root@test:~# rbd clone
>>>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>>  >>> cold-storage/test-image
>>>  >>> root@test:~# rbd map cold-storage/test-image
>>>  >>> /dev/rbd1
>>>  >>> root@test:~# md5sum /dev/rbd1
>>>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>>  >>>
>>>  >>>  but it's broken...
>>>  >>> root@test:~# parted /de

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
No, when we started draining the cache the bad pgs were already in place...
We had a big rebalance (disk by disk - to change journal side on both
hot/cold layers).. All was OK, but after 2 days the scrub errors arrived and
2 pgs went inconsistent...

In writeback - yes, looks like snapshot works good. but it's stop to work
in same moment, when cache layer fulfilled with data and evict/flush
started...
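(The two pgs in question were 2.490 and 2.c4, per the ceph health detail
output earlier in the thread. A typical way to inspect them is roughly the
following, with the caveat that on hammer pg repair generally trusts the
primary replica, so it is worth comparing the copies before repairing.)

  ceph health detail
  ceph pg 2.490 query
  ceph pg repair 2.490
  ceph pg repair 2.c4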



2015-08-21 2:09 GMT+03:00 Samuel Just :

> So you started draining the cache pool before you saw either the
> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
> mode was working correctly?)
> -Sam
>
> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>  wrote:
> > Good joke )
> >
> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
> >>
> >> Certainly, don't reproduce this with a cluster you care about :).
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
> >> > What's supposed to happen is that the client transparently directs all
> >> > requests to the cache pool rather than the cold pool when there is a
> >> > cache pool.  If the kernel is sending requests to the cold pool,
> >> > that's probably where the bug is.  Odd.  It could also be a bug
> >> > specific 'forward' mode either in the client or on the osd.  Why did
> >> > you have it in that mode?
> >> > -Sam
> >> >
> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >> >  wrote:
> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
> >> >> production,
> >> >> and they don;t support ncq_trim...
> >> >>
> >> >> And 4,x first branch which include exceptions for this in libsata.c.
> >> >>
> >> >> sure we can backport this 1 line to 3.x branch, but we prefer no to
> go
> >> >> deeper if packege for new kernel exist.
> >> >>
> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >> >> :
> >> >>>
> >> >>> root@test:~# uname -a
> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
> 17:37:22
> >> >>> UTC
> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >> >>>
> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >> 
> >>  Also, can you include the kernel version?
> >>  -Sam
> >> 
> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
> >>  wrote:
> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can you
> >>  > open a
> >>  > bug?
> >>  > -Sam
> >>  >
> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >>  >  wrote:
> >>  >> This was related to the caching layer, which doesnt support
> >>  >> snapshooting per
> >>  >> docs...for sake of closing the thread.
> >>  >>
> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >>  >> 
> >>  >> wrote:
> >>  >>>
> >>  >>> Hi all, can you please help me with unexplained situation...
> >>  >>>
> >>  >>> All snapshot inside ceph broken...
> >>  >>>
> >>  >>> So, as example, we have VM template, as rbd inside ceph.
> >>  >>> We can map it and mount to check that all ok with it
> >>  >>>
> >>  >>> root@test:~# rbd map
> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >>  >>> /dev/rbd0
> >>  >>> root@test:~# parted /dev/rbd0 print
> >>  >>> Model: Unknown (unknown)
> >>  >>> Disk /dev/rbd0: 10.7GB
> >>  >>> Sector size (logical/physical): 512B/512B
> >>  >>> Partition Table: msdos
> >>  >>>
> >>  >>> Number  Start   End SizeType File system  Flags
> >>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
> >>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
> >>  >>>
> >>  >>> Than i want to create snap, so i do:
> >>  >>> root@test:~# rbd snap create
> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>  >>>
> >>  >>> And now i want to map it:
> >>  >>>
> >>  >>> root@test:~# rbd map
> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>  >>> /dev/rbd1
> >>  >>> root@test:~# parted /dev/rbd1 print
> >>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> >>  >>> system).
> >>  >>> /dev/rbd1 has been opened read-only.
> >>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
> >>  >>> system).
> >>  >>> /dev/rbd1 has been opened read-only.
> >>  >>> Error: /dev/rbd1: unrecognised disk label
> >>  >>>
> >>  >>> Even md5 different...
> >>  >>> root@ix-s2:~# md5sum /dev/rbd0
> >>  >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >>  >>> root@ix-s2:~# md5sum /dev/rbd1
> >>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>  >>>
> >>  >>>
> >>  >>> Ok, now i protect snap and create clone... but same thing...
> >>  >>> md5 for clone same as for snap,,
> >>  >>>
> >>  >>> root@test:~# rbd unmap /dev/rbd1
> >>  >>> root@test:~# rbd snap protect
> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>  >>> root@test:~# rbd clone
> >>  >

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Not sure what you mean by:

but it's stop to work in same moment, when cache layer fulfilled with
data and evict/flush started...
-Sam

On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
 wrote:
> No, when we start draining cache - bad pgs was in place...
> We have big rebalance (disk by disk - to change journal side on both
> hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2
> pgs inconsistent...
>
> In writeback - yes, looks like snapshot works good. but it's stop to work in
> same moment, when cache layer fulfilled with data and evict/flush started...
>
>
>
> 2015-08-21 2:09 GMT+03:00 Samuel Just :
>>
>> So you started draining the cache pool before you saw either the
>> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
>> mode was working correctly?)
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>>  wrote:
>> > Good joke )
>> >
>> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >>
>> >> Certainly, don't reproduce this with a cluster you care about :).
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
>> >> > What's supposed to happen is that the client transparently directs
>> >> > all
>> >> > requests to the cache pool rather than the cold pool when there is a
>> >> > cache pool.  If the kernel is sending requests to the cold pool,
>> >> > that's probably where the bug is.  Odd.  It could also be a bug
>> >> > specific 'forward' mode either in the client or on the osd.  Why did
>> >> > you have it in that mode?
>> >> > -Sam
>> >> >
>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> >  wrote:
>> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>> >> >> production,
>> >> >> and they don;t support ncq_trim...
>> >> >>
>> >> >> And 4,x first branch which include exceptions for this in libsata.c.
>> >> >>
>> >> >> sure we can backport this 1 line to 3.x branch, but we prefer no to
>> >> >> go
>> >> >> deeper if packege for new kernel exist.
>> >> >>
>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >> >> :
>> >> >>>
>> >> >>> root@test:~# uname -a
>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>> >> >>> 17:37:22
>> >> >>> UTC
>> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >>>
>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>> >> 
>> >>  Also, can you include the kernel version?
>> >>  -Sam
>> >> 
>> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
>> >>  wrote:
>> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can you
>> >>  > open a
>> >>  > bug?
>> >>  > -Sam
>> >>  >
>> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>> >>  >  wrote:
>> >>  >> This was related to the caching layer, which doesnt support
>> >>  >> snapshooting per
>> >>  >> docs...for sake of closing the thread.
>> >>  >>
>> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>> >>  >> 
>> >>  >> wrote:
>> >>  >>>
>> >>  >>> Hi all, can you please help me with unexplained situation...
>> >>  >>>
>> >>  >>> All snapshot inside ceph broken...
>> >>  >>>
>> >>  >>> So, as example, we have VM template, as rbd inside ceph.
>> >>  >>> We can map it and mount to check that all ok with it
>> >>  >>>
>> >>  >>> root@test:~# rbd map
>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>> >>  >>> /dev/rbd0
>> >>  >>> root@test:~# parted /dev/rbd0 print
>> >>  >>> Model: Unknown (unknown)
>> >>  >>> Disk /dev/rbd0: 10.7GB
>> >>  >>> Sector size (logical/physical): 512B/512B
>> >>  >>> Partition Table: msdos
>> >>  >>>
>> >>  >>> Number  Start   End SizeType File system  Flags
>> >>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>> >>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>> >>  >>>
>> >>  >>> Than i want to create snap, so i do:
>> >>  >>> root@test:~# rbd snap create
>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>  >>>
>> >>  >>> And now i want to map it:
>> >>  >>>
>> >>  >>> root@test:~# rbd map
>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >>  >>> /dev/rbd1
>> >>  >>> root@test:~# parted /dev/rbd1 print
>> >>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>> >>  >>> system).
>> >>  >>> /dev/rbd1 has been opened read-only.
>> >>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>> >>  >>> system).
>> >>  >>> /dev/rbd1 has been opened read-only.
>> >>  >>> Error: /dev/rbd1: unrecognised disk label
>> >>  >>>
>> >>  >>> Even md5 different...
>> >>  >>> root@ix-s2:~# md5sum /dev/rbd0
>> >>  >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>> >>  >>> root@ix-s2:~# md5sum /dev/rbd1
>> >>  >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >>  >>>
>> >>  >>>
>

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, what do you mean by "change journal side"?
-Sam

On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just  wrote:
> Not sure what you mean by:
>
> but it's stop to work in same moment, when cache layer fulfilled with
> data and evict/flush started...
> -Sam
>
> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>  wrote:
>> No, when we start draining cache - bad pgs was in place...
>> We have big rebalance (disk by disk - to change journal side on both
>> hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2
>> pgs inconsistent...
>>
>> In writeback - yes, looks like snapshot works good. but it's stop to work in
>> same moment, when cache layer fulfilled with data and evict/flush started...
>>
>>
>>
>> 2015-08-21 2:09 GMT+03:00 Samuel Just :
>>>
>>> So you started draining the cache pool before you saw either the
>>> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
>>> mode was working correctly?)
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>>>  wrote:
>>> > Good joke )
>>> >
>>> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>>> >>
>>> >> Certainly, don't reproduce this with a cluster you care about :).
>>> >> -Sam
>>> >>
>>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just  wrote:
>>> >> > What's supposed to happen is that the client transparently directs
>>> >> > all
>>> >> > requests to the cache pool rather than the cold pool when there is a
>>> >> > cache pool.  If the kernel is sending requests to the cold pool,
>>> >> > that's probably where the bug is.  Odd.  It could also be a bug
>>> >> > specific 'forward' mode either in the client or on the osd.  Why did
>>> >> > you have it in that mode?
>>> >> > -Sam
>>> >> >
>>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>>> >> >  wrote:
>>> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>>> >> >> production,
>>> >> >> and they don;t support ncq_trim...
>>> >> >>
>>> >> >> And 4,x first branch which include exceptions for this in libsata.c.
>>> >> >>
>>> >> >> sure we can backport this 1 line to 3.x branch, but we prefer no to
>>> >> >> go
>>> >> >> deeper if packege for new kernel exist.
>>> >> >>
>>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>>> >> >> :
>>> >> >>>
>>> >> >>> root@test:~# uname -a
>>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>>> >> >>> 17:37:22
>>> >> >>> UTC
>>> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>> >> >>>
>>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>>> >> 
>>> >>  Also, can you include the kernel version?
>>> >>  -Sam
>>> >> 
>>> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
>>> >>  wrote:
>>> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can you
>>> >>  > open a
>>> >>  > bug?
>>> >>  > -Sam
>>> >>  >
>>> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>>> >>  >  wrote:
>>> >>  >> This was related to the caching layer, which doesnt support
>>> >>  >> snapshooting per
>>> >>  >> docs...for sake of closing the thread.
>>> >>  >>
>>> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>>> >>  >> 
>>> >>  >> wrote:
>>> >>  >>>
>>> >>  >>> Hi all, can you please help me with unexplained situation...
>>> >>  >>>
>>> >>  >>> All snapshot inside ceph broken...
>>> >>  >>>
>>> >>  >>> So, as example, we have VM template, as rbd inside ceph.
>>> >>  >>> We can map it and mount to check that all ok with it
>>> >>  >>>
>>> >>  >>> root@test:~# rbd map
>>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>>> >>  >>> /dev/rbd0
>>> >>  >>> root@test:~# parted /dev/rbd0 print
>>> >>  >>> Model: Unknown (unknown)
>>> >>  >>> Disk /dev/rbd0: 10.7GB
>>> >>  >>> Sector size (logical/physical): 512B/512B
>>> >>  >>> Partition Table: msdos
>>> >>  >>>
>>> >>  >>> Number  Start   End SizeType File system  Flags
>>> >>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>>> >>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>>> >>  >>>
>>> >>  >>> Than i want to create snap, so i do:
>>> >>  >>> root@test:~# rbd snap create
>>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >>  >>>
>>> >>  >>> And now i want to map it:
>>> >>  >>>
>>> >>  >>> root@test:~# rbd map
>>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >>  >>> /dev/rbd1
>>> >>  >>> root@test:~# parted /dev/rbd1 print
>>> >>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>> >>  >>> system).
>>> >>  >>> /dev/rbd1 has been opened read-only.
>>> >>  >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>> >>  >>> system).
>>> >>  >>> /dev/rbd1 has been opened read-only.
>>> >>  >>> Error: /dev/rbd1: unrecognised disk label
>>> >>  >>>
>>> >>  >>> Even md5 different...
>

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We hadn't set values for max_bytes / max_objects, so all data initially was
written only to the cache layer and never flushed to the cold layer at all.

Then we received a notification from monitoring that about 750GB had
accumulated in the hot pool ) So I changed the max_object_bytes value to 0.9
of the disk size... and then evicting/flushing started...

And the issue with snapshots arrived
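(The sizing knobs involved are pool settings on the hot pool; the parameter
names below are the ones hammer exposes, while the pool name and the values
are placeholders rather than our exact figures.)

  ceph osd pool set hot-pool target_max_bytes 750000000000
  ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
  ceph osd pool set hot-pool cache_target_full_ratio 0.8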

2015-08-21 2:15 GMT+03:00 Samuel Just :

> Not sure what you mean by:
>
> but it's stop to work in same moment, when cache layer fulfilled with
> data and evict/flush started...
> -Sam
>
> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>  wrote:
> > No, when we start draining cache - bad pgs was in place...
> > We have big rebalance (disk by disk - to change journal side on both
> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
> and 2
> > pgs inconsistent...
> >
> > In writeback - yes, looks like snapshot works good. but it's stop to
> work in
> > same moment, when cache layer fulfilled with data and evict/flush
> started...
> >
> >
> >
> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
> >>
> >> So you started draining the cache pool before you saw either the
> >> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
> >> mode was working correctly?)
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >>  wrote:
> >> > Good joke )
> >> >
> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
> >> >>
> >> >> Certainly, don't reproduce this with a cluster you care about :).
> >> >> -Sam
> >> >>
> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
> wrote:
> >> >> > What's supposed to happen is that the client transparently directs
> >> >> > all
> >> >> > requests to the cache pool rather than the cold pool when there is
> a
> >> >> > cache pool.  If the kernel is sending requests to the cold pool,
> >> >> > that's probably where the bug is.  Odd.  It could also be a bug
> >> >> > specific 'forward' mode either in the client or on the osd.  Why
> did
> >> >> > you have it in that mode?
> >> >> > -Sam
> >> >> >
> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >> >> >  wrote:
> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
> >> >> >> production,
> >> >> >> and they don;t support ncq_trim...
> >> >> >>
> >> >> >> And 4,x first branch which include exceptions for this in
> libsata.c.
> >> >> >>
> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer no
> to
> >> >> >> go
> >> >> >> deeper if packege for new kernel exist.
> >> >> >>
> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >> >> >> :
> >> >> >>>
> >> >> >>> root@test:~# uname -a
> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
> >> >> >>> 17:37:22
> >> >> >>> UTC
> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >> >> >>>
> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >> >> 
> >> >>  Also, can you include the kernel version?
> >> >>  -Sam
> >> >> 
> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
> >> >>  wrote:
> >> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can
> you
> >> >>  > open a
> >> >>  > bug?
> >> >>  > -Sam
> >> >>  >
> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >> >>  >  wrote:
> >> >>  >> This was related to the caching layer, which doesnt support
> >> >>  >> snapshooting per
> >> >>  >> docs...for sake of closing the thread.
> >> >>  >>
> >> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >> >>  >> 
> >> >>  >> wrote:
> >> >>  >>>
> >> >>  >>> Hi all, can you please help me with unexplained situation...
> >> >>  >>>
> >> >>  >>> All snapshot inside ceph broken...
> >> >>  >>>
> >> >>  >>> So, as example, we have VM template, as rbd inside ceph.
> >> >>  >>> We can map it and mount to check that all ok with it
> >> >>  >>>
> >> >>  >>> root@test:~# rbd map
> >> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >> >>  >>> /dev/rbd0
> >> >>  >>> root@test:~# parted /dev/rbd0 print
> >> >>  >>> Model: Unknown (unknown)
> >> >>  >>> Disk /dev/rbd0: 10.7GB
> >> >>  >>> Sector size (logical/physical): 512B/512B
> >> >>  >>> Partition Table: msdos
> >> >>  >>>
> >> >>  >>> Number  Start   End SizeType File system  Flags
> >> >>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
> >> >>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
> >> >>  >>>
> >> >>  >>> Than i want to create snap, so i do:
> >> >>  >>> root@test:~# rbd snap create
> >> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> >>  >>>
> >> >>  >>> And now i want to map it:
> >> >>  >>>
> >> >>  >>> root@test:~# rbd map
> >> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >> >>  >>> /dev/rbd1
> >> >>  >>> root@test:~# parted /dev/rbd1 

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
But that was still in writeback mode, right?
-Sam

On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
 wrote:
> WE haven't set values for max_bytes / max_objects.. and all data initially
> writes only to cache layer and not flushed at all to cold layer.
>
> Then we received notification from monitoring that we collect about 750GB in
> hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
> size... And then evicting/flushing started...
>
> And issue with snapshots arrived
>
> 2015-08-21 2:15 GMT+03:00 Samuel Just :
>>
>> Not sure what you mean by:
>>
>> but it's stop to work in same moment, when cache layer fulfilled with
>> data and evict/flush started...
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>>  wrote:
>> > No, when we start draining cache - bad pgs was in place...
>> > We have big rebalance (disk by disk - to change journal side on both
>> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
>> > and 2
>> > pgs inconsistent...
>> >
>> > In writeback - yes, looks like snapshot works good. but it's stop to
>> > work in
>> > same moment, when cache layer fulfilled with data and evict/flush
>> > started...
>> >
>> >
>> >
>> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>> >>
>> >> So you started draining the cache pool before you saw either the
>> >> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
>> >> mode was working correctly?)
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> >>  wrote:
>> >> > Good joke )
>> >> >
>> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> Certainly, don't reproduce this with a cluster you care about :).
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
>> >> >> wrote:
>> >> >> > What's supposed to happen is that the client transparently directs
>> >> >> > all
>> >> >> > requests to the cache pool rather than the cold pool when there is
>> >> >> > a
>> >> >> > cache pool.  If the kernel is sending requests to the cold pool,
>> >> >> > that's probably where the bug is.  Odd.  It could also be a bug
>> >> >> > specific 'forward' mode either in the client or on the osd.  Why
>> >> >> > did
>> >> >> > you have it in that mode?
>> >> >> > -Sam
>> >> >> >
>> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> >> >  wrote:
>> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>> >> >> >> production,
>> >> >> >> and they don;t support ncq_trim...
>> >> >> >>
>> >> >> >> And 4,x first branch which include exceptions for this in
>> >> >> >> libsata.c.
>> >> >> >>
>> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer no
>> >> >> >> to
>> >> >> >> go
>> >> >> >> deeper if packege for new kernel exist.
>> >> >> >>
>> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >> >> >> :
>> >> >> >>>
>> >> >> >>> root@test:~# uname -a
>> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>> >> >> >>> 17:37:22
>> >> >> >>> UTC
>> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> >>>
>> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>> >> >> 
>> >> >>  Also, can you include the kernel version?
>> >> >>  -Sam
>> >> >> 
>> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
>> >> >>  wrote:
>> >> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can
>> >> >>  > you
>> >> >>  > open a
>> >> >>  > bug?
>> >> >>  > -Sam
>> >> >>  >
>> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>> >> >>  >  wrote:
>> >> >>  >> This was related to the caching layer, which doesnt support
>> >> >>  >> snapshooting per
>> >> >>  >> docs...for sake of closing the thread.
>> >> >>  >>
>> >> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>> >> >>  >> 
>> >> >>  >> wrote:
>> >> >>  >>>
>> >> >>  >>> Hi all, can you please help me with unexplained
>> >> >>  >>> situation...
>> >> >>  >>>
>> >> >>  >>> All snapshot inside ceph broken...
>> >> >>  >>>
>> >> >>  >>> So, as example, we have VM template, as rbd inside ceph.
>> >> >>  >>> We can map it and mount to check that all ok with it
>> >> >>  >>>
>> >> >>  >>> root@test:~# rbd map
>> >> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>> >> >>  >>> /dev/rbd0
>> >> >>  >>> root@test:~# parted /dev/rbd0 print
>> >> >>  >>> Model: Unknown (unknown)
>> >> >>  >>> Disk /dev/rbd0: 10.7GB
>> >> >>  >>> Sector size (logical/physical): 512B/512B
>> >> >>  >>> Partition Table: msdos
>> >> >>  >>>
>> >> >>  >>> Number  Start   End SizeType File system  Flags
>> >> >>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
>> >> >>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
>> >> >>  >>>
>> >> >>  >>> Than i want to create snap, so i do:
>> >> >>  >>> root@test:~# rbd snap create
>> >> >>  >>>

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Our initial journal sizes were enough, but the flush time was 5 secs, so we
increased the journal size to fit a flush timeframe of min/max 29/30 seconds.

By "flush time" I mean
  filestore max sync interval = 30
  filestore min sync interval = 29
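(In ceph.conf terms the change is roughly the snippet below; the journal size
line is an illustrative placeholder sized to absorb about 30 seconds of
writes, not our exact value.)

  [osd]
    osd journal size = 20480          # MB, placeholder
    filestore min sync interval = 29
    filestore max sync interval = 30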

2015-08-21 2:16 GMT+03:00 Samuel Just :

> Also, what do you mean by "change journal side"?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just  wrote:
> > Not sure what you mean by:
> >
> > but it's stop to work in same moment, when cache layer fulfilled with
> > data and evict/flush started...
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> >  wrote:
> >> No, when we start draining cache - bad pgs was in place...
> >> We have big rebalance (disk by disk - to change journal side on both
> >> hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
> and 2
> >> pgs inconsistent...
> >>
> >> In writeback - yes, looks like snapshot works good. but it's stop to
> work in
> >> same moment, when cache layer fulfilled with data and evict/flush
> started...
> >>
> >>
> >>
> >> 2015-08-21 2:09 GMT+03:00 Samuel Just :
> >>>
> >>> So you started draining the cache pool before you saw either the
> >>> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
> >>> mode was working correctly?)
> >>> -Sam
> >>>
> >>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >>>  wrote:
> >>> > Good joke )
> >>> >
> >>> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
> >>> >>
> >>> >> Certainly, don't reproduce this with a cluster you care about :).
> >>> >> -Sam
> >>> >>
> >>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
> wrote:
> >>> >> > What's supposed to happen is that the client transparently directs
> >>> >> > all
> >>> >> > requests to the cache pool rather than the cold pool when there
> is a
> >>> >> > cache pool.  If the kernel is sending requests to the cold pool,
> >>> >> > that's probably where the bug is.  Odd.  It could also be a bug
> >>> >> > specific 'forward' mode either in the client or on the osd.  Why
> did
> >>> >> > you have it in that mode?
> >>> >> > -Sam
> >>> >> >
> >>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >>> >> >  wrote:
> >>> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
> >>> >> >> production,
> >>> >> >> and they don;t support ncq_trim...
> >>> >> >>
> >>> >> >> And 4,x first branch which include exceptions for this in
> libsata.c.
> >>> >> >>
> >>> >> >> sure we can backport this 1 line to 3.x branch, but we prefer no
> to
> >>> >> >> go
> >>> >> >> deeper if packege for new kernel exist.
> >>> >> >>
> >>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >>> >> >> :
> >>> >> >>>
> >>> >> >>> root@test:~# uname -a
> >>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
> >>> >> >>> 17:37:22
> >>> >> >>> UTC
> >>> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >>> >> >>>
> >>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >>> >> 
> >>> >>  Also, can you include the kernel version?
> >>> >>  -Sam
> >>> >> 
> >>> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just  >
> >>> >>  wrote:
> >>> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can
> you
> >>> >>  > open a
> >>> >>  > bug?
> >>> >>  > -Sam
> >>> >>  >
> >>> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >>> >>  >  wrote:
> >>> >>  >> This was related to the caching layer, which doesnt support
> >>> >>  >> snapshooting per
> >>> >>  >> docs...for sake of closing the thread.
> >>> >>  >>
> >>> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >>> >>  >> 
> >>> >>  >> wrote:
> >>> >>  >>>
> >>> >>  >>> Hi all, can you please help me with unexplained
> situation...
> >>> >>  >>>
> >>> >>  >>> All snapshot inside ceph broken...
> >>> >>  >>>
> >>> >>  >>> So, as example, we have VM template, as rbd inside ceph.
> >>> >>  >>> We can map it and mount to check that all ok with it
> >>> >>  >>>
> >>> >>  >>> root@test:~# rbd map
> >>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> >>  >>> /dev/rbd0
> >>> >>  >>> root@test:~# parted /dev/rbd0 print
> >>> >>  >>> Model: Unknown (unknown)
> >>> >>  >>> Disk /dev/rbd0: 10.7GB
> >>> >>  >>> Sector size (logical/physical): 512B/512B
> >>> >>  >>> Partition Table: msdos
> >>> >>  >>>
> >>> >>  >>> Number  Start   End SizeType File system  Flags
> >>> >>  >>>  1  1049kB  525MB   524MB   primary  ext4 boot
> >>> >>  >>>  2  525MB   10.7GB  10.2GB  primary   lvm
> >>> >>  >>>
> >>> >>  >>> Than i want to create snap, so i do:
> >>> >>  >>> root@test:~# rbd snap create
> >>> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >>  >>>
> >>> >>  >>> And now i want to map it:
> >>> >>  >>>
> >>> >>  >>> root@test:~# rbd map
> >>> >>  >>> cold-s

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right. But issues started...

2015-08-21 2:20 GMT+03:00 Samuel Just :

> But that was still in writeback mode, right?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>  wrote:
> > WE haven't set values for max_bytes / max_objects.. and all data
> initially
> > writes only to cache layer and not flushed at all to cold layer.
> >
> > Then we received notification from monitoring that we collect about
> 750GB in
> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
> > size... And then evicting/flushing started...
> >
> > And issue with snapshots arrived
> >
> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
> >>
> >> Not sure what you mean by:
> >>
> >> but it's stop to work in same moment, when cache layer fulfilled with
> >> data and evict/flush started...
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> >>  wrote:
> >> > No, when we start draining cache - bad pgs was in place...
> >> > We have big rebalance (disk by disk - to change journal side on both
> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
> >> > and 2
> >> > pgs inconsistent...
> >> >
> >> > In writeback - yes, looks like snapshot works good. but it's stop to
> >> > work in
> >> > same moment, when cache layer fulfilled with data and evict/flush
> >> > started...
> >> >
> >> >
> >> >
> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
> >> >>
> >> >> So you started draining the cache pool before you saw either the
> >> >> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
> >> >> mode was working correctly?)
> >> >> -Sam
> >> >>
> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >> >>  wrote:
> >> >> > Good joke )
> >> >> >
> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
> >> >> >>
> >> >> >> Certainly, don't reproduce this with a cluster you care about :).
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
> >> >> >> wrote:
> >> >> >> > What's supposed to happen is that the client transparently
> directs
> >> >> >> > all
> >> >> >> > requests to the cache pool rather than the cold pool when there
> is
> >> >> >> > a
> >> >> >> > cache pool.  If the kernel is sending requests to the cold pool,
> >> >> >> > that's probably where the bug is.  Odd.  It could also be a bug
> >> >> >> > specific 'forward' mode either in the client or on the osd.  Why
> >> >> >> > did
> >> >> >> > you have it in that mode?
> >> >> >> > -Sam
> >> >> >> >
> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >> >> >> >  wrote:
> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
> >> >> >> >> production,
> >> >> >> >> and they don;t support ncq_trim...
> >> >> >> >>
> >> >> >> >> And 4,x first branch which include exceptions for this in
> >> >> >> >> libsata.c.
> >> >> >> >>
> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer
> no
> >> >> >> >> to
> >> >> >> >> go
> >> >> >> >> deeper if packege for new kernel exist.
> >> >> >> >>
> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >> >> >> >> :
> >> >> >> >>>
> >> >> >> >>> root@test:~# uname -a
> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
> >> >> >> >>> 17:37:22
> >> >> >> >>> UTC
> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >> >> >> >>>
> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >> >> >> 
> >> >> >>  Also, can you include the kernel version?
> >> >> >>  -Sam
> >> >> >> 
> >> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just <
> sj...@redhat.com>
> >> >> >>  wrote:
> >> >> >>  > Snapshotting with cache/tiering *is* supposed to work.  Can
> >> >> >>  > you
> >> >> >>  > open a
> >> >> >>  > bug?
> >> >> >>  > -Sam
> >> >> >>  >
> >> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >> >> >>  >  wrote:
> >> >> >>  >> This was related to the caching layer, which doesnt
> support
> >> >> >>  >> snapshooting per
> >> >> >>  >> docs...for sake of closing the thread.
> >> >> >>  >>
> >> >> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >> >> >>  >> 
> >> >> >>  >> wrote:
> >> >> >>  >>>
> >> >> >>  >>> Hi all, can you please help me with unexplained
> >> >> >>  >>> situation...
> >> >> >>  >>>
> >> >> >>  >>> All snapshot inside ceph broken...
> >> >> >>  >>>
> >> >> >>  >>> So, as example, we have VM template, as rbd inside ceph.
> >> >> >>  >>> We can map it and mount to check that all ok with it
> >> >> >>  >>>
> >> >> >>  >>> root@test:~# rbd map
> >> >> >>  >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >> >> >>  >>> /dev/rbd0
> >> >> >>  >>> root@test:~# parted /dev/rbd0 print
> >> >> >>  >>> Model: Unknown (unknown)
> >> >> >>  >>> Disk /dev/rbd0: 10.7GB
> >> >> >>  >>> Sector size (logical/physical): 512B/512B
> >> >> >>  >>> Partition Table: msdos
> >> >> >>  >>>
> >> >>

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Yeah, I'm trying to confirm that the issues did happen in writeback mode.
-Sam

On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
 wrote:
> Right. But issues started...
>
> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>>
>> But that was still in writeback mode, right?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>>  wrote:
>> > WE haven't set values for max_bytes / max_objects.. and all data
>> > initially
>> > writes only to cache layer and not flushed at all to cold layer.
>> >
>> > Then we received notification from monitoring that we collect about
>> > 750GB in
>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
>> > size... And then evicting/flushing started...
>> >
>> > And issue with snapshots arrived
>> >
>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>> >>
>> >> Not sure what you mean by:
>> >>
>> >> but it's stop to work in same moment, when cache layer fulfilled with
>> >> data and evict/flush started...
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>> >>  wrote:
>> >> > No, when we start draining cache - bad pgs was in place...
>> >> > We have big rebalance (disk by disk - to change journal side on both
>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub
>> >> > errors
>> >> > and 2
>> >> > pgs inconsistent...
>> >> >
>> >> > In writeback - yes, looks like snapshot works good. but it's stop to
>> >> > work in
>> >> > same moment, when cache layer fulfilled with data and evict/flush
>> >> > started...
>> >> >
>> >> >
>> >> >
>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> So you started draining the cache pool before you saw either the
>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
>> >> >> writeback
>> >> >> mode was working correctly?)
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Good joke )
>> >> >> >
>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >> >> >>
>> >> >> >> Certainly, don't reproduce this with a cluster you care about :).
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
>> >> >> >> wrote:
>> >> >> >> > What's supposed to happen is that the client transparently
>> >> >> >> > directs
>> >> >> >> > all
>> >> >> >> > requests to the cache pool rather than the cold pool when there
>> >> >> >> > is
>> >> >> >> > a
>> >> >> >> > cache pool.  If the kernel is sending requests to the cold
>> >> >> >> > pool,
>> >> >> >> > that's probably where the bug is.  Odd.  It could also be a bug
>> >> >> >> > specific 'forward' mode either in the client or on the osd.
>> >> >> >> > Why
>> >> >> >> > did
>> >> >> >> > you have it in that mode?
>> >> >> >> > -Sam
>> >> >> >> >
>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> >> >> >  wrote:
>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>> >> >> >> >> production,
>> >> >> >> >> and they don;t support ncq_trim...
>> >> >> >> >>
>> >> >> >> >> And 4,x first branch which include exceptions for this in
>> >> >> >> >> libsata.c.
>> >> >> >> >>
>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer
>> >> >> >> >> no
>> >> >> >> >> to
>> >> >> >> >> go
>> >> >> >> >> deeper if packege for new kernel exist.
>> >> >> >> >>
>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >> >> >> >> :
>> >> >> >> >>>
>> >> >> >> >>> root@test:~# uname -a
>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>> >> >> >> >>> 17:37:22
>> >> >> >> >>> UTC
>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> >> >>>
>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>> >> >> >> 
>> >> >> >>  Also, can you include the kernel version?
>> >> >> >>  -Sam
>> >> >> >> 
>> >> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
>> >> >> >>  
>> >> >> >>  wrote:
>> >> >> >>  > Snapshotting with cache/tiering *is* supposed to work.
>> >> >> >>  > Can
>> >> >> >>  > you
>> >> >> >>  > open a
>> >> >> >>  > bug?
>> >> >> >>  > -Sam
>> >> >> >>  >
>> >> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>> >> >> >>  >  wrote:
>> >> >> >>  >> This was related to the caching layer, which doesnt
>> >> >> >>  >> support
>> >> >> >>  >> snapshooting per
>> >> >> >>  >> docs...for sake of closing the thread.
>> >> >> >>  >>
>> >> >> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>> >> >> >>  >> 
>> >> >> >>  >> wrote:
>> >> >> >>  >>>
>> >> >> >>  >>> Hi all, can you please help me with unexplained
>> >> >> >>  >>> situation...
>> >> >> >>  >>>
>> >> >> >>  >>> All snapshot inside ceph broken...
>> >> >> >>  >>>
>> >> >> >>  >>> So, as example, we have VM template, as rbd inside ceph.
>> >> >> >>  >>> We can map it and mount to check that all ok with it
>> >> >> >>  >>>
>> >> >> >>  >>> root@test:~

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Specifically, the snap behavior (we already know that the pgs went
inconsistent while the pool was in writeback mode, right?).
-Sam

On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just  wrote:
> Yeah, I'm trying to confirm that the issues did happen in writeback mode.
> -Sam
>
> On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
>  wrote:
>> Right. But issues started...
>>
>> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>>>
>>> But that was still in writeback mode, right?
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>>>  wrote:
>>> > WE haven't set values for max_bytes / max_objects.. and all data
>>> > initially
>>> > writes only to cache layer and not flushed at all to cold layer.
>>> >
>>> > Then we received notification from monitoring that we collect about
>>> > 750GB in
>>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
>>> > size... And then evicting/flushing started...
>>> >
>>> > And issue with snapshots arrived
>>> >
>>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>>> >>
>>> >> Not sure what you mean by:
>>> >>
>>> >> but it's stop to work in same moment, when cache layer fulfilled with
>>> >> data and evict/flush started...
>>> >> -Sam
>>> >>
>>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>>> >>  wrote:
>>> >> > No, when we start draining cache - bad pgs was in place...
>>> >> > We have big rebalance (disk by disk - to change journal side on both
>>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub
>>> >> > errors
>>> >> > and 2
>>> >> > pgs inconsistent...
>>> >> >
>>> >> > In writeback - yes, looks like snapshot works good. but it's stop to
>>> >> > work in
>>> >> > same moment, when cache layer fulfilled with data and evict/flush
>>> >> > started...
>>> >> >
>>> >> >
>>> >> >
>>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>>> >> >>
>>> >> >> So you started draining the cache pool before you saw either the
>>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
>>> >> >> writeback
>>> >> >> mode was working correctly?)
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>>> >> >>  wrote:
>>> >> >> > Good joke )
>>> >> >> >
>>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>>> >> >> >>
>>> >> >> >> Certainly, don't reproduce this with a cluster you care about :).
>>> >> >> >> -Sam
>>> >> >> >>
>>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
>>> >> >> >> wrote:
>>> >> >> >> > What's supposed to happen is that the client transparently
>>> >> >> >> > directs
>>> >> >> >> > all
>>> >> >> >> > requests to the cache pool rather than the cold pool when there
>>> >> >> >> > is
>>> >> >> >> > a
>>> >> >> >> > cache pool.  If the kernel is sending requests to the cold
>>> >> >> >> > pool,
>>> >> >> >> > that's probably where the bug is.  Odd.  It could also be a bug
>>> >> >> >> > specific 'forward' mode either in the client or on the osd.
>>> >> >> >> > Why
>>> >> >> >> > did
>>> >> >> >> > you have it in that mode?
>>> >> >> >> > -Sam
>>> >> >> >> >
>>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>>> >> >> >> >  wrote:
>>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>>> >> >> >> >> production,
>>> >> >> >> >> and they don;t support ncq_trim...
>>> >> >> >> >>
>>> >> >> >> >> And 4,x first branch which include exceptions for this in
>>> >> >> >> >> libsata.c.
>>> >> >> >> >>
>>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer
>>> >> >> >> >> no
>>> >> >> >> >> to
>>> >> >> >> >> go
>>> >> >> >> >> deeper if packege for new kernel exist.
>>> >> >> >> >>
>>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>>> >> >> >> >> :
>>> >> >> >> >>>
>>> >> >> >> >>> root@test:~# uname -a
>>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>>> >> >> >> >>> 17:37:22
>>> >> >> >> >>> UTC
>>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>> >> >> >> >>>
>>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>>> >> >> >> 
>>> >> >> >>  Also, can you include the kernel version?
>>> >> >> >>  -Sam
>>> >> >> >> 
>>> >> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
>>> >> >> >>  
>>> >> >> >>  wrote:
>>> >> >> >>  > Snapshotting with cache/tiering *is* supposed to work.
>>> >> >> >>  > Can
>>> >> >> >>  > you
>>> >> >> >>  > open a
>>> >> >> >>  > bug?
>>> >> >> >>  > -Sam
>>> >> >> >>  >
>>> >> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>>> >> >> >>  >  wrote:
>>> >> >> >>  >> This was related to the caching layer, which doesnt
>>> >> >> >>  >> support
>>> >> >> >>  >> snapshooting per
>>> >> >> >>  >> docs...for sake of closing the thread.
>>> >> >> >>  >>
>>> >> >> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>>> >> >> >>  >> 
>>> >> >> >>  >> wrote:
>>> >> >> >>  >>>
>>> >> >> >>  >>> Hi all, can you please help me with unexplained
>>> >> >> >> 

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I mean in forward mode it's a permanent problem - snapshots are not working.
And for writeback mode, after we changed the max_bytes/max_objects values, it's
around 30 by 70... 70% of the time it works... 30% - not. Looks like for old
images snapshots work fine (images which already existed before we changed the
values). For any new images - no.
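
For anyone trying to reproduce this, a minimal consistency check on a single image could look like the sketch below (the pool name is taken from earlier in the thread, the image and snapshot names are placeholders):

rbd snap create cold-storage/test-image@snap1
rbd export cold-storage/test-image@snap1 - | md5sum   # checksum right after taking the snapshot
# ... let the cache tier start flushing/evicting, then repeat the export:
rbd export cold-storage/test-image@snap1 - | md5sum   # should print the same checksum if snapshots are intact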

2015-08-21 2:21 GMT+03:00 Voloshanenko Igor :

> Right. But issues started...
>
> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>
>> But that was still in writeback mode, right?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>>  wrote:
>> > WE haven't set values for max_bytes / max_objects.. and all data
>> initially
>> > writes only to cache layer and not flushed at all to cold layer.
>> >
>> > Then we received notification from monitoring that we collect about
>> 750GB in
>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
>> > size... And then evicting/flushing started...
>> >
>> > And issue with snapshots arrived
>> >
>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>> >>
>> >> Not sure what you mean by:
>> >>
>> >> but it's stop to work in same moment, when cache layer fulfilled with
>> >> data and evict/flush started...
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>> >>  wrote:
>> >> > No, when we start draining cache - bad pgs was in place...
>> >> > We have big rebalance (disk by disk - to change journal side on both
>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub
>> errors
>> >> > and 2
>> >> > pgs inconsistent...
>> >> >
>> >> > In writeback - yes, looks like snapshot works good. but it's stop to
>> >> > work in
>> >> > same moment, when cache layer fulfilled with data and evict/flush
>> >> > started...
>> >> >
>> >> >
>> >> >
>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> So you started draining the cache pool before you saw either the
>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
>> writeback
>> >> >> mode was working correctly?)
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Good joke )
>> >> >> >
>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >> >> >>
>> >> >> >> Certainly, don't reproduce this with a cluster you care about :).
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
>> >> >> >> wrote:
>> >> >> >> > What's supposed to happen is that the client transparently
>> directs
>> >> >> >> > all
>> >> >> >> > requests to the cache pool rather than the cold pool when
>> there is
>> >> >> >> > a
>> >> >> >> > cache pool.  If the kernel is sending requests to the cold
>> pool,
>> >> >> >> > that's probably where the bug is.  Odd.  It could also be a bug
>> >> >> >> > specific 'forward' mode either in the client or on the osd.
>> Why
>> >> >> >> > did
>> >> >> >> > you have it in that mode?
>> >> >> >> > -Sam
>> >> >> >> >
>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> >> >> >  wrote:
>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro in
>> >> >> >> >> production,
>> >> >> >> >> and they don;t support ncq_trim...
>> >> >> >> >>
>> >> >> >> >> And 4,x first branch which include exceptions for this in
>> >> >> >> >> libsata.c.
>> >> >> >> >>
>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we prefer
>> no
>> >> >> >> >> to
>> >> >> >> >> go
>> >> >> >> >> deeper if packege for new kernel exist.
>> >> >> >> >>
>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >> >> >> >> :
>> >> >> >> >>>
>> >> >> >> >>> root@test:~# uname -a
>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>> >> >> >> >>> 17:37:22
>> >> >> >> >>> UTC
>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> >> >>>
>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>> >> >> >> 
>> >> >> >>  Also, can you include the kernel version?
>> >> >> >>  -Sam
>> >> >> >> 
>> >> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just <
>> sj...@redhat.com>
>> >> >> >>  wrote:
>> >> >> >>  > Snapshotting with cache/tiering *is* supposed to work.
>> Can
>> >> >> >>  > you
>> >> >> >>  > open a
>> >> >> >>  > bug?
>> >> >> >>  > -Sam
>> >> >> >>  >
>> >> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>> >> >> >>  >  wrote:
>> >> >> >>  >> This was related to the caching layer, which doesnt
>> support
>> >> >> >>  >> snapshooting per
>> >> >> >>  >> docs...for sake of closing the thread.
>> >> >> >>  >>
>> >> >> >>  >> On 17 August 2015 at 21:15, Voloshanenko Igor
>> >> >> >>  >> 
>> >> >> >>  >> wrote:
>> >> >> >>  >>>
>> >> >> >>  >>> Hi all, can you please help me with unexplained
>> >> >> >>  >>> situation...
>> >> >> >>  >>>
>> >> >> >>  >>> All snapshot inside ceph broken...
>> >> >> >>  >>>
>> >> >> >>  >>> So, as example, we have VM template, as rbd insid

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right (but there was also a rebalancing cycle 2 days before the pgs got corrupted)

2015-08-21 2:23 GMT+03:00 Samuel Just :

> Specifically, the snap behavior (we already know that the pgs went
> inconsistent while the pool was in writeback mode, right?).
> -Sam
>
> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just  wrote:
> > Yeah, I'm trying to confirm that the issues did happen in writeback mode.
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
> >  wrote:
> >> Right. But issues started...
> >>
> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
> >>>
> >>> But that was still in writeback mode, right?
> >>> -Sam
> >>>
> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
> >>>  wrote:
> >>> > WE haven't set values for max_bytes / max_objects.. and all data
> >>> > initially
> >>> > writes only to cache layer and not flushed at all to cold layer.
> >>> >
> >>> > Then we received notification from monitoring that we collect about
> >>> > 750GB in
> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
> >>> > size... And then evicting/flushing started...
> >>> >
> >>> > And issue with snapshots arrived
> >>> >
> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
> >>> >>
> >>> >> Not sure what you mean by:
> >>> >>
> >>> >> but it's stop to work in same moment, when cache layer fulfilled
> with
> >>> >> data and evict/flush started...
> >>> >> -Sam
> >>> >>
> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> >>> >>  wrote:
> >>> >> > No, when we start draining cache - bad pgs was in place...
> >>> >> > We have big rebalance (disk by disk - to change journal side on
> both
> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub
> >>> >> > errors
> >>> >> > and 2
> >>> >> > pgs inconsistent...
> >>> >> >
> >>> >> > In writeback - yes, looks like snapshot works good. but it's stop
> to
> >>> >> > work in
> >>> >> > same moment, when cache layer fulfilled with data and evict/flush
> >>> >> > started...
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
> >>> >> >>
> >>> >> >> So you started draining the cache pool before you saw either the
> >>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
> >>> >> >> writeback
> >>> >> >> mode was working correctly?)
> >>> >> >> -Sam
> >>> >> >>
> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >>> >> >>  wrote:
> >>> >> >> > Good joke )
> >>> >> >> >
> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
> >>> >> >> >>
> >>> >> >> >> Certainly, don't reproduce this with a cluster you care about
> :).
> >>> >> >> >> -Sam
> >>> >> >> >>
> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just <
> sj...@redhat.com>
> >>> >> >> >> wrote:
> >>> >> >> >> > What's supposed to happen is that the client transparently
> >>> >> >> >> > directs
> >>> >> >> >> > all
> >>> >> >> >> > requests to the cache pool rather than the cold pool when
> there
> >>> >> >> >> > is
> >>> >> >> >> > a
> >>> >> >> >> > cache pool.  If the kernel is sending requests to the cold
> >>> >> >> >> > pool,
> >>> >> >> >> > that's probably where the bug is.  Odd.  It could also be a
> bug
> >>> >> >> >> > specific 'forward' mode either in the client or on the osd.
> >>> >> >> >> > Why
> >>> >> >> >> > did
> >>> >> >> >> > you have it in that mode?
> >>> >> >> >> > -Sam
> >>> >> >> >> >
> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >>> >> >> >> >  wrote:
> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro
> in
> >>> >> >> >> >> production,
> >>> >> >> >> >> and they don;t support ncq_trim...
> >>> >> >> >> >>
> >>> >> >> >> >> And 4,x first branch which include exceptions for this in
> >>> >> >> >> >> libsata.c.
> >>> >> >> >> >>
> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we
> prefer
> >>> >> >> >> >> no
> >>> >> >> >> >> to
> >>> >> >> >> >> go
> >>> >> >> >> >> deeper if packege for new kernel exist.
> >>> >> >> >> >>
> >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >>> >> >> >> >> :
> >>> >> >> >> >>>
> >>> >> >> >> >>> root@test:~# uname -a
> >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun
> May 17
> >>> >> >> >> >>> 17:37:22
> >>> >> >> >> >>> UTC
> >>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >>> >> >> >> >>>
> >>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >>> >> >> >> 
> >>> >> >> >>  Also, can you include the kernel version?
> >>> >> >> >>  -Sam
> >>> >> >> >> 
> >>> >> >> >>  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
> >>> >> >> >>  
> >>> >> >> >>  wrote:
> >>> >> >> >>  > Snapshotting with cache/tiering *is* supposed to work.
> >>> >> >> >>  > Can
> >>> >> >> >>  > you
> >>> >> >> >>  > open a
> >>> >> >> >>  > bug?
> >>> >> >> >>  > -Sam
> >>> >> >> >>  >
> >>> >> >> >>  > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >>> >> >> >>  >  wrote:
> >>> >> >> >>  >> This was relate

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Exactly

On Friday, 21 August 2015, Samuel Just wrote:

> And you adjusted the journals by removing the osd, recreating it with
> a larger journal, and reinserting it?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
> > wrote:
> > Right ( but also was rebalancing cycle 2 day before pgs corrupted)
> >
> > 2015-08-21 2:23 GMT+03:00 Samuel Just >:
> >>
> >> Specifically, the snap behavior (we already know that the pgs went
> >> inconsistent while the pool was in writeback mode, right?).
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just  > wrote:
> >> > Yeah, I'm trying to confirm that the issues did happen in writeback
> >> > mode.
> >> > -Sam
> >> >
> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
> >> > > wrote:
> >> >> Right. But issues started...
> >> >>
> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just  >:
> >> >>>
> >> >>> But that was still in writeback mode, right?
> >> >>> -Sam
> >> >>>
> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
> >> >>> > wrote:
> >> >>> > WE haven't set values for max_bytes / max_objects.. and all data
> >> >>> > initially
> >> >>> > writes only to cache layer and not flushed at all to cold layer.
> >> >>> >
> >> >>> > Then we received notification from monitoring that we collect
> about
> >> >>> > 750GB in
> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of
> >> >>> > disk
> >> >>> > size... And then evicting/flushing started...
> >> >>> >
> >> >>> > And issue with snapshots arrived
> >> >>> >
> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just  >:
> >> >>> >>
> >> >>> >> Not sure what you mean by:
> >> >>> >>
> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled
> >> >>> >> with
> >> >>> >> data and evict/flush started...
> >> >>> >> -Sam
> >> >>> >>
> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> >> >>> >> > wrote:
> >> >>> >> > No, when we start draining cache - bad pgs was in place...
> >> >>> >> > We have big rebalance (disk by disk - to change journal side on
> >> >>> >> > both
> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub
> >> >>> >> > errors
> >> >>> >> > and 2
> >> >>> >> > pgs inconsistent...
> >> >>> >> >
> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's
> stop
> >> >>> >> > to
> >> >>> >> > work in
> >> >>> >> > same moment, when cache layer fulfilled with data and
> evict/flush
> >> >>> >> > started...
> >> >>> >> >
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just  >:
> >> >>> >> >>
> >> >>> >> >> So you started draining the cache pool before you saw either
> the
> >> >>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
> >> >>> >> >> writeback
> >> >>> >> >> mode was working correctly?)
> >> >>> >> >> -Sam
> >> >>> >> >>
> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >> >>> >> >> > wrote:
> >> >>> >> >> > Good joke )
> >> >>> >> >> >
> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just  >:
> >> >>> >> >> >>
> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care
> about
> >> >>> >> >> >> :).
> >> >>> >> >> >> -Sam
> >> >>> >> >> >>
> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
> >> >>> >> >> >> >
> >> >>> >> >> >> wrote:
> >> >>> >> >> >> > What's supposed to happen is that the client
> transparently
> >> >>> >> >> >> > directs
> >> >>> >> >> >> > all
> >> >>> >> >> >> > requests to the cache pool rather than the cold pool when
> >> >>> >> >> >> > there
> >> >>> >> >> >> > is
> >> >>> >> >> >> > a
> >> >>> >> >> >> > cache pool.  If the kernel is sending requests to the
> cold
> >> >>> >> >> >> > pool,
> >> >>> >> >> >> > that's probably where the bug is.  Odd.  It could also
> be a
> >> >>> >> >> >> > bug
> >> >>> >> >> >> > specific 'forward' mode either in the client or on the
> osd.
> >> >>> >> >> >> > Why
> >> >>> >> >> >> > did
> >> >>> >> >> >> > you have it in that mode?
> >> >>> >> >> >> > -Sam
> >> >>> >> >> >> >
> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >> >>> >> >> >> > > wrote:
> >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850
> pro
> >> >>> >> >> >> >> in
> >> >>> >> >> >> >> production,
> >> >>> >> >> >> >> and they don;t support ncq_trim...
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this
> in
> >> >>> >> >> >> >> libsata.c.
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we
> >> >>> >> >> >> >> prefer
> >> >>> >> >> >> >> no
> >> >>> >> >> >> >> to
> >> >>> >> >> >> >> go
> >> >>> >> >> >> >> deeper if packege for new kernel exist.
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >> >>> >> >> >> >> >:
> >> >>> >> >> >> >>>
> >> >>> >> >> >> >>> root@test:~# uname -a
> >> >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun
> >> >>> >> >> >> >>> May 17

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
And you adjusted the journals by removing the osd, recreating it with
a larger journal, and reinserting it?
-Sam

On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
 wrote:
> Right ( but also was rebalancing cycle 2 day before pgs corrupted)
>
> 2015-08-21 2:23 GMT+03:00 Samuel Just :
>>
>> Specifically, the snap behavior (we already know that the pgs went
>> inconsistent while the pool was in writeback mode, right?).
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just  wrote:
>> > Yeah, I'm trying to confirm that the issues did happen in writeback
>> > mode.
>> > -Sam
>> >
>> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
>> >  wrote:
>> >> Right. But issues started...
>> >>
>> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>> >>>
>> >>> But that was still in writeback mode, right?
>> >>> -Sam
>> >>>
>> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>> >>>  wrote:
>> >>> > WE haven't set values for max_bytes / max_objects.. and all data
>> >>> > initially
>> >>> > writes only to cache layer and not flushed at all to cold layer.
>> >>> >
>> >>> > Then we received notification from monitoring that we collect about
>> >>> > 750GB in
>> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of
>> >>> > disk
>> >>> > size... And then evicting/flushing started...
>> >>> >
>> >>> > And issue with snapshots arrived
>> >>> >
>> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>> >>> >>
>> >>> >> Not sure what you mean by:
>> >>> >>
>> >>> >> but it's stop to work in same moment, when cache layer fulfilled
>> >>> >> with
>> >>> >> data and evict/flush started...
>> >>> >> -Sam
>> >>> >>
>> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>> >>> >>  wrote:
>> >>> >> > No, when we start draining cache - bad pgs was in place...
>> >>> >> > We have big rebalance (disk by disk - to change journal side on
>> >>> >> > both
>> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived scrub
>> >>> >> > errors
>> >>> >> > and 2
>> >>> >> > pgs inconsistent...
>> >>> >> >
>> >>> >> > In writeback - yes, looks like snapshot works good. but it's stop
>> >>> >> > to
>> >>> >> > work in
>> >>> >> > same moment, when cache layer fulfilled with data and evict/flush
>> >>> >> > started...
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>> >>> >> >>
>> >>> >> >> So you started draining the cache pool before you saw either the
>> >>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
>> >>> >> >> writeback
>> >>> >> >> mode was working correctly?)
>> >>> >> >> -Sam
>> >>> >> >>
>> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> >>> >> >>  wrote:
>> >>> >> >> > Good joke )
>> >>> >> >> >
>> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >>> >> >> >>
>> >>> >> >> >> Certainly, don't reproduce this with a cluster you care about
>> >>> >> >> >> :).
>> >>> >> >> >> -Sam
>> >>> >> >> >>
>> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
>> >>> >> >> >> 
>> >>> >> >> >> wrote:
>> >>> >> >> >> > What's supposed to happen is that the client transparently
>> >>> >> >> >> > directs
>> >>> >> >> >> > all
>> >>> >> >> >> > requests to the cache pool rather than the cold pool when
>> >>> >> >> >> > there
>> >>> >> >> >> > is
>> >>> >> >> >> > a
>> >>> >> >> >> > cache pool.  If the kernel is sending requests to the cold
>> >>> >> >> >> > pool,
>> >>> >> >> >> > that's probably where the bug is.  Odd.  It could also be a
>> >>> >> >> >> > bug
>> >>> >> >> >> > specific 'forward' mode either in the client or on the osd.
>> >>> >> >> >> > Why
>> >>> >> >> >> > did
>> >>> >> >> >> > you have it in that mode?
>> >>> >> >> >> > -Sam
>> >>> >> >> >> >
>> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >>> >> >> >> >  wrote:
>> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850 pro
>> >>> >> >> >> >> in
>> >>> >> >> >> >> production,
>> >>> >> >> >> >> and they don;t support ncq_trim...
>> >>> >> >> >> >>
>> >>> >> >> >> >> And 4,x first branch which include exceptions for this in
>> >>> >> >> >> >> libsata.c.
>> >>> >> >> >> >>
>> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we
>> >>> >> >> >> >> prefer
>> >>> >> >> >> >> no
>> >>> >> >> >> >> to
>> >>> >> >> >> >> go
>> >>> >> >> >> >> deeper if packege for new kernel exist.
>> >>> >> >> >> >>
>> >>> >> >> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >>> >> >> >> >> :
>> >>> >> >> >> >>>
>> >>> >> >> >> >>> root@test:~# uname -a
>> >>> >> >> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun
>> >>> >> >> >> >>> May 17
>> >>> >> >> >> >>> 17:37:22
>> >>> >> >> >> >>> UTC
>> >>> >> >> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >>> >> >> >> >>>
>> >>> >> >> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>> >>> >> >> >> 
>> >>> >> >> >>  Also, can you include the kernel version?
>> >>> >> >> >>  -Sam
>> >>> >> >> >> 
>> >>> >> >> >>  On Thu, Aug 20, 2015 at 3:51 PM

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Ok, create a ticket with a timeline and all of this information, I'll
try to look into it more tomorrow.
-Sam

On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor
 wrote:
> Exactly
>
> On Friday, 21 August 2015, Samuel Just wrote:
>
>> And you adjusted the journals by removing the osd, recreating it with
>> a larger journal, and reinserting it?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
>>  wrote:
>> > Right ( but also was rebalancing cycle 2 day before pgs corrupted)
>> >
>> > 2015-08-21 2:23 GMT+03:00 Samuel Just :
>> >>
>> >> Specifically, the snap behavior (we already know that the pgs went
>> >> inconsistent while the pool was in writeback mode, right?).
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just  wrote:
>> >> > Yeah, I'm trying to confirm that the issues did happen in writeback
>> >> > mode.
>> >> > -Sam
>> >> >
>> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
>> >> >  wrote:
>> >> >> Right. But issues started...
>> >> >>
>> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>> >> >>>
>> >> >>> But that was still in writeback mode, right?
>> >> >>> -Sam
>> >> >>>
>> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>> >> >>>  wrote:
>> >> >>> > WE haven't set values for max_bytes / max_objects.. and all data
>> >> >>> > initially
>> >> >>> > writes only to cache layer and not flushed at all to cold layer.
>> >> >>> >
>> >> >>> > Then we received notification from monitoring that we collect
>> >> >>> > about
>> >> >>> > 750GB in
>> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of
>> >> >>> > disk
>> >> >>> > size... And then evicting/flushing started...
>> >> >>> >
>> >> >>> > And issue with snapshots arrived
>> >> >>> >
>> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>> >> >>> >>
>> >> >>> >> Not sure what you mean by:
>> >> >>> >>
>> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled
>> >> >>> >> with
>> >> >>> >> data and evict/flush started...
>> >> >>> >> -Sam
>> >> >>> >>
>> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>> >> >>> >>  wrote:
>> >> >>> >> > No, when we start draining cache - bad pgs was in place...
>> >> >>> >> > We have big rebalance (disk by disk - to change journal side
>> >> >>> >> > on
>> >> >>> >> > both
>> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived
>> >> >>> >> > scrub
>> >> >>> >> > errors
>> >> >>> >> > and 2
>> >> >>> >> > pgs inconsistent...
>> >> >>> >> >
>> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's
>> >> >>> >> > stop
>> >> >>> >> > to
>> >> >>> >> > work in
>> >> >>> >> > same moment, when cache layer fulfilled with data and
>> >> >>> >> > evict/flush
>> >> >>> >> > started...
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>> >> >>> >> >>
>> >> >>> >> >> So you started draining the cache pool before you saw either
>> >> >>> >> >> the
>> >> >>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
>> >> >>> >> >> writeback
>> >> >>> >> >> mode was working correctly?)
>> >> >>> >> >> -Sam
>> >> >>> >> >>
>> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> >> >>> >> >>  wrote:
>> >> >>> >> >> > Good joke )
>> >> >>> >> >> >
>> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >> >>> >> >> >>
>> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care
>> >> >>> >> >> >> about
>> >> >>> >> >> >> :).
>> >> >>> >> >> >> -Sam
>> >> >>> >> >> >>
>> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
>> >> >>> >> >> >> 
>> >> >>> >> >> >> wrote:
>> >> >>> >> >> >> > What's supposed to happen is that the client
>> >> >>> >> >> >> > transparently
>> >> >>> >> >> >> > directs
>> >> >>> >> >> >> > all
>> >> >>> >> >> >> > requests to the cache pool rather than the cold pool
>> >> >>> >> >> >> > when
>> >> >>> >> >> >> > there
>> >> >>> >> >> >> > is
>> >> >>> >> >> >> > a
>> >> >>> >> >> >> > cache pool.  If the kernel is sending requests to the
>> >> >>> >> >> >> > cold
>> >> >>> >> >> >> > pool,
>> >> >>> >> >> >> > that's probably where the bug is.  Odd.  It could also
>> >> >>> >> >> >> > be a
>> >> >>> >> >> >> > bug
>> >> >>> >> >> >> > specific 'forward' mode either in the client or on the
>> >> >>> >> >> >> > osd.
>> >> >>> >> >> >> > Why
>> >> >>> >> >> >> > did
>> >> >>> >> >> >> > you have it in that mode?
>> >> >>> >> >> >> > -Sam
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> >>> >> >> >> >  wrote:
>> >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850
>> >> >>> >> >> >> >> pro
>> >> >>> >> >> >> >> in
>> >> >>> >> >> >> >> production,
>> >> >>> >> >> >> >> and they don;t support ncq_trim...
>> >> >>> >> >> >> >>
>> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this
>> >> >>> >> >> >> >> in
>> >> >>> >> >> >> >> libsata.c.
>> >> >>> >> >> >> >>
>> >> >>> 

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
As we use journal collocation now (because we want to utilize the cache
layer ((( ), I use ceph-disk to create the new OSD (with the journal size
changed in ceph.conf). I don't prefer manual work ))

So I created a very simple script to update the journal size:

2015-08-21 2:25 GMT+03:00 Voloshanenko Igor :

> Exactly
>
> On Friday, 21 August 2015, Samuel Just wrote:
>
> And you adjusted the journals by removing the osd, recreating it with
>> a larger journal, and reinserting it?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
>>  wrote:
>> > Right ( but also was rebalancing cycle 2 day before pgs corrupted)
>> >
>> > 2015-08-21 2:23 GMT+03:00 Samuel Just :
>> >>
>> >> Specifically, the snap behavior (we already know that the pgs went
>> >> inconsistent while the pool was in writeback mode, right?).
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just  wrote:
>> >> > Yeah, I'm trying to confirm that the issues did happen in writeback
>> >> > mode.
>> >> > -Sam
>> >> >
>> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
>> >> >  wrote:
>> >> >> Right. But issues started...
>> >> >>
>> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>> >> >>>
>> >> >>> But that was still in writeback mode, right?
>> >> >>> -Sam
>> >> >>>
>> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>> >> >>>  wrote:
>> >> >>> > WE haven't set values for max_bytes / max_objects.. and all data
>> >> >>> > initially
>> >> >>> > writes only to cache layer and not flushed at all to cold layer.
>> >> >>> >
>> >> >>> > Then we received notification from monitoring that we collect
>> about
>> >> >>> > 750GB in
>> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of
>> >> >>> > disk
>> >> >>> > size... And then evicting/flushing started...
>> >> >>> >
>> >> >>> > And issue with snapshots arrived
>> >> >>> >
>> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>> >> >>> >>
>> >> >>> >> Not sure what you mean by:
>> >> >>> >>
>> >> >>> >> but it's stop to work in same moment, when cache layer fulfilled
>> >> >>> >> with
>> >> >>> >> data and evict/flush started...
>> >> >>> >> -Sam
>> >> >>> >>
>> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>> >> >>> >>  wrote:
>> >> >>> >> > No, when we start draining cache - bad pgs was in place...
>> >> >>> >> > We have big rebalance (disk by disk - to change journal side
>> on
>> >> >>> >> > both
>> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived
>> scrub
>> >> >>> >> > errors
>> >> >>> >> > and 2
>> >> >>> >> > pgs inconsistent...
>> >> >>> >> >
>> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's
>> stop
>> >> >>> >> > to
>> >> >>> >> > work in
>> >> >>> >> > same moment, when cache layer fulfilled with data and
>> evict/flush
>> >> >>> >> > started...
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>> >> >>> >> >>
>> >> >>> >> >> So you started draining the cache pool before you saw either
>> the
>> >> >>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
>> >> >>> >> >> writeback
>> >> >>> >> >> mode was working correctly?)
>> >> >>> >> >> -Sam
>> >> >>> >> >>
>> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> >> >>> >> >>  wrote:
>> >> >>> >> >> > Good joke )
>> >> >>> >> >> >
>> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
>> >> >>> >> >> >>
>> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care
>> about
>> >> >>> >> >> >> :).
>> >> >>> >> >> >> -Sam
>> >> >>> >> >> >>
>> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
>> >> >>> >> >> >> 
>> >> >>> >> >> >> wrote:
>> >> >>> >> >> >> > What's supposed to happen is that the client
>> transparently
>> >> >>> >> >> >> > directs
>> >> >>> >> >> >> > all
>> >> >>> >> >> >> > requests to the cache pool rather than the cold pool
>> when
>> >> >>> >> >> >> > there
>> >> >>> >> >> >> > is
>> >> >>> >> >> >> > a
>> >> >>> >> >> >> > cache pool.  If the kernel is sending requests to the
>> cold
>> >> >>> >> >> >> > pool,
>> >> >>> >> >> >> > that's probably where the bug is.  Odd.  It could also
>> be a
>> >> >>> >> >> >> > bug
>> >> >>> >> >> >> > specific 'forward' mode either in the client or on the
>> osd.
>> >> >>> >> >> >> > Why
>> >> >>> >> >> >> > did
>> >> >>> >> >> >> > you have it in that mode?
>> >> >>> >> >> >> > -Sam
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> >>> >> >> >> >  wrote:
>> >> >>> >> >> >> >> We used 4.x branch, as we have "very good" Samsung 850
>> pro
>> >> >>> >> >> >> >> in
>> >> >>> >> >> >> >> production,
>> >> >>> >> >> >> >> and they don;t support ncq_trim...
>> >> >>> >> >> >> >>
>> >> >>> >> >> >> >> And 4,x first branch which include exceptions for this
>> in
>> >> >>> >> >> >> >> libsata.c.
>> >> >>> >> >> >> >>
>> >> >>> >> >> >> >> sure we can backport this 1 line to 3.x branch, but we
>> >> >>> >> >> >> >> p

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Will do, Sam!

Thanks in advance for your help!

2015-08-21 2:28 GMT+03:00 Samuel Just :

> Ok, create a ticket with a timeline and all of this information, I'll
> try to look into it more tomorrow.
> -Sam
>
> On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor
>  wrote:
> > Exactly
> >
> > On Friday, 21 August 2015, Samuel Just wrote:
> >
> >> And you adjusted the journals by removing the osd, recreating it with
> >> a larger journal, and reinserting it?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
> >>  wrote:
> >> > Right ( but also was rebalancing cycle 2 day before pgs corrupted)
> >> >
> >> > 2015-08-21 2:23 GMT+03:00 Samuel Just :
> >> >>
> >> >> Specifically, the snap behavior (we already know that the pgs went
> >> >> inconsistent while the pool was in writeback mode, right?).
> >> >> -Sam
> >> >>
> >> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just 
> wrote:
> >> >> > Yeah, I'm trying to confirm that the issues did happen in writeback
> >> >> > mode.
> >> >> > -Sam
> >> >> >
> >> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
> >> >> >  wrote:
> >> >> >> Right. But issues started...
> >> >> >>
> >> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
> >> >> >>>
> >> >> >>> But that was still in writeback mode, right?
> >> >> >>> -Sam
> >> >> >>>
> >> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
> >> >> >>>  wrote:
> >> >> >>> > WE haven't set values for max_bytes / max_objects.. and all
> data
> >> >> >>> > initially
> >> >> >>> > writes only to cache layer and not flushed at all to cold
> layer.
> >> >> >>> >
> >> >> >>> > Then we received notification from monitoring that we collect
> >> >> >>> > about
> >> >> >>> > 750GB in
> >> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9
> of
> >> >> >>> > disk
> >> >> >>> > size... And then evicting/flushing started...
> >> >> >>> >
> >> >> >>> > And issue with snapshots arrived
> >> >> >>> >
> >> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
> >> >> >>> >>
> >> >> >>> >> Not sure what you mean by:
> >> >> >>> >>
> >> >> >>> >> but it's stop to work in same moment, when cache layer
> fulfilled
> >> >> >>> >> with
> >> >> >>> >> data and evict/flush started...
> >> >> >>> >> -Sam
> >> >> >>> >>
> >> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> >> >> >>> >>  wrote:
> >> >> >>> >> > No, when we start draining cache - bad pgs was in place...
> >> >> >>> >> > We have big rebalance (disk by disk - to change journal side
> >> >> >>> >> > on
> >> >> >>> >> > both
> >> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived
> >> >> >>> >> > scrub
> >> >> >>> >> > errors
> >> >> >>> >> > and 2
> >> >> >>> >> > pgs inconsistent...
> >> >> >>> >> >
> >> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's
> >> >> >>> >> > stop
> >> >> >>> >> > to
> >> >> >>> >> > work in
> >> >> >>> >> > same moment, when cache layer fulfilled with data and
> >> >> >>> >> > evict/flush
> >> >> >>> >> > started...
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
> >> >> >>> >> >>
> >> >> >>> >> >> So you started draining the cache pool before you saw
> either
> >> >> >>> >> >> the
> >> >> >>> >> >> inconsistent pgs or the anomalous snap behavior?  (That is,
> >> >> >>> >> >> writeback
> >> >> >>> >> >> mode was working correctly?)
> >> >> >>> >> >> -Sam
> >> >> >>> >> >>
> >> >> >>> >> >> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >> >> >>> >> >>  wrote:
> >> >> >>> >> >> > Good joke )
> >> >> >>> >> >> >
> >> >> >>> >> >> > 2015-08-21 2:06 GMT+03:00 Samuel Just  >:
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> Certainly, don't reproduce this with a cluster you care
> >> >> >>> >> >> >> about
> >> >> >>> >> >> >> :).
> >> >> >>> >> >> >> -Sam
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
> >> >> >>> >> >> >> 
> >> >> >>> >> >> >> wrote:
> >> >> >>> >> >> >> > What's supposed to happen is that the client
> >> >> >>> >> >> >> > transparently
> >> >> >>> >> >> >> > directs
> >> >> >>> >> >> >> > all
> >> >> >>> >> >> >> > requests to the cache pool rather than the cold pool
> >> >> >>> >> >> >> > when
> >> >> >>> >> >> >> > there
> >> >> >>> >> >> >> > is
> >> >> >>> >> >> >> > a
> >> >> >>> >> >> >> > cache pool.  If the kernel is sending requests to the
> >> >> >>> >> >> >> > cold
> >> >> >>> >> >> >> > pool,
> >> >> >>> >> >> >> > that's probably where the bug is.  Odd.  It could also
> >> >> >>> >> >> >> > be a
> >> >> >>> >> >> >> > bug
> >> >> >>> >> >> >> > specific 'forward' mode either in the client or on the
> >> >> >>> >> >> >> > osd.
> >> >> >>> >> >> >> > Why
> >> >> >>> >> >> >> > did
> >> >> >>> >> >> >> > you have it in that mode?
> >> >> >>> >> >> >> > -Sam
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >> >> >>> >> >> >> >  wrote:
> >> >> >>> >> >> >> >> We used 4.x branch, a

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Attachment blocked, so posting it as text...

root@zzz:~# cat update_osd.sh
#!/bin/bash

ID=$1
echo "Process OSD# ${ID}"

DEV=`mount | grep "ceph-${ID} " | cut -d " " -f 1`
echo "OSD# ${ID} hosted on ${DEV::-1}"

TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6`
if [ "${TYPE_RAW}" == "Solid" ]
then
TYPE="ssd"
elif [ "${TYPE_RAW}" == "7200" ]
then
TYPE="platter"
fi

echo "OSD Type = ${TYPE}"

HOST=`hostname`
echo "Current node hostname: ${HOST}"

echo "Set noout option for CEPH cluster"
ceph osd set noout

echo "Marked OSD # ${ID} out"
ceph osd out ${ID}

echo "Remove OSD # ${ID} from CRUSHMAP"
ceph osd crush remove osd.${ID}

echo "Delete auth for OSD# ${ID}"
ceph auth del osd.${ID}

echo "Stop OSD# ${ID}"
stop ceph-osd id=${ID}

echo "Remove OSD # ${ID} from cluster"
ceph osd rm ${ID}

echo "Unmount OSD# ${ID}"
umount ${DEV}

echo "ZAP ${DEV::-1}"
ceph-disk zap ${DEV::-1}

echo "Create new OSD with ${DEV::-1}"
ceph-disk-prepare ${DEV::-1}

echo "Activate new OSD"
ceph-disk-activate ${DEV}

echo "Dump current CRUSHMAP"
ceph osd getcrushmap -o cm.old

echo "Decompile CRUSHMAP"
crushtool -d cm.old -o cm

echo "Place new OSD in proper place"
sed -i "s/device${ID}/osd.${ID}/" cm
LINE=`cat -n cm | sed -n "/${HOST}-${TYPE} {/,/}/p" | tail -n 1 | awk '{print $1}'`
sed -i "${LINE}iitem osd.${ID} weight 1.000" cm

echo "Modify ${HOST} weight into CRUSHMAP"
sed -i "s/item ${HOST}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight
1.000/" cm

echo "Compile new CRUSHMAP"
crushtool -c cm -o cm.new

echo "Inject new CRUSHMAP"
ceph osd setcrushmap -i cm.new

#echo "Clean..."
#rm -rf cm cm.new

echo "Unset noout option for CEPH cluster"
ceph osd unset noout

echo "OSD recreated... Waiting for rebalancing..."

2015-08-21 2:37 GMT+03:00 Voloshanenko Igor :

> As i we use journal collocation for journal now (because we want to
> utilize cache layer ((( ) i use ceph-disk to create new OSD (changed
> journal size on ceph.conf). I don;t prefer manual work))
>
> So create very simple script to update journal size
>
> 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor :
>
>> Exactly
>>
>> On Friday, 21 August 2015, Samuel Just wrote:
>>
>> And you adjusted the journals by removing the osd, recreating it with
>>> a larger journal, and reinserting it?
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
>>>  wrote:
>>> > Right ( but also was rebalancing cycle 2 day before pgs corrupted)
>>> >
>>> > 2015-08-21 2:23 GMT+03:00 Samuel Just :
>>> >>
>>> >> Specifically, the snap behavior (we already know that the pgs went
>>> >> inconsistent while the pool was in writeback mode, right?).
>>> >> -Sam
>>> >>
>>> >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just 
>>> wrote:
>>> >> > Yeah, I'm trying to confirm that the issues did happen in writeback
>>> >> > mode.
>>> >> > -Sam
>>> >> >
>>> >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
>>> >> >  wrote:
>>> >> >> Right. But issues started...
>>> >> >>
>>> >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>>> >> >>>
>>> >> >>> But that was still in writeback mode, right?
>>> >> >>> -Sam
>>> >> >>>
>>> >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
>>> >> >>>  wrote:
>>> >> >>> > WE haven't set values for max_bytes / max_objects.. and all data
>>> >> >>> > initially
>>> >> >>> > writes only to cache layer and not flushed at all to cold layer.
>>> >> >>> >
>>> >> >>> > Then we received notification from monitoring that we collect
>>> about
>>> >> >>> > 750GB in
>>> >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9 of
>>> >> >>> > disk
>>> >> >>> > size... And then evicting/flushing started...
>>> >> >>> >
>>> >> >>> > And issue with snapshots arrived
>>> >> >>> >
>>> >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
>>> >> >>> >>
>>> >> >>> >> Not sure what you mean by:
>>> >> >>> >>
>>> >> >>> >> but it's stop to work in same moment, when cache layer
>>> fulfilled
>>> >> >>> >> with
>>> >> >>> >> data and evict/flush started...
>>> >> >>> >> -Sam
>>> >> >>> >>
>>> >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
>>> >> >>> >>  wrote:
>>> >> >>> >> > No, when we start draining cache - bad pgs was in place...
>>> >> >>> >> > We have big rebalance (disk by disk - to change journal side
>>> on
>>> >> >>> >> > both
>>> >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - arrived
>>> scrub
>>> >> >>> >> > errors
>>> >> >>> >> > and 2
>>> >> >>> >> > pgs inconsistent...
>>> >> >>> >> >
>>> >> >>> >> > In writeback - yes, looks like snapshot works good. but it's
>>> stop
>>> >> >>> >> > to
>>> >> >>> >> > work in
>>> >> >>> >> > same moment, when cache layer fulfilled with data and
>>> evict/flush
>>> >> >>> >> > started...
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > 2015-08-21 2:09 GMT+03:00 Samuel Just :
>>> >> >>> >> >>
>>> >> >>> >> >> So you started draining the cache pool before you saw
>>> either the
>>> >> >>> >> >> inconsistent pgs 

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
It would help greatly if, on a disposable cluster, you could reproduce
the snapshot problem with

debug osd = 20
debug filestore = 20
debug ms = 1

on all of the osds and attach the logs to the bug report.  That should
make it easier to work out what is going on.
-Sam
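
These debug levels can also be raised on running daemons without a restart; a sketch, assuming working monitor connectivity (the values simply mirror the settings above):

ceph tell osd.* injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
# or add the three settings to the [osd] section of ceph.conf and restart the OSDs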

On Thu, Aug 20, 2015 at 4:40 PM, Voloshanenko Igor
 wrote:
> Attachment blocked, so post as text...
>
> root@zzz:~# cat update_osd.sh
> #!/bin/bash
>
> ID=$1
> echo "Process OSD# ${ID}"
>
> DEV=`mount | grep "ceph-${ID} " | cut -d " " -f 1`
> echo "OSD# ${ID} hosted on ${DEV::-1}"
>
> TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6`
> if [ "${TYPE_RAW}" == "Solid" ]
> then
> TYPE="ssd"
> elif [ "${TYPE_RAW}" == "7200" ]
> then
> TYPE="platter"
> fi
>
> echo "OSD Type = ${TYPE}"
>
> HOST=`hostname`
> echo "Current node hostname: ${HOST}"
>
> echo "Set noout option for CEPH cluster"
> ceph osd set noout
>
> echo "Marked OSD # ${ID} out"
> ceph osd out ${ID}
>
> echo "Remove OSD # ${ID} from CRUSHMAP"
> ceph osd crush remove osd.${ID}
>
> echo "Delete auth for OSD# ${ID}"
> ceph auth del osd.${ID}
>
> echo "Stop OSD# ${ID}"
> stop ceph-osd id=${ID}
>
> echo "Remove OSD # ${ID} from cluster"
> ceph osd rm ${ID}
>
> echo "Unmount OSD# ${ID}"
> umount ${DEV}
>
> echo "ZAP ${DEV::-1}"
> ceph-disk zap ${DEV::-1}
>
> echo "Create new OSD with ${DEV::-1}"
> ceph-disk-prepare ${DEV::-1}
>
> echo "Activate new OSD"
> ceph-disk-activate ${DEV}
>
> echo "Dump current CRUSHMAP"
> ceph osd getcrushmap -o cm.old
>
> echo "Decompile CRUSHMAP"
> crushtool -d cm.old -o cm
>
> echo "Place new OSD in proper place"
> sed -i "s/device${ID}/osd.${ID}/" cm
> LINE=`cat -n cm | sed -n "/${HOST}-${TYPE} {/,/}/p" | tail -n 1 | awk
> '{print $1}'`
> sed -i "${LINE}iitem osd.${ID} weight 1.000" cm
>
> echo "Modify ${HOST} weight into CRUSHMAP"
> sed -i "s/item ${HOST}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight
> 1.000/" cm
>
> echo "Compile new CRUSHMAP"
> crushtool -c cm -o cm.new
>
> echo "Inject new CRUSHMAP"
> ceph osd setcrushmap -i cm.new
>
> #echo "Clean..."
> #rm -rf cm cm.new
>
> echo "Unset noout option for CEPH cluster"
> ceph osd unset noout
>
> echo "OSD recreated... Waiting for rebalancing..."
>
> 2015-08-21 2:37 GMT+03:00 Voloshanenko Igor :
>>
>> As i we use journal collocation for journal now (because we want to
>> utilize cache layer ((( ) i use ceph-disk to create new OSD (changed journal
>> size on ceph.conf). I don;t prefer manual work))
>>
>> So create very simple script to update journal size
>>
>> 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor :
>>>
>>> Exactly
>>>
>>> On Friday, 21 August 2015, Samuel Just wrote:
>>>
 And you adjusted the journals by removing the osd, recreating it with
 a larger journal, and reinserting it?
 -Sam

 On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
  wrote:
 > Right ( but also was rebalancing cycle 2 day before pgs corrupted)
 >
 > 2015-08-21 2:23 GMT+03:00 Samuel Just :
 >>
 >> Specifically, the snap behavior (we already know that the pgs went
 >> inconsistent while the pool was in writeback mode, right?).
 >> -Sam
 >>
 >> On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just 
 >> wrote:
 >> > Yeah, I'm trying to confirm that the issues did happen in writeback
 >> > mode.
 >> > -Sam
 >> >
 >> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
 >> >  wrote:
 >> >> Right. But issues started...
 >> >>
 >> >> 2015-08-21 2:20 GMT+03:00 Samuel Just :
 >> >>>
 >> >>> But that was still in writeback mode, right?
 >> >>> -Sam
 >> >>>
 >> >>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
 >> >>>  wrote:
 >> >>> > WE haven't set values for max_bytes / max_objects.. and all
 >> >>> > data
 >> >>> > initially
 >> >>> > writes only to cache layer and not flushed at all to cold
 >> >>> > layer.
 >> >>> >
 >> >>> > Then we received notification from monitoring that we collect
 >> >>> > about
 >> >>> > 750GB in
 >> >>> > hot pool ) So i changed values for max_object_bytes to be 0,9
 >> >>> > of
 >> >>> > disk
 >> >>> > size... And then evicting/flushing started...
 >> >>> >
 >> >>> > And issue with snapshots arrived
 >> >>> >
 >> >>> > 2015-08-21 2:15 GMT+03:00 Samuel Just :
 >> >>> >>
 >> >>> >> Not sure what you mean by:
 >> >>> >>
 >> >>> >> but it's stop to work in same moment, when cache layer
 >> >>> >> fulfilled
 >> >>> >> with
 >> >>> >> data and evict/flush started...
 >> >>> >> -Sam
 >> >>> >>
 >> >>> >> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
 >> >>> >>  wrote:
 >> >>> >> > No, when we start draining cache - bad pgs was in place...
 >> >>> >> > We have big rebalance (disk by disk - to change journal side
 >> >>> >> > on
 >> >>> >> > both
 >> >>> >> > hot/cold layers).. All was Ok, but after 2 days - ar

Re: [ceph-users] Ceph OSD nodes in XenServer VMs

2015-08-20 Thread Steven McDonald
Hi Jiri,

On Thu, 20 Aug 2015 11:55:55 +1000
Jiri Kanicky  wrote:

> We are experimenting with an idea to run OSD nodes in XenServer VMs.
> We believe this could provide better flexibility, backups for the
> nodes etc.

Could you expand on this? As written, it seems like a bad idea to me,
just because you'd be adding complexity for no gain. Can you explain,
for instance, why you think it would enable better flexibility, or why
it would help with backups?

What is it that you intend to back up? Backing up the OS on a storage
node should never be necessary, since it should be recreatable from
config management, and backing up data on the OSDs is best done on a
per-pool basis because the requirements are going to differ by pool and
not by OSD.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread scott_tan...@yahoo.com
Dear Loic:
I'm sorry to bother you, but I have a question about Ceph.
I used a PCIe SSD as the OSD disk, but I found its performance to be very poor.
I have two hosts, each with one PCIe SSD, so I created two OSDs backed by the PCIe SSDs.

ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1   0.35999 root default 
-2   0.17999 host tds_node03 
0 0.17999  osd.0 up 1.0 1.0 
-3   0.17999 host tds_node04 
1 0.17999  osd.1 up 1.0 1.0 

I created a pool and an rbd device.
I used fio to test 8K randrw (70% read) on the rbd device; the result is only 1W
IOPS. I have tried many OSD thread parameters, but with no effect.
But when I tested 8K randrw (70%) against the single PCIe SSD directly, it got 10W IOPS.

Is there any way to improve the PCIe SSD OSD performance?



scott_tan...@yahoo.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread Christian Balzer

Hello,

On Thu, 20 Aug 2015 15:47:46 +0800 scott_tan...@yahoo.com wrote:

The reason that you're not getting any replies is because we're not
psychic/telepathic/clairvoyant. 

Meaning that you're not giving us anywhere near enough information.

> dear ALL:
> I used PCIE-SSD to OSD disk . But I found it very bottom
> performance. I have two hosts, each host 1 PCIE-SSD,so i create two osd
> by PCIE-SSD.
> 
What PCIe SSD? 
What hosts (HW, OS), network?
What Ceph version, config changes?

> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY 
> -1   0.35999 root default 
> -2   0.17999 host tds_node03 
> 0 0.17999  osd.0 up 1.0 1.0 
> -3   0.17999 host tds_node04 
> 1 0.17999  osd.1 up 1.0 1.0 
> 
> I create pool and rbd device.
What kind of pool, any non-default options?
Where did you mount/access that RBD device from, userspace, kernel?
What file system, if any?

> I use fio test 8K randrw(70%) in rbd device,the result is only 1W IOPS,
Exact fio invocation parameters, output please. 
1W IOPS is supposed to mean 1 write IOPS? 
Also for comparison purposes, the "standard" is to test with 4KB blocks
for random access
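
For reference, a typical invocation for this kind of test might look like the
sketch below; it assumes fio was built with RBD support, and the pool/image
names are placeholders:

fio --name=randrw-test --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=test-image \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 \
    --numjobs=1 --direct=1 --runtime=60 --time_based --group_reporting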

> I have tried many osd thread parameters, but not effect. 
Unless your HW or SSD has issues, the defaults should give a lot better results. 

>But i tested 8K
> randrw(70%) in single PCIE-SSD, it has 10W IOPS.
> 
10 write IOPS would still be abysmally slow. 
Single means running fio against the SSD directly?

How does this compare to using the exact same setup but HDDs or normal
SSDs?

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread scott_tan...@yahoo.com
my ceph.conf 

[global] 
auth_service_required = cephx 
osd_pool_default_size = 2 
filestore_xattr_use_omap = true 
auth_client_required = cephx 
auth_cluster_required = cephx 
mon_host = 172.168.2.171 
mon_initial_members = tds_node01 
fsid = fef619c4-5f4a-4bf1-a787-6c4d17995ec4 

keyvaluestore op threads = 4 
osd op threads = 4 
filestore op threads = 4 
osd disk threads = 2 
osd max write size = 180 
osd agent max ops = 8 

rbd readahead trigger requests = 20 
rbd readahead max bytes = 1048576 
rbd readahead disable after bytes = 104857600 

[mon.ceph_node01] 
host = ceph_node01 
mon addr = 172.168.2.171:6789 

[mon.ceph_node02] 
host = ceph_node02 
mon addr = 192.168.2.172:6789 

[mon.ceph_node03] 
host = ceph_node03 
mon addr = 192.168.2.171:6789 



[osd.0] 
host = ceph_node03 
deves = /dev/nvme0n1p5 


[osd.1] 
host = ceph_node04 
deves = /dev/nvme0n1p5
++

Even if I don't adjust the thread parameters, the performance result is the same.




scott_tan...@yahoo.com
 
From: Christian Balzer
Date: 2015-08-21 09:40
To: ceph-users
CC: scott_tan...@yahoo.com; liuxy666
Subject: Re: [ceph-users] PCIE-SSD OSD bottom performance issue
 
Hello,
 
On Thu, 20 Aug 2015 15:47:46 +0800 scott_tan...@yahoo.com wrote:
 
The reason that you're not getting any replies is because we're not
psychic/telepathic/clairvoyant. 
 
Meaning that you're not giving us anywhere near enough information.
 
> dear ALL:
> I used PCIE-SSD to OSD disk . But I found it very bottom
> performance. I have two hosts, each host 1 PCIE-SSD,so i create two osd
> by PCIE-SSD.
> 
What PCIe SSD? 
What hosts (HW, OS), network?
What Ceph version, config changes?
 
> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY 
> -1   0.35999 root default 
> -2   0.17999 host tds_node03 
> 0 0.17999  osd.0 up 1.0 1.0 
> -3   0.17999 host tds_node04 
> 1 0.17999  osd.1 up 1.0 1.0 
> 
> I create pool and rbd device.
What kind of pool, any non-default options?
Where did you mount/access that RBD device from, userspace, kernel?
What file system, if any?
 
> I use fio test 8K randrw(70%) in rbd device,the result is only 1W IOPS,
Exact fio invocation parameters, output please. 
1W IOPS is supposed to mean 1 write IOPS? 
Also for comparison purposes, the "standard" is to test with 4KB blocks
for random access
 
> I have tried many osd thread parameters, but not effect. 
Unless your HW or SSD has issues, the defaults should give a lot better results. 
 
>But i tested 8K
> randrw(70%) in single PCIE-SSD, it has 10W IOPS.
> 
10 write IOPS would still be abysmally slow. 
Single means running fio against the SSD directly?
 
How does this compare to using the exact same setup but HDDs or normal
SSDs?
 
Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Ilya Dryomov
On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just  wrote:
> What's supposed to happen is that the client transparently directs all
> requests to the cache pool rather than the cold pool when there is a
> cache pool.  If the kernel is sending requests to the cold pool,
> that's probably where the bug is.  Odd.  It could also be a bug
> specific to 'forward' mode, either in the client or on the OSD.  Why did
> you have it in that mode?

I think I reproduced this on today's master.

Setup, cache mode is writeback:

$ ./ceph osd pool create foo 12 12
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo' created
$ ./ceph osd pool create foo-hot 12 12
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo-hot' created
$ ./ceph osd tier add foo foo-hot
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo-hot' is now (or already was) a tier of 'foo'
$ ./ceph osd tier cache-mode foo-hot writeback
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'foo-hot' to writeback
$ ./ceph osd tier set-overlay foo foo-hot
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
overlay for 'foo' is now (or already was) 'foo-hot'

Create an image:

$ ./rbd create --size 10M --image-format 2 foo/bar
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mkfs.ext4 /mnt/bar
$ sudo umount /mnt

Create a snapshot, take md5sum:

$ ./rbd snap create foo/bar@snap
$ ./rbd export foo/bar /tmp/foo-1
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-1
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-1
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-1
$ md5sum /tmp/snap-1
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-1

Set the cache mode to forward and do a flush, hashes don't match - the
snap is empty - we bang on the hot tier and don't get redirected to the
cold tier, I suspect:

$ ./ceph osd tier cache-mode foo-hot forward
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'foo-hot' to forward
$ ./rados -p foo-hot cache-flush-evict-all
rbd_data.100a6b8b4567.0002
rbd_id.bar
rbd_directory
rbd_header.100a6b8b4567
bar.rbd
rbd_data.100a6b8b4567.0001
rbd_data.100a6b8b4567.
$ ./rados -p foo-hot cache-flush-evict-all
$ ./rbd export foo/bar /tmp/foo-2
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-2
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-2
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-2
$ md5sum /tmp/snap-2
f1c9645dbc14efddc7d8a322685f26eb  /tmp/snap-2
$ od /tmp/snap-2
000 00 00 00 00 00 00 00 00
*
5000

Disable the cache tier and we are back to normal:

$ ./ceph osd tier remove-overlay foo
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
there is now (or already was) no overlay for 'foo'
$ ./rbd export foo/bar /tmp/foo-3
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-3
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-3
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-3
$ md5sum /tmp/snap-3
83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-3

I first reproduced it with the kernel client, rbd export was just to
take it out of the equation.
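
For completeness, the kernel-client variant of the check is roughly the
following sketch, reusing the foo/bar image and snapshot from above; the
device names are examples only, krbd maps snapshots read-only:

sudo rbd map foo/bar            # image, e.g. /dev/rbd0
sudo rbd map foo/bar@snap       # snapshot, e.g. /dev/rbd1
md5sum /dev/rbd0 /dev/rbd1      # compare image contents against the snapshot
sudo rbd unmap /dev/rbd0
sudo rbd unmap /dev/rbd1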


Also, Igor sort of raised a question in his second message: if, after
setting the cache mode to forward and doing a flush, I open an image
(not a snapshot, so may not be related to the above) for write (e.g.
with rbd-fuse), I get an rbd header object in the hot pool, even though
it's in forward mode:

$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mount /mnt/bar /media
$ sudo umount /media
$ sudo umount /mnt
$ ./rados -p foo-hot ls
rbd_header.100a6b8b4567
$ ./rados -p foo ls | grep rbd_header
rbd_header.100a6b8b4567

It's been a while since I looked into tiering, is that how it's
supposed to work?  It looks like it happens because rbd_header op
replies don't redirect?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados: Undefined symbol error

2015-08-20 Thread Aakanksha Pudipeddi-SSI
Hello,

I cloned the master branch of Ceph and after setting up the cluster, when I 
tried to use the rados commands, I got this error:

rados: symbol lookup error: rados: undefined symbol: 
_ZN5MutexC1ERKSsbbbP11CephContext

I saw a similar post here: http://tracker.ceph.com/issues/12563 but I am not 
clear on the solution for this problem. I am not performing an upgrade here but 
the error seems to be similar. Could anybody shed more light on the issue and 
how to solve it? Thanks a lot!

Aakanksha


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: Question

2015-08-20 Thread Vickie ch
Hi,
 I've done that before, and when I tried to write a file into the rbd, it
froze.
Besides resources, is there any other reason it is not recommended to combine MON
and OSD?



Best wishes,
Mika


2015-08-18 15:52 GMT+08:00 Межов Игорь Александрович :

> Hi!
>
> You can run MONs on the same hosts, though it is not recommended. The MON
> daemon itself is not resource hungry - 1-2 cores and 2-4 GB of RAM are
> enough in most small installs. But there are some pitfalls:
> - MONs use LevelDB as a backing store, and make heavy use of direct writes
> to ensure DB consistency. So, if a MON daemon coexists with OSDs not only
> on the same host, but on the same volume/disk/controller, it will severely
> reduce the disk I/O available to the OSDs and thus greatly reduce overall
> performance. Moving the MON's root to a separate spindle, or better, a
> separate SSD, will keep MONs running fine with OSDs on the same host (see
> the sketch below).
> - When the cluster is in a healthy state, MONs are not resource consuming,
> but when the cluster is in a "changing state" (adding/removing OSDs,
> backfilling, etc.) the CPU and memory usage of a MON can rise significantly.
>
> And yes, in a small cluster, it is not always possible to get 3 separate
> hosts for MONs only.
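
A minimal sketch of the separate-SSD layout described above, assuming a
dedicated SSD partition for the monitor store; the device name and filesystem
are placeholders, and the path matches the default "mon data" location:

# mount a small SSD partition under the MON data directory so LevelDB
# syncs do not compete with OSD disk I/O
mkfs.xfs /dev/sdb1
mount /dev/sdb1 /var/lib/ceph/mon
echo '/dev/sdb1 /var/lib/ceph/mon xfs defaults,noatime 0 2' >> /etc/fstab

# ceph.conf (this is already the default location):
[mon]
mon data = /var/lib/ceph/mon/$cluster-$id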
>
>
> Megov Igor
> CIO, Yuterra
>
> --
> *From:* ceph-users  on behalf of Luis
> Periquito 
> *Sent:* 17 August 2015 17:09
> *To:* Kris Vaes
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Question
>
> yes. The issue is resource sharing as usual: the MONs will use disk I/O,
> memory and CPU. If the cluster is small (test?) then there's no problem in
> using the same disks. If the cluster starts to get bigger you may want to
> dedicate resources (e.g. the disk for the MONs isn't used by an OSD). If
> the cluster is big enough you may want to dedicate a node for being a MON.
>
> On Mon, Aug 17, 2015 at 2:56 PM, Kris Vaes  wrote:
>
>> Hi,
>>
>> Maybe this seems like a strange question, but I could not find this info
>> in the docs, so I have the following question.
>>
>> For a Ceph cluster you need OSD daemons and monitor daemons.
>>
>> On a host you can run several OSD daemons (ideally one per drive, as the
>> docs recommend).
>>
>> But now my question: can you run the monitor daemon on the same host where
>> you already run some OSD daemons?
>>
>> Is this possible, and what are the implications of doing this?
>>
>>
>>
>> Met Vriendelijke Groeten
>> Cordialement
>> Kind Regards
>> Cordialmente
>> С приятелски поздрави
>>
>>
>> This message (including any attachments) may be privileged or
>> confidential. If you have received it by mistake, please notify the sender
>> by return e-mail and delete this message from your system. Any unauthorized
>> use or dissemination of this message in whole or in part is strictly
>> prohibited. S3S rejects any liability for the improper, incomplete or
>> delayed transmission of the information contained in this message, as well
>> as for damages resulting from this e-mail message. S3S cannot guarantee
>> that the message received by you has not been intercepted by third parties
>> and/or manipulated by computer programs used to transmit messages and
>> viruses.
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Exactly as in our case.

Ilya, it's the same for images on our side: headers are opened from the hot tier.

On Friday, 21 August 2015, Ilya Dryomov wrote:

> On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just  wrote:
> > What's supposed to happen is that the client transparently directs all
> > requests to the cache pool rather than the cold pool when there is a
> > cache pool.  If the kernel is sending requests to the cold pool,
> > that's probably where the bug is.  Odd.  It could also be a bug
> > specific to 'forward' mode, either in the client or on the OSD.  Why did
> > you have it in that mode?
>
> I think I reproduced this on today's master.
>
> Setup, cache mode is writeback:
>
> $ ./ceph osd pool create foo 12 12
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> pool 'foo' created
> $ ./ceph osd pool create foo-hot 12 12
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> pool 'foo-hot' created
> $ ./ceph osd tier add foo foo-hot
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> pool 'foo-hot' is now (or already was) a tier of 'foo'
> $ ./ceph osd tier cache-mode foo-hot writeback
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set cache-mode for pool 'foo-hot' to writeback
> $ ./ceph osd tier set-overlay foo foo-hot
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> overlay for 'foo' is now (or already was) 'foo-hot'
>
> Create an image:
>
> $ ./rbd create --size 10M --image-format 2 foo/bar
> $ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
> $ sudo mkfs.ext4 /mnt/bar
> $ sudo umount /mnt
>
> Create a snapshot, take md5sum:
>
> $ ./rbd snap create foo/bar@snap
> $ ./rbd export foo/bar /tmp/foo-1
> Exporting image: 100% complete...done.
> $ ./rbd export foo/bar@snap /tmp/snap-1
> Exporting image: 100% complete...done.
> $ md5sum /tmp/foo-1
> 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-1
> $ md5sum /tmp/snap-1
> 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-1
>
> Set the cache mode to forward and do a flush, hashes don't match - the
> snap is empty - we bang on the hot tier and don't get redirected to the
> cold tier, I suspect:
>
> $ ./ceph osd tier cache-mode foo-hot forward
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set cache-mode for pool 'foo-hot' to forward
> $ ./rados -p foo-hot cache-flush-evict-all
> rbd_data.100a6b8b4567.0002
> rbd_id.bar
> rbd_directory
> rbd_header.100a6b8b4567
> bar.rbd
> rbd_data.100a6b8b4567.0001
> rbd_data.100a6b8b4567.
> $ ./rados -p foo-hot cache-flush-evict-all
> $ ./rbd export foo/bar /tmp/foo-2
> Exporting image: 100% complete...done.
> $ ./rbd export foo/bar@snap /tmp/snap-2
> Exporting image: 100% complete...done.
> $ md5sum /tmp/foo-2
> 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-2
> $ md5sum /tmp/snap-2
> f1c9645dbc14efddc7d8a322685f26eb  /tmp/snap-2
> $ od /tmp/snap-2
> 000 00 00 00 00 00 00 00 00
> *
> 5000
>
> Disable the cache tier and we are back to normal:
>
> $ ./ceph osd tier remove-overlay foo
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> there is now (or already was) no overlay for 'foo'
> $ ./rbd export foo/bar /tmp/foo-3
> Exporting image: 100% complete...done.
> $ ./rbd export foo/bar@snap /tmp/snap-3
> Exporting image: 100% complete...done.
> $ md5sum /tmp/foo-3
> 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-3
> $ md5sum /tmp/snap-3
> 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-3
>
> I first reproduced it with the kernel client, rbd export was just to
> take it out of the equation.
>
>
> Also, Igor sort of raised a question in his second message: if, after
> setting the cache mode to forward and doing a flush, I open an image
> (not a snapshot, so may not be related to the above) for write (e.g.
> with rbd-fuse), I get an rbd header object in the hot pool, even though
> it's in forward mode:
>
> $ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
> $ sudo mount /mnt/bar /media
> $ sudo umount /media
> $ sudo umount /mnt
> $ ./rados -p foo-hot ls
> rbd_header.100a6b8b4567
> $ ./rados -p foo ls | grep rbd_header
> rbd_header.100a6b8b4567
>
> It's been a while since I looked into tiering, is that how it's
> supposed to work?  It looks like it happens because rbd_header op
> replies don't redirect?
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com