Hi, last night I had the same issue on Hammer LTS.
I think this is a Ceph bug.

My history: 

Ceph version: 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
Distro: Debian 7 (Proxmox 3.4)
Kernel: 2.6.32-39-pve

We have 9x 6TB SAS drives in the main pool and 6x 128GB PCIe SSDs in the
cache pool, on 3 nodes in the same boxes.

For a long time the cache pool worked in writeback mode. But we were
getting poor response from the SSD drives and 100% utilisation, so we
decided to try switching the cache pool to readonly mode.

First, I removed the writeback cache as shown here:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
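
For reference, the sequence on that page is roughly the following (the
pool names here are just my placeholders):

 ceph osd tier cache-mode cache_pool forward
 rados -p cache_pool cache-flush-evict-all
 ceph osd tier remove-overlay main_pool
 ceph osd tier remove main_pool cache_pool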

Second, I fully removed the cache pool and created a new one with the
same parameters, assigned the same crush_ruleset, but set the cache-mode
to readonly. Everything was OK, the VMs booted, but I/O was still bad:
the SSDs remained the bottleneck.
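
Roughly, the readonly tier setup was something like this (pool names,
pg counts and the ruleset id here are only examples):

 ceph osd pool create cache_pool 512 512
 ceph osd pool set cache_pool crush_ruleset 1
 ceph osd tier add main_pool cache_pool
 ceph osd tier cache-mode cache_pool readonly
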
Then I tried to disable the cache with this command:

 ceph osd tier cache-mode cache_pool none

And that was it: all I/O stopped. Completely!

Restarting the virtual machines revealed file system damage, with fsck
and chkdsk running at boot. Many VMs started fine, although with partial
data loss.
Then, like you, I thought that some of the data had simply stayed in
the cache pool somehow, and I turned the cache back to readonly mode.
When I realized that this did not help, I disabled it again and removed
it completely as written here:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-read-only-cache
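
That is, the two commands from that page (the same ones as in your
message below):

 ceph osd tier cache-mode cache_pool none
 ceph osd tier remove main_pool cache_pool
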
But it only made things worse. The file systems of the virtual machines
were damaged to such an extent that they no longer boot properly and
report all kinds of data corruption errors. Recovering from a snapshot
did not help either.

Now I can say for sure that removing a readonly cache in my
configuration causes data corruption :(
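
For anyone else who hits this: a check I would now do before removing
the tier is to compare the object listings of the cache pool and the
base pool, something like this (pool names are examples again):

 rados -p cache_pool ls | sort > /tmp/cache.lst
 rados -p main_pool ls | sort > /tmp/main.lst
 comm -23 /tmp/cache.lst /tmp/main.lst   # objects that exist only in the cache pool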

Xiangyu (Raijin, BP&IT Dept) wrote on 2015-09-25 12:11:

Hi, 

I have a ceph cluster as the nova backend storage, and I enabled a
cache tier with readonly cache-mode for the nova_pool. Now the nova
instances cannot boot after removing the nova_pool cache tier.

The instances show the error "boot failed: not a bootable disk".

I used the commands below to remove the cache tier, following the ceph
documentation:

ceph osd tier cache-mode cache_pool none 

ceph osd tier remove nova_pool cache_pool 

While troubleshooting, I found that some images existed in nova_cache
but were not found in nova_pool, so it seems that the cache pool
(nova_cache) was working in writeback mode before. Why?

I confirmed that I had set it to readonly mode before, so what is wrong?

And is there any way to fix the instance boot issue?

-- 

Квапил Андрей
+7 966 05 666 50 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
