Good news: while writing the previous message I found the solution to
recover my VMs:

 ceph osd tier remove cold-storage

I've been thinking about how this could have caused what happened, but I still do not
understand why the overlay option behaves so strangely.
As I understand it, the overlay option sets an overlay on the storage pool, so that all
I/O is routed to the cache pool.

I am guessing that it works the same way for a read-only cache as for a
writeback cache: all read and write requests are forwarded to the cache pool,
even though in read-only mode write requests should go to the base pool,
not to the cache.

Now that the overlay is removed, the cache pool is no longer used by
Ceph...
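
For reference, the overlay is set and removed on the base pool with these commands
(pool names here are placeholders, not necessarily the ones from my cluster):

 ceph osd tier set-overlay {storagepool} {cachepool}
 ceph osd tier remove-overlay {storagepool}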

Квапил wrote on 2016-02-12 15:02: 

> Hi, last night I had the same issue on Hammer LTS.
> I think this is a Ceph bug. 
> 
> My history: 
> 
> Ceph version: 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
> Distro: Debian 7 (Proxmox 3.4)
> Kernel: 2.6.32-39-pve
> 
> We have 9x 6TB SAS drives in the main pool and 6x 128GB PCIe SSDs in the cache pool, 
> on the same 3 nodes.
> 
> For a long time the cache pool worked in writeback mode, but we were getting poor 
> response from the SSD drives and 100% utilisation, so we decided to try switching the 
> cache pool to read-only mode.
> 
> First, I removed the writeback cache as described here:
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
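> 
> For reference, the sequence described on that page is roughly the following 
> (pool names are placeholders):
> 
> ceph osd tier cache-mode {cachepool} forward
> rados -p {cachepool} cache-flush-evict-all
> ceph osd tier remove-overlay {storagepool}
> ceph osd tier remove {storagepool} {cachepool}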
> 
> Second, I completely removed the cache pool and created a new one with the same 
> parameters and the same crush_ruleset, but with cache-mode readonly. Everything was fine, 
> the VMs booted, but I/O was still bad; the SSDs remained the bottleneck.
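> 
> For reference, re-creating a read-only tier that way looks roughly like this 
> (placeholder pool names, pg count and ruleset id):
> 
> ceph osd pool create {cachepool} {pg_num}
> ceph osd pool set {cachepool} crush_ruleset {ssd_ruleset_id}
> ceph osd tier add {storagepool} {cachepool}
> ceph osd tier cache-mode {cachepool} readonly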
> Then I tried to disable the cache with this command:
> 
> ceph osd tier cache-mode cache_pool none
> 
> And that was it: I/O stopped. Completely!
> 
> Restarting the virtual machines revealed file system damage, with fsck and chkdsk 
> running at boot. Many VMs started fine, although with partial data loss.
> Then, like you, I thought that some of the data had simply ended up in the cache pool 
> somehow, and I switched the cache back to read-only mode. When I realized that did not 
> help, I disabled it again and removed it completely as described here:
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-read-only-cache
> But that only made things worse. The file systems of the virtual machines were 
> damaged to such an extent that they no longer boot properly and report all kinds 
> of data corruption errors. Recovering from a snapshot did not help either.
> 
> Now I can say for sure that removing a read-only cache in my configuration 
> causes data corruption :( 
> 
> Xiangyu (Raijin, BP&IT Dept) wrote on 2015-09-25 12:11: 
> 
> Hi, 
> 
> I have a Ceph cluster as the Nova backend storage, and I enabled a cache tier 
> with read-only cache-mode for the nova_pool. Now the Nova instances cannot boot 
> after removing the nova_pool cache tier.
> 
> The instances show the error "boot failed: not a bootable disk". 
> 
> I used the commands below to remove the cache tier, following the Ceph 
> documentation: 
> 
> ceph osd tier cache-mode cache_pool none 
> 
> ceph osd tier remove nova_pool cache_pool 
> 
> While troubleshooting, I found that some images existed in nova_cache but could 
> not be found in nova_pool, so it seems that the cache pool (nova_cache) had been 
> working in writeback mode before. Why? 
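> 
> A rough way to compare what each pool actually holds (assuming RBD-backed images 
> and the pool names above):
> 
> rados -p nova_pool ls | grep rbd | head
> rados -p nova_cache ls | grep rbd | head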
> 
> I confirmed that I had set it to read-only mode before. What is wrong? 
> 
> And is there any way to fix the instance boot issue? 
> 
> 
> -- 
> 
> Квапил Андрей
> +7 966 05 666 50 
> 

-- 

Квапил Андрей
+7 966 05 666 50 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
