Hello,
On 24/10/2017 at 07:49, Brad Hubbard wrote:
On Mon, Oct 23, 2017 at 4:51 PM, pascal.pu...@pci-conseil.net wrote:
Hello,
On 23/10/2017 at 02:05, Brad Hubbard wrote:
2017-10-22 17:32:56.031086 7f3acaff5700 1 osd.14 pg_epoch: 72024
pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023
n=13 ec=7037 les/c/f 72023/72023/66447 72022/72022/72022)
[14,1,41] r=0 lpr=72022 crt=71593'41657 lcod 0'0
mlcod 0'0 active+clean] hit_set_trim
37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31
01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc:
In function 'void
ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned
int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
It appears to be looking for (and failing to find) a hitset
object with a timestamp from August? Does that sound right to
you? Of course, it appears an object for that timestamp does not
exist.
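The object name can be decoded to check the August timestamps Brad mentions. In the name, `%3a` is a percent-encoded `:`.

The sketch below makes two assumptions:
- It assumes the `hit_set_<pgid>_archive_<start>_<end>` layout visible in the log above.
- It takes only the bare name, meaning the part between `::` and `:head`.

The function name `parse_hitset_archive_name` is hypothetical. It is not a Ceph API.

```python
from datetime import datetime
from typing import Tuple
from urllib.parse import unquote


def parse_hitset_archive_name(name: str) -> Tuple[str, datetime, datetime]:
    """Split a bare hit set archive name into (pgid, start, end).

    Assumed layout (inferred from the log in this thread, not from Ceph docs):
    hit_set_<pgid>_archive_<start>_<end>, where each timestamp looks like
    'YYYY-MM-DD HH%3aMM%3aSS.ffffffZ'.
    """
    prefix = "hit_set_"
    if not name.startswith(prefix):
        raise ValueError(f"not a hit set object name: {name!r}")
    pgid, sep, stamps = name[len(prefix):].partition("_archive_")
    if not sep:
        raise ValueError(f"not an archived hit set: {name!r}")
    parts = stamps.split("_")
    if len(parts) != 2:
        raise ValueError(f"expected two timestamps in {name!r}")
    # %3a is a percent-encoded ':'; the trailing 'Z' marks UTC, so the
    # returned datetimes are naive values meant to be read as UTC.
    start, end = (
        datetime.strptime(unquote(p), "%Y-%m-%d %H:%M:%S.%fZ") for p in parts
    )
    return pgid, start, end
```

Applied to the name from the crash, this gives PG 37.1c and a window from 2017-08-31 01:03:24 to 01:52:34 (UTC). That confirms the missing archive dates from August.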
How is this possible? How can I fix it? I am sure that if I run a lot of
reads, other objects like this one will crash other OSDs.
(The cluster is OK now; I will probably destroy OSD 14 and recreate it.)
How can I find this object?
You should be able to do a find on the OSDs' filestores and grep the
output for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs
responsible for pg 37.1c and then move on to the others if it's feasible.
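The `find | grep` approach can also be scripted, which helps when repeating it across several OSDs. This is a minimal Python sketch with the following assumptions:
- The helper names `matching_paths` and `find_on_osd` are hypothetical.
- The example data directory follows the usual `/var/lib/ceph/osd/ceph-<id>/current` FileStore layout. Adjust it to the actual OSD.
- Part of the object name appears in the on-disk path. FileStore may escape or shorten long object names, so a short, distinctive fragment such as the date is the safer thing to search for.

```python
import os
from typing import Iterable, Iterator, List, Tuple

# One entry as produced by os.walk(): (dirpath, dirnames, filenames).
WalkEntry = Tuple[str, List[str], List[str]]


def matching_paths(walk: Iterable[WalkEntry], fragment: str) -> Iterator[str]:
    """Yield every file path from an os.walk()-style listing containing fragment."""
    for dirpath, _dirnames, filenames in walk:
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Like `find ... | grep fragment`: match anywhere in the full path.
            if fragment in path:
                yield path


def find_on_osd(root: str, fragment: str) -> Iterator[str]:
    """Walk one OSD data directory (hypothetical helper) and yield matches."""
    return matching_paths(os.walk(root), fragment)
```

Usage would be along the lines of `for p in find_on_osd("/var/lib/ceph/osd/ceph-1/current", "2017-08-31"): print(p)`. Start with the OSDs in the acting set of PG 37.1c, then move on to the others.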
So, using grep, I found OSD.14 (already destroyed and recreated) and OSD.1.
ceph-osd-01: /var/log/ceph/ceph-osd.1.log-20171019.gz:2017-10-18
05:37:52.793802 7f9754ec5700 -1 osd.1 pg_epoch: 71592 pg[37.1c( v
71591'41652 (60849'38594,71591'41652] local-les=71583 n=17 ec=7037
les/c/f 71583/71554/66447 71561/71578/71578) [43,26,13]/[1,41] r=0
lpr=71578 pi=71553-71577/5 luod=71590'41651 bft=13,26,43 crt=71588'41647
lcod 71589'41650 mlcod 0'0
active+undersized+degraded+remapped+wait_backfill] agent_load_hit_sets:
could not load hitset
37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31
01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head
Should I destroy and recreate OSD 1 as well to force the data to move, or
just reweight the OSD to achieve the same?
How can I find other objects with the same issue? (Just restart rsync and see?)
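One way to look for other affected hit sets without waiting for a crash is to scan the OSD logs for the two messages quoted in this thread. Rotated logs are gzip-compressed, as with `ceph-osd.1.log-20171019.gz` above, so the scan has to read both kinds of file.

This minimal Python sketch makes the following assumptions:
- Each log entry sits on a single line. The wrapping seen in this email comes from the mail client.
- The default glob `/var/log/ceph/ceph-osd.*.log*` matches the local log layout.
- It only matches the exact message texts shown here. Other versions of Ceph may word them differently.

```python
import glob
import gzip
import re
from typing import Dict, Iterable, Set, TextIO

# Message texts copied from the two log excerpts in this thread.
PATTERNS = [
    re.compile(r"could not load hitset (.+?)\s*$"),
    re.compile(r"hit_set_trim (.+?) not found"),
]


def open_log(path: str) -> TextIO:
    """Open a plain or gzip-rotated log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def missing_hitsets(lines: Iterable[str]) -> Set[str]:
    """Collect hit set object names reported as missing in the given lines."""
    found: Set[str] = set()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                found.add(match.group(1))
    return found


def scan_logs(pattern: str = "/var/log/ceph/ceph-osd.*.log*") -> Dict[str, Set[str]]:
    """Map each log file to the missing hit set objects it mentions."""
    result: Dict[str, Set[str]] = {}
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as fh:
            names = missing_hitsets(fh)
        if names:
            result[path] = names
    return result
```

Running `scan_logs()` on each OSD host should list every log file that mentions a missing hit set archive, together with the object names. Those names can then be decoded or searched for on disk.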
Another question: I usually run a nightly cron job with fstrim on the RBD disk.
Could that be the cause of the problem?
Let us know the results.
--
*Performance Conseil Informatique*
Pascal Pucci
Consultant Infrastructure
pascal.pu...@pci-conseil.net
Mobile: 06 51 47 84 98
Office: 02 85 52 41 81
http://www.performance-conseil-informatique.net
News: DataCore partnership: PCI is a Silver Partner
<http://www.performance-conseil-informatique.net/2017/06/02/partenaire-datacore/>
Very happy to have been delivering storage continuity projects with DataCore
since 2008. PCI is a DataCore Silver Partner. Thanks to DataCore... (read more)
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com