On 03/07/2014 00:55, Samuel Just wrote:
Ah,

  ~/logs » for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
  ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
  ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
  ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
  ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
  6d5
  < tunable chooseleaf_vary_r 1

Looks like the chooseleaf_vary_r tunable somehow ended up divergent? Pierre: do you recall how and when that got set?
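For reference, the same osd-side maps can be compared against the crush map the monitors are currently serving; a minimal sketch, assuming crushtool is in the PATH and the file names are illustrative:

  ceph osd getcrushmap -o /tmp/crush.live             # current crush map from the mons (binary)
  crushtool -d /tmp/crush.live -o /tmp/crush.live.d   # decompile it to text
  diff /tmp/crush20.d /tmp/crush.live.d               # see whose copy matches the mons

Whichever OSD's decompiled map differs from the live one is the daemon holding the divergent copy.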
I am not sure I understand, but if I remember correctly, after the upgrade to firefly the cluster was in the state "HEALTH_WARN crush map has legacy tunables" and I saw "feature set mismatch" in the logs.

So, if I remember correctly, to fix the "crush map" warning I ran:

  ceph osd crush tunables optimal

and I updated my client and server kernels to 3.16rc.

Could that be it?

Pierre
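For anyone hitting the same warning, the tunables can be inspected and switched from the command line; a minimal sketch (switching profiles triggers data movement, so treat this as illustrative rather than a recommendation):

  ceph osd crush show-tunables     # dump the tunables currently in the crush map
  ceph osd crush tunables optimal  # firefly's optimal profile sets chooseleaf_vary_r = 1
  ceph osd crush tunables legacy   # revert to the legacy profile for old kernel clients

Older kernel clients cannot decode chooseleaf_vary_r, which is what produces the "feature set mismatch" messages until the client kernels are upgraded.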
-Sam

On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.j...@inktank.com> wrote:

Yeah, divergent osdmaps:

  555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
  6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none

Joao: thoughts?
-Sam

On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Here are the files.

When I upgraded, I ran:

  ceph-deploy install --stable firefly servers...
  on each server: service ceph restart mon
  on each server: service ceph restart osd
  on each server: service ceph restart mds

I upgraded from emperor to firefly. After repair, remap, replace, etc., I still had some PGs stuck in the peering state. I thought version 0.82 might solve my problem (that was my mistake), so I upgraded from firefly to 0.82 with:

  ceph-deploy install --testing servers...

Now all daemons are at version 0.82. I have 3 mons, 36 OSDs and 3 MDSes.

Pierre

PS: I also find "inc\uosdmap.13258__0_469271DE__none" in each meta directory.

On 03/07/2014 00:10, Samuel Just wrote:

Also, what version did you upgrade from, and how did you upgrade?
-Sam

On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.j...@inktank.com> wrote:

Ok, in current/meta on osd 20 and osd 23, please attach all files matching ^osdmap.13258.* — there should be one such file on each osd. (It should look something like osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory; you'll want to use find.) What version of ceph is running on your mons? How many mons do you have?
-Sam

On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Hi,

I did it; the log files are available here:
https://blondeau.users.greyc.fr/cephlog/debug20/

The OSDs' log files are really big, around 80 MB each.

After starting osd.20, some other OSDs crashed: I went from 31 OSDs down to 16. I noticed that after this the number of down+peering PGs decreased from 367 to 248. Is that "normal"? Maybe it is temporary, while the cluster verifies all the PGs?

Regards
Pierre

On 02/07/2014 19:16, Samuel Just wrote:

You should add

  debug osd = 20
  debug filestore = 20
  debug ms = 1

to the [osd] section of ceph.conf and restart the osds. I'd like all three logs if possible.

Thanks
-Sam

On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Yes, but how do I do that? With a command like this?

  ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'

Or by modifying /etc/ceph/ceph.conf? That file is really sparse because I use udev detection.

Once I have made these changes, do you want all three log files or only osd.20's?

Thank you so much for the help.

Regards
Pierre

On 01/07/2014 23:51, Samuel Just wrote:

Can you reproduce with

  debug osd = 20
  debug filestore = 20
  debug ms = 1

?
-Sam

On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Hi,

I attach:
- osd.20, one of the OSDs that I identified as making other OSDs crash.
- osd.23, one of the OSDs which crashes when I start osd.20.
- mds, one of my MDSes.

I cut the log files because they are too big. Everything is here:
https://blondeau.users.greyc.fr/cephlog/

Regards
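To locate the hashed osdmap files Sam asked for above, find works well; a minimal sketch, assuming the default OSD data path of /var/lib/ceph/osd/ceph-$id (adjust if your OSDs live elsewhere):

  # List the full-map files for epoch 13258 on each OSD and checksum them;
  # differing sums for the same epoch confirm the maps have diverged.
  find /var/lib/ceph/osd/ceph-20/current/meta -name 'osdmap.13258*' -exec md5sum {} \;
  find /var/lib/ceph/osd/ceph-23/current/meta -name 'osdmap.13258*' -exec md5sum {} \;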
On 30/06/2014 17:35, Gregory Farnum wrote:

What's the backtrace from the crashing OSDs?

Keep in mind that as a dev release, it's generally best not to upgrade to unnamed versions like 0.82 (but it's probably too late to go back now).

I will remember that next time ;)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Hi,

After the upgrade to firefly, I had some PGs stuck in the peering state. I saw the 0.82 release announcement, so I tried upgrading to solve my problem. My three MDSes crash, and some OSDs trigger a chain reaction that kills other OSDs. I think my MDSes will not start because their metadata are on the OSDs.

I have 36 OSDs across three servers, and I identified 5 OSDs which make the others crash. If I do not start those, the cluster reaches a recovering state with 31 OSDs, but I have 378 PGs in the down+peering state.

What can I do? Do you need more information (OS, crash logs, etc.)?

Regards
--
----------------------------------------------
Pierre BLONDEAU
Systems & Network Administrator
Université de Caen
Laboratoire GREYC, Département d'informatique
tel    : 02 31 56 75 42
office : Campus 2, Science 3, 406
----------------------------------------------
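As a general first step for the stuck PGs described above, the following commands show which PGs are down+peering and why; a minimal sketch (2.1f is a placeholder PG id, substitute one from the dump):

  ceph health detail           # lists stuck PGs and the OSDs they are waiting on
  ceph pg dump_stuck inactive  # PGs that are not active, e.g. down+peering
  ceph pg 2.1f query           # detailed peering state for a single PG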