Hi,

Is there any chance to restore my data?
Regards,
Pierre

On 07/07/2014 15:42, Pierre BLONDEAU wrote:
There is no chance of getting those logs, much less in debug mode: I made that change 3 weeks ago. I have put all my logs here in case they can help: https://blondeau.users.greyc.fr/cephlog/all/

Do I have a chance to recover my ~20 TB of data?

Regards

On 03/07/2014 21:48, Joao Luis wrote:

Do those logs have a higher debugging level than the default? If not, never mind, as they will not have enough information. If they do, however, we'd be interested in the portion around the moment you set the tunables: say, before the upgrade and a bit after you set the tunable. If you want to be finer grained, then ideally it would be the moment those maps were created, but you'd have to grep the logs for that. Or drop the logs somewhere and I'll take a look.

-Joao

On Jul 3, 2014 5:48 PM, "Pierre BLONDEAU" <pierre.blond...@unicaen.fr> wrote:

On 03/07/2014 13:49, Joao Eduardo Luis wrote:

On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:

On 03/07/2014 00:55, Samuel Just wrote:

Ah,

    ~/logs » for i in 20 23; do
        ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*
        ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d
    done; diff /tmp/crush20.d /tmp/crush23.d
    ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
    ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
    ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
    ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
    6d5
    < tunable chooseleaf_vary_r 1

Looks like the chooseleaf_vary_r tunable somehow ended up divergent?

The only thing that comes to mind that could cause this is if we changed the leader's in-memory map, proposed it, the proposal failed, and somehow only the leader got to write the map to disk. This happened once before on a totally different issue (although I can't pinpoint right now which). In such a scenario, the leader would serve the incorrect osdmap to whoever asked it for osdmaps, while the remaining quorum would serve the correct osdmaps to everyone else. That could cause this divergence. Or it could be something else.

Are there logs for the monitors for the timeframe this may have happened in?

-Joao

Which exact timeframe do you want? I have 7 days of logs, so I should have information about the upgrade from firefly to 0.82. Which mon's log do you want? All three?

Regards

Pierre: do you recall how and when that got set?

I am not sure I understand, but if I remember correctly, after the upgrade to firefly the cluster was in the state "HEALTH_WARN crush map has legacy tunables" and I saw "feature set mismatch" in the logs. So, if I remember correctly, I ran

    ceph osd crush tunables optimal

for the "crush map" warning, and I upgraded my client and server kernels to 3.16rc. Could that be it?

Pierre

-Sam

On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.j...@inktank.com> wrote:

Yeah, divergent osdmaps:

    555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
    6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none

Joao: thoughts?

-Sam
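As an aside, one way to check which tunables the monitors are currently serving, as opposed to what individual OSDs have on disk, is to pull and decompile the cluster's crush map. A minimal sketch using the standard ceph and crushtool commands (the /tmp paths are arbitrary):

    ceph osd getcrushmap -o /tmp/crush.bin          # compiled crush map, as served by the mons
    crushtool -d /tmp/crush.bin -o /tmp/crush.txt   # decompile to text
    grep tunable /tmp/crush.txt                     # e.g. "tunable chooseleaf_vary_r 1"

Comparing that output with the per-OSD dumps above shows which side of the divergence the monitors are handing out.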
On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

The files are attached.

When I upgraded, I ran:

    ceph-deploy install --stable firefly servers...
    # then, on each server:
    service ceph restart mon
    service ceph restart osd
    service ceph restart mds

I upgraded from emperor to firefly. After repair, remap, replace, etc., I still had some PGs stuck in the peering state. I thought: why not try version 0.82? It could solve my problem. (That was my mistake.) So I upgraded from firefly to 0.83 with:

    ceph-deploy install --testing servers...

Now all the daemons are at version 0.82. I have 3 mons, 36 OSDs and 3 MDSes.

Pierre

PS: I also found "inc_osdmap.13258__0_469271DE__none" in each meta directory.

On 03/07/2014 00:10, Samuel Just wrote:

Also, what version did you upgrade from, and how did you upgrade?

-Sam

On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.j...@inktank.com> wrote:

Ok, in current/meta on osd 20 and osd 23, please attach all files matching ^osdmap.13258.* There should be one such file on each osd. (It should look something like osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory; you'll want to use find.)

What version of ceph is running on your mons? How many mons do you have?

-Sam

On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Hi,

Done; the log files are available here: https://blondeau.users.greyc.fr/cephlog/debug20/

The OSD log files are really big, around 80 MB each.

After starting osd.20, some other OSDs crashed: the number of OSDs up dropped from 31 to 16. I noticed that after this the number of down+peering PGs decreased from 367 to 248. Is that "normal"? Maybe it is temporary, while the cluster verifies all the PGs?

Regards
Pierre

On 02/07/2014 19:16, Samuel Just wrote:

You should add

    debug osd = 20
    debug filestore = 20
    debug ms = 1

to the [osd] section of ceph.conf and restart the osds. I'd like all three logs if possible.

Thanks
-Sam

On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Yes, but how do I do that? With a command like this?

    ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'

Or by modifying /etc/ceph/ceph.conf? That file is really sparse because I use udev detection. Once I have made these changes, do you want the three log files or only osd.20's?

Thank you so much for the help.

Regards
Pierre
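For the ceph.conf route, the stanza Sam describes would look like this (a sketch; the injectargs command above sets the same values at runtime for daemons that are already up):

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

The file-based change needs an OSD restart to take effect, but it also covers daemons that crash and come back up, which matters here since the point is to catch a crash in the logs.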
On 01/07/2014 23:51, Samuel Just wrote:

Can you reproduce with debug osd = 20, debug filestore = 20, debug ms = 1?

-Sam

On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Hi,

I attach:
- osd.20, one of the OSDs that I identified as making other OSDs crash.
- osd.23, one of the OSDs that crashes when I start osd.20.
- mds, one of my MDSes.

I cut the log files because they were too big, but everything is here: https://blondeau.users.greyc.fr/cephlog/

Regards

On 30/06/2014 17:35, Gregory Farnum wrote:

What's the backtrace from the crashing OSDs?

Keep in mind that as a dev release, it's generally best not to upgrade to unnamed versions like 0.82 (but it's probably too late to go back now).

I will remember that next time ;)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:

Hi,

After the upgrade to firefly, I had some PGs stuck in the peering state. I saw the release of 0.82, so I tried upgrading to it to solve my problem. Now my three MDSes crash, and some OSDs trigger a chain reaction that kills other OSDs. I think my MDSes will not start because their metadata are on the OSDs.

I have 36 OSDs across three servers, and I identified 5 OSDs which make the others crash. If I do not start those 5, the cluster goes into a recovery state with 31 OSDs, but 378 PGs stay in the down+peering state.

What can I do? Would you like more information (OS, crash logs, etc.)?

Regards
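On Greg's question about the backtrace: a hedged way to pull it out of an OSD log, assuming the default log location and an assert-style crash (adjust the osd id to one of the crashing OSDs):

    # print the first failed assert plus the 30 lines that follow it
    grep -m 1 -A 30 'FAILED assert' /var/log/ceph/ceph-osd.20.log

If the daemon died on a signal instead, searching the same way for "Caught signal" should find the dump.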
--
----------------------------------------------
Pierre BLONDEAU
Systems & Network Administrator
Université de Caen
Laboratoire GREYC, Département d'informatique
tel: 02 31 56 75 42
office: Campus 2, Science 3, 406
----------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com