Do those logs have a higher debugging level than the default? If not, never mind, as they will not have enough information. If they do, however, we'd be interested in the portion around the moment you set the tunables: say, from before the upgrade until a bit after you set the tunable. If you want to be finer grained, then ideally it would be the moment those maps were created, but you'd have to grep the logs for that.
Or drop the logs somewhere and I'll take a look.
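(For reference, a rough sketch of the kind of grep that could locate the relevant entries, assuming the default mon log location and that epoch 13258, the divergent map seen below, is the one of interest:)

    # default log path assumed; 13258 is the epoch of the divergent osdmap
    grep -n 'e13258' /var/log/ceph/ceph-mon.*.log
    grep -n 'tunables' /var/log/ceph/ceph-mon.*.log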
-Joao
On Jul 3, 2014 5:48 PM, "Pierre BLONDEAU" <pierre.blond...@unicaen.fr> wrote:
On 03/07/2014 13:49, Joao Eduardo Luis wrote:
On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:
On 03/07/2014 00:55, Samuel Just wrote:
Ah,
~/logs » for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
../ceph/src/osdmaptool: exported crush map to /tmp/crush20
../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
../ceph/src/osdmaptool: exported crush map to /tmp/crush23
6d5
< tunable chooseleaf_vary_r 1
Looks like the chooseleaf_vary_r tunable somehow ended up divergent?
The only thing that comes to mind that could cause this is if we changed the leader's in-memory map, proposed it, the proposal failed, and somehow only the leader got to write the map to disk. This happened once before on a totally different issue (although I can't pinpoint right now which).
In such a scenario, the leader would serve the incorrect osdmap to whoever asked it for osdmaps, while the remaining quorum would serve the correct osdmaps to all the others. This could cause this divergence. Or it could be something else.
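(One hedged way to test that hypothesis, with mon1, mon2 and mon3 as placeholder monitor hosts, would be to fetch epoch 13258 directly from each monitor and compare checksums:)

    # mon1..mon3 are placeholders for the three monitor hosts
    for m in mon1 mon2 mon3; do
        ceph -m $m osd getmap 13258 -o /tmp/osdmap.13258.$m
    done
    md5sum /tmp/osdmap.13258.*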
Are there logs for the monitors for the timeframe this may have happened in?
Which timeframe exactly do you want? I have 7 days of logs, so I should have information about the upgrade from firefly to 0.82.
Which mon's log do you want? All three?
Regards
-Joao
Pierre: do you recall how and when that got set?
I am not sure I understand, but if I remember correctly, after the update to firefly the cluster was in the state: HEALTH_WARN crush map has legacy tunables, and I saw "feature set mismatch" in the logs.
So, if I remember correctly, I ran: ceph osd crush tunables optimal for the "crush map" warning, and I updated my client and server kernels to 3.16rc.
Could that be it?
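(For reference, a hedged way to check which tunables the cluster's current crush map actually carries, using only standard tools:)

    # dump the in-use crush map and look at its tunable lines
    ceph osd getcrushmap -o /tmp/crush.current
    crushtool -d /tmp/crush.current | grep ^tunable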
Pierre
-Sam
On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.j...@inktank.com> wrote:
Yeah, divergent osdmaps:
555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
Joao: thoughts?
-Sam
On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:
The files

When I upgraded:
    ceph-deploy install --stable firefly servers...
    on each server: service ceph restart mon
    on each server: service ceph restart osd
    on each server: service ceph restart mds

I upgraded from emperor to firefly. After repair, remap, replace, etc., I have some PGs which pass into the peering state.
I thought: why not try version 0.82, it could solve my problem (that was my mistake). So I upgraded from firefly to 0.82 with:
    ceph-deploy install --testing servers...
Now, all programs are at version 0.82.
I have 3 mons, 36 OSDs and 3 MDSes.
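(A hedged way to double-check that every daemon really restarted into the new version, assuming the standard ceph CLI is available on each server:)

    ceph --version           # version of the installed binaries on this host
    ceph tell osd.* version  # version each running OSD reports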
Pierre
PS : I also find "inc\uosdmap.13258__0_469271DE__none" in each meta directory.
On 03/07/2014 00:10, Samuel Just wrote:
Also, what version did you upgrade from, and how did you upgrade?
-Sam
On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.j...@inktank.com> wrote:
Ok, in current/meta on osd 20 and osd 23, please attach all files matching
    ^osdmap.13258.*
There should be one such file on each osd. (It should look something like osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory; you'll want to use find.)
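(A hedged sketch of the find invocation, assuming the default data path /var/lib/ceph/osd/ceph-<id>:)

    # default OSD data paths assumed
    find /var/lib/ceph/osd/ceph-20/current/meta -name 'osdmap.13258*'
    find /var/lib/ceph/osd/ceph-23/current/meta -name 'osdmap.13258*'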
What version of ceph is running on your mons? How many mons do you have?
-Sam
On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:
Hi,
I did it; the log files are available here:
https://blondeau.users.greyc.fr/cephlog/debug20/
The OSDs' log files are really big, +/- 80M each.
After starting osd.20, some other OSDs crashed; I went from 31 OSDs up to 16 up.
I noticed that after this the number of down+peering PGs decreased from 367 to 248. Is that "normal"? Maybe it's temporary, just the time the cluster needs to verify all the PGs?
Regards
Pierre
On 02/07/2014 19:16, Samuel Just wrote:
You should add
    debug osd = 20
    debug filestore = 20
    debug ms = 1
to the [osd] section of the ceph.conf and restart the osds. I'd like all three logs if possible.
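(For reference, the resulting [osd] section of /etc/ceph/ceph.conf would look something like this:)

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1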
Thanks
-Sam
On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:
Yes, but how do I do that? With a command like this?
    ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
Or by modifying /etc/ceph/ceph.conf? This file is really sparse because I use udev detection.
Once I have made these changes, do you want the three log files or only osd.20's?
Thank you so much for the help
Regards
Pierre
On 01/07/2014 23:51, Samuel Just wrote:
Can you reproduce with
debug osd = 20
debug filestore = 20
debug ms = 1
?
-Sam
On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:
Hi,
I attach:
- osd.20 is one of the OSDs that I detected making other OSDs crash.
- osd.23 is one of the OSDs which crash when I start osd.20.
- mds is one of my MDSes.
I cut the log files because they are too big, but everything is here:
https://blondeau.users.greyc.fr/cephlog/
Regards
On 30/06/2014 17:35, Gregory Farnum wrote:
What's the backtrace from the crashing OSDs?
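(A hedged way to pull those backtraces out of the logs, assuming the default log path and that the crashes show up as assertion failures:)

    # print some context around each assertion failure in osd.20's log
    grep -B5 -A30 'FAILED assert' /var/log/ceph/ceph-osd.20.log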
Keep in mind that as a dev release, it's generally best not to upgrade to unnamed versions like 0.82 (but it's probably too late to go back now).
I will remember that next time ;)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU <pierre.blond...@unicaen.fr> wrote:
Hi,
After the upgrade to firefly, I have some PGs in the peering state.
I saw the release of 0.82, so I tried to upgrade to solve my problem.
My three MDSes crash, and some OSDs trigger a chain reaction that kills other OSDs.
I think my MDSes will not start because their metadata are on the OSDs.
I have 36 OSDs on three servers, and I identified 5 OSDs which make the others crash. If I do not start them, the cluster goes into a recovering state with 31 OSDs, but I have 378 PGs in the down+peering state.
What can I do? Would you like more information (OS, crash logs, etc.)?
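(A hedged set of commands that would gather that basic state, all standard ceph CLI:)

    ceph -s              # overall cluster status, including PG state counts
    ceph health detail   # lists which PGs are down+peering
    ceph osd tree        # shows which OSDs are up/down and on which host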
Regards
--
------------------------------------------------
Pierre BLONDEAU
Administrateur Systèmes & réseaux
Université de Caen
Laboratoire GREYC, Département d'informatique
tel : 02 31 56 75 42
bureau : Campus 2, Science 3, 406
------------------------------------------------