s were replicated across all three, with
the hope that this sort of thing would not be fatal. It's a Jewel system with
that version's default of 1 for "mon osd min down reporters".
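For anyone tuning this, the reporter threshold lives in ceph.conf; a sketch with an illustrative value (Jewel-era option name, not a recommendation):

```
[mon]
# require failure reports from at least 2 distinct OSDs
# before marking a peer down (Jewel default: 1)
mon osd min down reporters = 2
```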
--
Bryan Henderson San Jose, California
. However, I'll
bet the people who buy those are not aware that it's designed never to go down
and if something breaks while the system is coming up, a repair action may be
necessary before data is accessible again.
--
Bryan Henderson
monitor cluster. Is that possible?
A related question: If I mark an OSD down administratively, does it stay down
until I give a command to mark it back up, or will the monitor detect signs of
life and declare it up again on its own?
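For what it's worth, the monitors will normally mark the OSD up again as soon as they hear heartbeats from it; a sketch of how one might keep it down (assuming osd.3; `noup` is a cluster-wide flag):

```shell
ceph osd down 3      # mark osd.3 down administratively
ceph osd set noup    # stop the monitors from marking any OSD up
# ... maintenance ...
ceph osd unset noup  # allow OSDs to be marked up again
```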
--
Bryan Henderson
OSDs and report that
to the monitor, which would believe it within about a minute and mark the OSDs
down. ("osd heartbeat interval", "mon osd min down reports", "mon osd min down
reporters", "osd reporter subtree level")
get marked down, which is
pretty complicated, at
http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
It just doesn't seem to match the implementation.
--
Bryan Henderson San Jose, California
default value of mon_osd_report_timeout),
it marks it down. But it didn't. I did "osd down" commands for the dead OSDs
and the status changed to down and I/O started working.
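For reference, that timeout defaults to 900 seconds; a ceph.conf sketch to shorten it (illustrative value only):

```
[mon]
# mark an OSD down if no report arrives within this many
# seconds (default 900, i.e. 15 minutes, in Jewel)
mon osd report timeout = 300
```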
And wouldn't even 15 minutes of grace be unacceptable if it means I/Os have to
wait that long before falling
d by the rsize and wsize mount options. Without such options, in
the one case I tried, Linux 4.9, blocksize was 32K. Maybe it's affected by
the server or by the filesystem the NFS server is serving. This was NFS 3.
> This patch should address this issue [massive reads of e.g. /dev/urandom
e for
the file, which is an aspect of the file's layout. In the default layout,
stripe unit size is 4 MiB.
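The value in question is what stat(2) returns in st_blksize; a quick way to see what a given filesystem reports (GNU coreutils stat; on CephFS this shows the 4 MiB layout size, on a typical local filesystem something like 4K):

```shell
# %o is coreutils stat's "optimal I/O transfer size hint" (st_blksize)
f=$(mktemp)
stat -c '%o' "$f"
rm -f "$f"
```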
--
Bryan Henderson San Jose, California
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
d wipes out the entropy pool.
Has stat block size been discussed much? Is there a good reason that it's
the RADOS object size?
I'm thinking of modifying the cephfs filesystem driver to add a mount option
to specify a fixed block size to be reported for all files, and using 4K or
Is it possible to search the mailing list archives?
http://lists.ceph.com/pipermail/ceph-users-ceph.com/
seems to have a search function, but in my experience never finds anything.
--
Bryan Henderson San Jose, California
says it isn't empty, while also giving an empty list of its
contents.
--
Bryan Henderson San Jose, California
incapable of
hosting that log. But I found the filesystem driver is the same way - I have
to tell it how big a write it can do; it can't figure it out from the OSDs.
So maybe it's a fundamental architecture thing.
--
Bryan Henderson San Jose, California
s now? Is this a job for
cephfs-journal-tool event recover_dentries
cephfs-journal-tool journal reset
?
This is Jewel.
--
Bryan Henderson San Jose, California
red: MDS_ALL_DOWN (was: 1 filesystem is offline)
--
Bryan Henderson San Jose, California
> If the active MDS is connected to a monitor and they fail at the same time,
> the monitors can't replace the mds until they've been through their own
> election and a full mds timeout window.
So how long are we talking?
--
Bryan Henderson San Jose, California
write never happened.
This failure to restart happened after the MDS crashed, and I lost any
messages that would tell me why it crashed. I'll fix that and turn up
verbosity and if it happens again, I'll have a better idea how the zeroes got
there.
--
Bryan Henderson
y or incorrectly written?
I'm looking at this because I have an MDS that will not start because there
is junk (zeroes) in that space after where the log header says the log ends,
so replay of the log fails there.
--
Bryan Henderson
broken OSD belong on another
OSD (which I guess it ought to, since the OSD is out), ceph-objectstore-tool is
what you would use to move them over there manually, since ordinary peering
can't do it.
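A sketch of the manual move, assuming pg 1.a and the OSD data paths shown (both OSD daemons must be stopped; for Jewel filestore the journal path is also required):

```shell
# export the pg from the broken OSD's store
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
  --journal-path /var/lib/ceph/osd/ceph-2/journal \
  --pgid 1.a --op export --file /tmp/pg.1.a.export
# import it into the surviving OSD's store
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
  --journal-path /var/lib/ceph/osd/ceph-5/journal \
  --op import --file /tmp/pg.1.a.export
```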
--
Bryan Henderson
root filesystem for
these clients.
--
Bryan Henderson San Jose, California
many more bugs in the 3.16 cephfs
filesystem driver waiting for me. Indeed, I've seen panics not yet explained.
So what are other people using? A less stable kernel? An out-of-tree driver?
FUSE? Is there a working process for getting known bugs fixed in 3.16?
--
Bryan Henderson
> Kill all MDSs first, create a new fs with the old pools, then run 'fs reset'
> before starting any MDS.
Brilliant! I can't wait to try it.
Thanks.
--
Bryan Henderson San Jose, California
et' does, but without expecting anything to be there already. Maybe that's
all it takes along with 'ceph-objectstore-tool --op update-mon-db' to recover
from a lost cluster map.
--
Bryan Henderson San Jose, California
once you've recovered access to the OSDs?
--
Bryan Henderson San Jose, California
requests are from inside the cluster), and
the requests aren't just blocked for a long time; they're blocked
indefinitely. The only time I've seen it is when I brought the cluster up in
a different order than I usually do. So I'm just trying to understa
I recently had some requests blocked indefinitely; I eventually cleared it
up by recycling the OSDs, but I'd like some help interpreting the log messages
that supposedly give a clue as to what caused the blockage:
(I reformatted for easy email reading)
2018-05-03 01:56:35.248623 osd.0 192.168.1.16:
My cluster got stuck somehow, and at one point in trying to recycle things to
unstick it, I ended up shutting down everything, then bringing up just the
monitors. At that point, the cluster reported the status below.
With nothing but the monitors running, I don't see how the status can say
there
What risk
would I be taking if I just haphazardly killed everything instead of
orchestrating a shutdown?
--
Bryan Henderson San Jose, California
than that, and what happens if the maximum I set is too low
to cover those necessary old pgmaps?
--
Bryan Henderson San Jose, California
Using ceph-kvstore-tool after shutting down the monitor, I see hundreds of
keys.
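For concreteness, that listing was against the stopped monitor's store (path is an example; the backend argument is leveldb on Jewel):

```shell
ceph-kvstore-tool leveldb /var/lib/ceph/mon/ceph-a/store.db list
```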
So what does the monitor have to store to do a "status" command?
I've seen clues that the activity has to do with Paxos elections, but I'm
fuzzy on why elections would be happening or why they would nee
I thought this might be interesting to someone searching the archives for
memory usage information.
--
Bryan Henderson San Jose, California
address space rlimits. It's the best I can do; there is no real memory
or paging rate rlimit. As it stands, any normal shell on my systems has an
address space limit of 256M, which has never been a problem before, but is
majorly inconvenient now.
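For context, that 256M cap corresponds to an RLIMIT_AS of 262144 KiB; reproducing it in a shell is just:

```shell
ulimit -v 262144   # address-space (RLIMIT_AS) limit in KiB = 256 MiB
ulimit -v          # report the limit back; prints 262144
```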
--
Bryan Henderson
doesn't matter what specific command I'm doing, and it does this even when
there is no ceph cluster running, so it must be something pretty basic.
--
Bryan Henderson San Jose, California