Re: [ceph-users] How does monitor know OSD is dead?

2019-07-03 Thread Bryan Henderson
s were replicated across all three, with the hope that this sort of thing would not be fatal. It's a Jewel system with that version's default of 1 for "mon osd min down reporters". -- Bryan Henderson San Jose, California

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-02 Thread Bryan Henderson
. However, I'll bet the people who buy those are not aware that it's designed never to go down and if something breaks while the system is coming up, a repair action may be necessary before data is accessible again. -- Bryan Henderson

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-01 Thread Bryan Henderson
onitor cluster. Is that possible? A related question: If I mark an OSD down administratively, does it stay down until I give a command to mark it back up, or will the monitor detect signs of life and declare it up again on its own? -- Bryan Henderson

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-29 Thread Bryan Henderson
SDs and report that to the monitor, which would believe it within about a minute and mark the OSDs down. ("osd heartbeat interval", "mon osd min down reports", "mon osd min down reporters", "osd reporter subtree level"

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-29 Thread Bryan Henderson
get marked down, which is pretty complicated, at http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/ It just doesn't seem to match the implementation. -- Bryan Henderson San Jose, California

[ceph-users] How does monitor know OSD is dead?

2019-06-27 Thread Bryan Henderson
efault value of mon_osd_report_timeout), it marks it down. But it didn't. I did "osd down" commands for the dead OSDs and the status changed to down and I/O started working. And wouldn't even 15 minutes of grace be unacceptable if it means I/Os have to wait that long before falling

Re: [ceph-users] cephfs file block size: must it be so big?

2018-12-14 Thread Bryan Henderson
d by the rsize and wsize mount options. Without such options, in the one case I tried, Linux 4.9, blocksize was 32K. Maybe it's affected by the server or by the filesystem the NFS server is serving. This was NFS 3. > This patch should address this issue [massive reads of e.g. /dev/urand

Re: [ceph-users] cephfs file block size: must it be so big?

2018-12-14 Thread Bryan Henderson
e for the file, which is an aspect of the file's layout. In the default layout, stripe unit size is 4 MiB. -- Bryan Henderson San Jose, California ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] cephfs file block size: must it be so big?

2018-12-13 Thread Bryan Henderson
d wipes out the entropy pool. Has stat block size been discussed much? Is there a good reason that it's the RADOS object size? I'm thinking of modifying the cephfs filesystem driver to add a mount option to specify a fixed block size to be reported for all files, and using 4K or
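
For readers searching this thread: the "stat block size" under discussion is the `st_blksize` field that `stat(2)` reports as the preferred I/O size for a file. The following is a minimal sketch for checking what a given filesystem reports, assuming a POSIX system; the helper name `preferred_io_block_size` is hypothetical and the temporary file merely stands in for a file on the mount being examined.

```python
import os
import tempfile


def preferred_io_block_size(path):
    """Return st_blksize, the preferred I/O block size that stat(2) reports for path."""
    return os.stat(path).st_blksize


# Demonstrate on a temporary file; substitute a path on the mount of interest.
with tempfile.NamedTemporaryFile() as f:
    blksize = preferred_io_block_size(f.name)
    print(f"{f.name}: st_blksize = {blksize} bytes")
```

Programs that size their read buffers from this value will issue correspondingly large reads when the reported block size is large.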

[ceph-users] searching mailing list archives

2018-11-12 Thread Bryan Henderson
Is it possible to search the mailing list archives? http://lists.ceph.com/pipermail/ceph-users-ceph.com/ seems to have a search function, but in my experience never finds anything. -- Bryan Henderson San Jose, California

[ceph-users] How to repair rstats mismatch

2018-11-08 Thread Bryan Henderson
s it isn't empty, while also giving an empty list of its contents. -- Bryan Henderson San Jose, California

Re: [ceph-users] Should OSD write error result in damaged filesystem?

2018-11-04 Thread Bryan Henderson
incapable of hosting that log. But I found the filesystem driver is the same way - I have to tell it how big a write it can do; it can't figure it out from the OSDs. So maybe it's a fundamental architecture thing. -- Bryan Henderson San Jose, California

[ceph-users] Should OSD write error result in damaged filesystem?

2018-11-03 Thread Bryan Henderson
s now? Is this a job for cephfs-journal-tool event recover_dentries cephfs-journal-tool journal reset ? This is Jewel. -- Bryan Henderson San Jose, California

Re: [ceph-users] MDS does not always failover to hot standby

2018-09-07 Thread Bryan Henderson
red: MDS_ALL_DOWN (was: 1 filesystem is offline) -- Bryan Henderson San Jose, California

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-01 Thread Bryan Henderson
> If the active MDS is connected to a monitor and they fail at the same time, > the monitors can't replace the mds until they've been through their own > election and a full mds timeout window. So how long are we talking? -- Bryan Henderson S

Re: [ceph-users] Why does Ceph probe for end of MDS log?

2018-08-26 Thread Bryan Henderson
write never happened. This failure to restart happened after the MDS crashed, and I lost any messages that would tell me why it crashed. I'll fix that and turn up verbosity and if it happens again, I'll have a better idea how the zeroes got there. -- Bryan Henders

[ceph-users] Why does Ceph probe for end of MDS log?

2018-08-23 Thread Bryan Henderson
y or incorrectly written? I'm looking at this because I have an MDS that will not start because there is junk (zeroes) in that space after where the log header says the log ends, so replay of the log fails there. -- Bryan Henderson

Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-04 Thread Bryan Henderson
broken OSD belong on another OSD (which I guess it ought to, since the OSD is out), ceph-objectstore-tool is what you would use to move them over there manually, since ordinary peering can't do it. -- Bryan Henderson

Re: [ceph-users] Cephfs kernel driver availability

2018-07-22 Thread Bryan Henderson
root filesystem for these clients. -- Bryan Henderson San Jose, California

[ceph-users] Cephfs kernel driver availability

2018-07-22 Thread Bryan Henderson
many more bugs in the 3.16 cephfs filesystem driver waiting for me. Indeed, I've seen panics not yet explained. So what are other people using? A less stable kernel? An out-of-tree driver? FUSE? Is there a working process for getting known bugs fixed in 3.16? -- Bryan Hend

Re: [ceph-users] Data recovery after loosing all monitors

2018-06-02 Thread Bryan Henderson
> Kill all mds first , create new fs with old pools , then run ‘fs reset’ > before start any MDS. Brilliant! I can't wait to try it. Thanks. -- Bryan Henderson San Jose, California

Re: [ceph-users] Data recovery after loosing all monitors

2018-06-01 Thread Bryan Henderson
et' does, but without expecting anything to be there already. Maybe that's all it takes along with 'ceph-objectstore-tool --op update-mon-db' to recover from a lost cluster map. -- Bryan Henderson San Jose, California

[ceph-users] Data recovery after loosing all monitors

2018-05-26 Thread Bryan Henderson
ce you've recovered access to the OSDs? -- Bryan Henderson San Jose, California

Re: [ceph-users] Intepreting reason for blocked request

2018-05-19 Thread Bryan Henderson
uests are from inside the cluster), and the requests aren't just blocked for a long time; they're blocked indefinitely. The only time I've seen it is when I brought the cluster up in a different order than I usually do. So I'm just trying to understa

[ceph-users] Intepreting reason for blocked request

2018-05-12 Thread Bryan Henderson
I recently had some requests blocked indefinitely; I eventually cleared it up by recycling the OSDs, but I'd like some help interpreting the log messages that supposedly give a clue as to what caused the blockage: (I reformatted for easy email reading) 2018-05-03 01:56:35.248623 osd.0 192.168.1.16:

[ceph-users] stale status from monitor?

2018-05-08 Thread Bryan Henderson
My cluster got stuck somehow, and at one point in trying to recycle things to unstick it, I ended up shutting down everything, then bringing up just the monitors. At that point, the cluster reported the status below. With nothing but the monitors running, I don't see how the status can say there

[ceph-users] Shutting down: why OSDs first?

2018-05-07 Thread Bryan Henderson
risk would I be taking if I just haphazardly killed everything instead of orchestrating a shutdown? -- Bryan Henderson San Jose, California

[ceph-users] Why keep old epochs?

2017-11-14 Thread Bryan Henderson
than that, and what happens if the maximum I set is too low to cover those necessary old pgmaps? -- Bryan Henderson San Jose, California

[ceph-users] What goes in the monitor database?

2017-11-04 Thread Bryan Henderson
g ceph_kvstore_tool after shutting down the monitor, I see hundreds of keys. So what does the monitor have to store to do a "status" command? I've seen clues that the activity has to do with Paxos elections, but I'm fuzzy on why elections would be happening or why they would nee

[ceph-users] Ceph program memory usage

2017-04-29 Thread Bryan Henderson
hought this might be interesting to someone searching the archives for memory usage information. -- Bryan Henderson San Jose, California

Re: [ceph-users] ceph program uses lots of memory

2017-01-03 Thread Bryan Henderson
ress space rlimits. It's the best I can do; there is no real memory or paging rate rlimit. As it stands, any normal shell on my systems has an address space limit of 256M, which has never been a problem before, but is majorly inconvenient now. -- Brya
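
As an illustration of the address space rlimit mentioned in this post, here is a minimal sketch, assuming Linux and Python's standard `resource` module. The 256M figure comes from the post; the helper name `cap_address_space` and the child command are hypothetical. It starts a child process whose soft `RLIMIT_AS` is capped, much as a shell configured with such a limit would.

```python
import resource
import subprocess
import sys

LIMIT_BYTES = 256 * 1024 * 1024  # 256M, the figure quoted in the post


def cap_address_space():
    """Set the soft RLIMIT_AS of the calling process to at most LIMIT_BYTES."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Never ask for more than the existing hard limit allows.
    if hard == resource.RLIM_INFINITY:
        new_soft = LIMIT_BYTES
    else:
        new_soft = min(LIMIT_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))


# The child reports the soft address-space limit it inherited.
child_code = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    preexec_fn=cap_address_space,  # runs in the child just before exec
    capture_output=True,
    text=True,
)
print("child soft address-space limit:", result.stdout.strip())
```

A program that reserves more virtual address space than this at startup fails to allocate under such a limit, even if it would touch little real memory.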

[ceph-users] ceph program uses lots of memory

2016-12-29 Thread Bryan Henderson
n't matter what specific command I'm doing and it does this even when there is no ceph cluster running, so it must be something pretty basic. -- Bryan Henderson San Jose, California