On Thu, May 28, 2015 at 12:22 AM, Christian Balzer <ch...@gol.com> wrote:
>
> Hello Greg,
>
> On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
>
>> The description of the logging abruptly ending and the journal being
>> bad really sounds like part of the disk is going back in time. I'm not
>> sure if XFS internally is set up in such a way that something like
>> losing part of its journal would allow that?
>>
> I'm special. ^o^
> No XFS, EXT4. As stated in the original thread, below.
> And the (OSD) journal is a raw partition on a DC S3700.
>
> And since there was at least a 30 seconds pause between the completion of
> the "/etc/init.d/ceph stop" and issuing of the shutdown command, the
> logging abruptly ending seems to be unlikely related to the shutdown at
> all.
Oh, sorry...
I happened to read this article last night:
http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/

Depending on configuration (I think you'd need to have a journal-as-file) you could be experiencing that. And again, not many people use ext4, so who knows what other ways there are of things being broken that nobody else has seen yet.

>
>> If any of the OSD developers have the time it's conceivable a copy of
>> the OSD journal would be enlightening (if e.g. the header offsets are
>> wrong but there are a bunch of valid journal entries), but this is two
>> reports of this issue from you and none very similar from anybody
>> else. I'm still betting on something in the software or hardware stack
>> misbehaving. (There aren't that many people running Debian; there are
>> lots of people running Ubuntu and we find bad XFS kernels there not
>> infrequently; I think you're hitting something like that.)
>>
> There should be no file system involved with the raw partition SSD
> journal, n'est-ce pas?

...and I guess you probably aren't, since you are using partitions.

>
> The hardware is vastly different, the previous case was on an AMD
> system with onboard SATA (SP5100), this one is a SM storage goat with LSI
> 3008.
>
> The only thing they have in common is the Ceph version 0.80.7 (via the
> Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
> (though there were minor updates on that between those incidents,
> backported fixes)
>
> A copy of the journal would consist of the entire 10GB partition, since we
> don't know where in loop it was at the time, right?

Yeah.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com