On Thu, May 28, 2015 at 12:22 AM, Christian Balzer <ch...@gol.com> wrote:
>
> Hello Greg,
>
> On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
>
>> The description of the logging abruptly ending and the journal being
>> bad really sounds like part of the disk is going back in time. I'm not
>> sure if XFS internally is set up in such a way that something like
>> losing part of its journal would allow that?
>>
> I'm special. ^o^
> No XFS, EXT4. As stated in the original thread, below.
> And the (OSD) journal is a raw partition on a DC S3700.
>
> And since there was at least a 30-second pause between the completion of
> the "/etc/init.d/ceph stop" and the issuing of the shutdown command, the
> abrupt end of the logging seems unlikely to be related to the shutdown at
> all.

Oh, sorry...
I happened to read this article last night:
http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/

Depending on your configuration (I think you'd need to have a
journal-as-file) you could be hitting that. And again, not many
people use ext4, so who knows what other kinds of breakage are out
there that nobody else has seen yet.

>
>> If any of the OSD developers have the time it's conceivable a copy of
>> the OSD journal would be enlightening (if e.g. the header offsets are
>> wrong but there are a bunch of valid journal entries), but this is two
>> reports of this issue from you and none very similar from anybody
>> else. I'm still betting on something in the software or hardware stack
>> misbehaving. (There aren't that many people running Debian; there are
>> lots of people running Ubuntu and we find bad XFS kernels there not
>> infrequently; I think you're hitting something like that.)
>>
> There should be no file system involved with the raw partition SSD
> journal, n'est-ce pas?

...and I guess you probably aren't, since you are using partitions.

>
> The hardware is vastly different, the previous case was on an AMD
> system with onboard SATA (SP5100), this one is a SM storage goat with LSI
> 3008.
>
> The only thing they have in common is the Ceph version 0.80.7 (via the
> Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
> (though there were minor updates on that between those incidents,
> backported fixes)
>
> A copy of the journal would consist of the entire 10GB partition, since we
> don't know where in the loop it was at the time, right?

Yeah.
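
For anyone who wants to grab such a copy, here is a minimal sketch of
imaging the whole journal partition byte for byte. It is only an
illustration: the function name copy_raw, the device path and the output
file name are all placeholders, and it assumes the OSD is stopped and
that you can read the device. Plain dd does the same job; the checksum
is only there so the image can be verified after it has been moved.

```python
import hashlib


def copy_raw(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path byte for byte.

    Returns (bytes_copied, sha256_hex) so the image can be checked
    after it has been transferred somewhere else.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the partition (or file)
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

Usage would look something like
copy_raw("/dev/sdX2", "osd-journal.img"), where /dev/sdX2 stands in for
the real journal partition. For a 10GB partition the image will be the
full 10GB.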