[ceph-users] Simultaneous CEPH OSD crashes

2015-09-27 Thread Lionel Bouton
Hi,

We just had a near-simultaneous crash of two different OSDs, which
blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9.

The first OSD to go down had this error:

2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function
'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27
06:30:33.145251
os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
|| got != -5)

The second OSD crash was similar:

2015-09-27 06:30:57.373841 7f05d92cf700 -1 os/FileStore.cc: In function
'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
size_t, ceph::bufferlist&, bool)' thread 7f05d92cf700 time 2015-09-27
06:30:57.260978
os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
|| got != -5)
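
If I read the assert right, got is the return code of the underlying read and
-5 is -EIO, so the OSD aborts as soon as the filestore hits an I/O error
(filestore_fail_eio defaults to true). For reference, the setting can be
checked on a running OSD through its admin socket, along these lines (the
socket path and OSD id below are examples):

ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep filestore_fail_eio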

I'm familiar with this error: it already happened once with a BTRFS read
error (invalid csum), and I was able to correct it by flushing the journal,
deleting the corrupted file, starting the OSD again and running a pg repair.
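
For reference, that recovery boiled down to something like the following
(the OSD id, PG id and object file name are placeholders, not the actual ones):

# the OSD is already down after the crash; flush its journal first
ceph-osd -i 12 --flush-journal
# remove the corrupted object file from the filestore
rm /var/lib/ceph/osd/ceph-12/current/<pgid>_head/<corrupted object file>
# restart the OSD (with whatever init system you use), then repair the PG
/etc/init.d/ceph start osd.12
ceph pg repair <pgid>
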
This time, though, there isn't any kernel log indicating an invalid csum.
The kernel is different: these two servers run 3.18.9 while the others run
4.0.5, so maybe BTRFS doesn't log invalid checksum errors with this version.
I've launched a btrfs scrub on the two filesystems just in case (still
waiting for completion).
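
The scrubs were started roughly like this, and I'm polling their status
(the mount point stands in for our actual OSD data directories):

btrfs scrub start /var/lib/ceph/osd/ceph-12
btrfs scrub status /var/lib/ceph/osd/ceph-12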

The first attempt to restart these OSDs failed: one OSD died 19 seconds
after start, the other after 21 seconds. Seeing that, I temporarily lowered
min_size to 1, which allowed the 9 incomplete PGs to recover. I verified
this, brought min_size back to 2 and then restarted the 2 OSDs. They
haven't crashed yet.
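
For the record, the min_size changes themselves are just the following
(with the actual pool name in place of <pool>):

ceph osd pool set <pool> min_size 1
# once the incomplete PGs have recovered
ceph osd pool set <pool> min_size 2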

For reference, the assert failures were still the same when the OSDs died
shortly after start:
2015-09-27 08:20:19.332835 7f4467bd0700 -1 os/FileStore.cc: In function
'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
size_t, ceph::bufferlist&, bool)' thread 7f4467bd0700 time 2015-09-27
08:20:19.325126
os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
|| got != -5)

2015-09-27 08:20:50.626344 7f97f2d95700 -1 os/FileStore.cc: In function
'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
size_t, ceph::bufferlist&, bool)' thread 7f97f2d95700 time 2015-09-27
08:20:50.605234
os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
|| got != -5)

Note that at 2015-09-27 06:30:11 a deep-scrub started on a PG involving
one (and only one) of these 2 OSDs. As we evenly space deep-scrubs
(currently with a 10-minute interval), this might be relevant (or just a
coincidence).
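
This can be cross-checked from the PG stats, which include the last
deep-scrub timestamp of each PG, e.g.:

# dump all PG stats, then look up the PGs mapped to these two OSDs
ceph pg dump > /tmp/pg_dump.txt
grep <pgid> /tmp/pg_dump.txt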

I made copies of the ceph-osd logs (including the stack traces and the
recent events) in case they are needed.

Can anyone shed some light on why these OSDs died?

Best regards,

Lionel Bouton
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simultaneous CEPH OSD crashes

2015-09-27 Thread Lionel Bouton
On 27/09/2015 09:15, Lionel Bouton wrote:
> [...]
>
> Can anyone shed some light on why these OSDs died?

I just had a thought: could launching a defragmentation on a file in a
BTRFS OSD filestore trigger this problem?
We have a process doing just that. It waits until there is no recent
access before queuing files for defragmentation, but there's no guarantee
that it won't defragment a file an OSD is about to use.
This might explain the nearly simultaneous crashes, as defragmentation is
triggered by write access patterns, which should be roughly the same on
all 3 OSDs hosting a copy of the file. The defragmentation doesn't run at
exactly the same time because it is queued, which could explain why we got
2 crashes instead of 3.
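
The per-file defragmentation issued by our scheduler is essentially the
following (the path is an example of an object file inside an OSD filestore):

btrfs filesystem defragment /var/lib/ceph/osd/ceph-12/current/<pgid>_head/<object file>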

I'll probably ask on linux-btrfs, but knowing the possible conditions
leading to this assert failure would help pinpoint the problem, so if
someone knows this code well enough but not how BTRFS behaves while
defragmenting, I'll bridge the gap.

I just activated autodefrag on all the BTRFS filesystems of one of the two
affected servers and disabled our own defragmentation process.
With recent tunings we might not need our own defragmentation scheduler
anymore, and we can afford to lose some performance while investigating this.
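
Enabling it amounts to remounting with the autodefrag option (example mount
point; the option also needs to be added to fstab to survive a reboot):

mount -o remount,autodefrag /var/lib/ceph/osd/ceph-12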

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Teuthology Integration to native openstack

2015-09-27 Thread Bharath Krishna
Hi,

We have an OpenStack deployment in place with Ceph as the Cinder backend.

We would like to perform functional testing for Ceph and found teuthology to be
the recommended option.

I have successfully installed teuthology. Now, to integrate it with OpenStack, I
can see that the possible providers are either OVH, REDHAT or
ENTERCLOUDSITE.

Is there any option whereby we can use an OpenStack deployment of our own and
test Ceph using teuthology?

If not, please suggest how to test Ceph in such scenarios.

Please help.

Thank you.
Bharath Krishna
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS "corruption" -- Nulled bytes

2015-09-27 Thread Adam Tygart
I've done some digging into the semantics of cp and mv (from coreutils). If
the inode already exists, the file gets truncated, then the data gets
copied in. This is definitely within the scope of the bug above.
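
A quick way to confirm that cp over an existing file keeps the inode (and
therefore truncates it in place) is something like this, in a scratch
directory:

echo one > kstat; ls -i kstat
echo two > qss; cp qss kstat; ls -i kstat
# the inode number printed by ls -i is the same before and after the cp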

--
Adam

On Fri, Sep 25, 2015 at 8:08 PM, Adam Tygart  wrote:
> It may have been, although the timestamp on the file was almost a
> month ago. The typical workflow for this particular file is to copy an
> updated version over the top of it.
>
> i.e. 'cp qss kstat'
>
> I'm not sure if cp semantics would keep the same inode and simply
> truncate/overwrite the contents, or if it would do an unlink and then
> create a new file.
> --
> Adam
>
> On Fri, Sep 25, 2015 at 8:00 PM, Ivo Jimenez  wrote:
>> Looks like you might be experiencing this bug:
>>
>>   http://tracker.ceph.com/issues/12551
>>
>> The fix has been merged to master and I believe it'll be part of Infernalis. The
>> original reproducer involved truncating/overwriting files. In your example,
>> do you know if 'kstat' has been truncated/overwritten prior to generating
>> the md5sums?
>>
>> On Fri, Sep 25, 2015 at 2:11 PM Adam Tygart  wrote:
>>>
>>> Hello all,
>>>
>>> I've run into some sort of bug with CephFS. Client reads of a
>>> particular file return nothing but 40KB of Null bytes. Doing a rados
>>> level get of the inode returns the whole file, correctly.
>>>
>>> Tested via Linux 4.1, 4.2 kernel clients, and the 0.94.3 fuse client.
>>>
>>> Attached is a dynamic printk debug of the ceph module from the linux
>>> 4.2 client while cat'ing the file.
>>>
>>> My current thought is that there has to be a cache of the object
>>> *somewhere* that a 'rados get' bypasses.
>>>
>>> Even on hosts that have *never* read the file before, it is returning
>>> Null bytes from the kernel and fuse mounts.
>>>
>>> Background:
>>>
>>> 24x CentOS 7.1 hosts serving up RBD and CephFS with Ceph 0.94.3.
>>> CephFS is an EC k=8, m=4 pool with a size-3 writeback cache in front of it.
>>>
>>> # rados -p cachepool get 10004096b95. /tmp/kstat-cache
>>> # rados -p ec84pool get 10004096b95. /tmp/kstat-ec
>>> # md5sum /tmp/kstat*
>>> ddfbe886420a2cb860b46dc70f4f9a0d  /tmp/kstat-cache
>>> ddfbe886420a2cb860b46dc70f4f9a0d  /tmp/kstat-ec
>>> # file /tmp/kstat*
>>> /tmp/kstat-cache: Perl script, ASCII text executable
>>> /tmp/kstat-ec:    Perl script, ASCII text executable
>>>
>>> # md5sum ~daveturner/bin/kstat
>>> 1914e941c2ad5245a23e3e1d27cf8fde  /homes/daveturner/bin/kstat
>>> # file ~daveturner/bin/kstat
>>> /homes/daveturner/bin/kstat: data
>>>
>>> Thoughts?
>>>
>>> Any more information you need?
>>>
>>> --
>>> Adam
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com