Hi Tom, a few things you can look into. Some of these depend on how many
OSDs you're trying to run on a single chassis.

# up PIDs, otherwise you may run out of the ability to spawn new threads

# up available mem for sudden bursts, like during benchmarking
vm.min_free_kbytes = <something reasonable, like 2GB>
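
To see where a host currently stands before changing anything, a small Python sketch along these lines reads the values straight from /proc/sys. The kernel.pid_max and kernel.threads-max keys are my assumption about which limits "up PIDs" refers to, and the 2 GB threshold only mirrors the suggestion above; the function names are made up for illustration.

```python
from pathlib import Path

# Assumption: these are the kernel limits relevant to "up PIDs" and
# to the free-memory reserve mentioned above.
KEYS = ("kernel.pid_max", "kernel.threads-max", "vm.min_free_kbytes")


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl key such as 'vm.min_free_kbytes'."""
    # For these keys the path is the key with dots replaced by slashes.
    path = Path(root) / name.replace(".", "/")
    return int(path.read_text().split()[0])


def current_limits(root="/proc/sys"):
    """Collect the current values of the keys listed in KEYS."""
    return {key: read_sysctl(key, root) for key in KEYS}


def min_free_ok(limits, target_kb=2 * 1024 * 1024):
    """True if vm.min_free_kbytes is at least target_kb (2 GB by default)."""
    return limits["vm.min_free_kbytes"] >= target_kb
```

On an OSD node, print(current_limits()) shows the three values, and min_free_ok(current_limits()) tells you whether the reserve is already at the 2 GB mark.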

In ceph.conf:

max_open_files = <32K or more>

# make sure you have enough ephemeral port range for the number of OSDs
ms bind port min = 6800
ms bind port max = 9000
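
As a rough sanity check on the port range, the arithmetic can be sketched in Python. This assumes each OSD may bind up to four ports (client/monitor traffic, replication, and two for heartbeats) and that both ends of the range are usable; treat both as assumptions and check them against your Ceph version. Other daemons on the same host can draw from the same range, so leave some headroom.

```python
# Assumption: an OSD may bind up to four ports in the configured range.
PORTS_PER_OSD = 4


def port_range_size(port_min, port_max):
    """Number of ports in the range, counting both ends as usable."""
    return port_max - port_min + 1


def max_osds(port_min, port_max, ports_per_osd=PORTS_PER_OSD):
    """Upper bound on OSDs per host that the range can serve."""
    return port_range_size(port_min, port_max) // ports_per_osd


def range_is_enough(num_osds, port_min, port_max, ports_per_osd=PORTS_PER_OSD):
    """True if num_osds OSDs fit in the range under the assumption above."""
    return num_osds <= max_osds(port_min, port_max, ports_per_osd)
```

With the values above, max_osds(6800, 9000) gives 550, so the range itself is unlikely to be the limit unless the chassis is very dense or something else is consuming those ports.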

You may need to up your network tuning as well, but it's less likely to
cause these sorts of problems. Watch your netstat -s for clues.
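
If you would rather script that than eyeball it, the same counters are exposed in /proc/net/snmp and /proc/net/netstat as pairs of header and value lines. Here is a minimal Python sketch that parses that format and pulls out a few nonzero counters. The WATCH list is only my pick of counters that tend to be informative; it is not an official list, and the function names are hypothetical.

```python
# Assumption: these counters are a reasonable first place to look.
WATCH = (
    ("Tcp", "RetransSegs"),
    ("TcpExt", "ListenOverflows"),
    ("TcpExt", "ListenDrops"),
    ("TcpExt", "PruneCalled"),
)


def parse_counters(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counters = {}
    # Lines come in pairs: "Proto: name name ..." then "Proto: value value ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, nums = values.split(":", 1)
        if proto != value_proto:
            continue  # skip a pair whose prefixes do not match
        counters[proto] = dict(zip(names.split(), (int(n) for n in nums.split())))
    return counters


def suspicious(counters, watch=WATCH):
    """Return the watched counters that are present and nonzero."""
    found = {}
    for proto, name in watch:
        value = counters.get(proto, {}).get(name, 0)
        if value:
            found[f"{proto}.{name}"] = value
    return found
```

Feed it the contents of /proc/net/snmp or /proc/net/netstat (for example via open("/proc/net/netstat").read()), take a reading before and after the benchmark, and compare the two.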

Warren Wang

On 9/12/16, 12:44 PM, "ceph-users on behalf of Deneau, Tom"
<ceph-users-boun...@lists.ceph.com on behalf of tom.den...@amd.com> wrote:

>Trying to understand why some OSDs (6 out of 21) went down in my cluster
>while running a CBT radosbench benchmark.  From the logs below, is this a
>networking problem between systems, or is it some kind of FileStore
>Looking at one crashed OSD log, I see the following crash error:
>2016-09-09 21:30:29.757792 7efc6f5f1700 -1 FileStore: sync_entry timed
>out after 600 seconds.
> ceph version 10.2.1-13.el7cp (f15ca93643fee5f7d32e62c3e8a7016c1fc1e6f4)
>just before that I see things like:
>2016-09-09 21:18:07.391760 7efc755fd700 -1 osd.12 165 heartbeat_check: no
>reply from osd.6 since back 2016-09-09 21:17:47.261601 front 2016-09-09
>21:17:47.261601 (cutoff 2016-09-09 21:17:47.391758)
>and also
>2016-09-09 19:03:45.788327 7efc53905700  0 -- >>
> pipe(0x7efc8bfbc800 sd=65 :52000 s=1 pgs=12 cs=1 l=0\
> c=0x7efc8bef5b00).connect got RESETSESSION
>and many warnings for slow requests.
>All the other osds that died seem to have died with:
>2016-09-09 19:11:01.663262 7f2157e65700 -1 common/HeartbeatMap.cc: In
>function 'bool ceph::HeartbeatMap::_check(const
>ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f2157e65700 time
>2016-09-09 19:11:01.660671
>common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>-- Tom Deneau, AMD
>ceph-users mailing list

