an updated version with 20GB heap (top reports 24GB used)

would starting more MDSes and setting them active help? is there any guideline or paper wrt the impact of multiple active MDSes and the number of files in the tree? or some rule of thumb relating memory, the number of active MDSes and the number of files/directories in the tree?
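(for what it's worth, a rough sketch of the knob involved, from memory rather than from testing: on this release it should be something like the command below, but multi-active MDS was still flagged experimental around 0.78, so it may well not be a safe way to reduce memory pressure)

    # example only: allow a second active MDS (a standby then takes rank 1)
    ceph mds set_max_mds 2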

thanks,

stijn


On 04/02/2014 08:58 PM, Stijn De Weirdt wrote:
wow. kudos for integrating this in ceph. more projects should do it like
that!

anyway, attached is a gzipped ps file. the heap is at 4.4GB, top reports
6.5GB mem usage.

care to point out what to look for? i'll send a new one when the usage
starts to cause swapping.

thanks,

stijn

On 04/02/2014 06:35 PM, Gregory Farnum wrote:
Did you see
http://ceph.com/docs/master/rados/troubleshooting/memory-profiling?
That should have what you need to get started, although you'll also
need to learn the basics of using the heap analysis tools elsewhere.
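For reference, the workflow on that page looks roughly like this (daemon id and paths are examples; double-check the exact "tell" syntax against the doc for your release):

    # start tcmalloc's heap profiler on a running mds (here mds.0)
    ceph tell mds.0 heap start_profiler
    # ask it to write a profile dump without stopping the daemon
    ceph tell mds.0 heap dump
    # quick summary of current heap usage in the mds log
    ceph tell mds.0 heap stats
    # render a dump (profile files normally land next to the mds log)
    google-pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.0.profile.0001.heap
    google-pprof --ps   /usr/bin/ceph-mds /var/log/ceph/mds.0.profile.0001.heap > mds-heap.ps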
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 9:23 AM, Stijn De Weirdt
<stijn.dewei...@ugent.be> wrote:
hi,


1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?

any way to extract this from the mds without having to start it? as it
was an rsync operation, i can try to locate possible candidates on the
source filesystem, but what would be considered "large"?

total number of files is 13M, spread over 800k directories, but it's
unclear how far the sync had gotten at the time of failing. i've not
found a good way to look for directories with lots of files and/or subdirs.
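(a shell sketch of one way to rank directories on the rsync source by number of direct entries; the source path is a placeholder)

    # count direct children (files + subdirs) per directory and print the 20 biggest
    find /path/to/rsync/source -xdev -type d -print0 |
    while IFS= read -r -d '' d; do
        printf '%s\t%s\n' "$(find "$d" -mindepth 1 -maxdepth 1 | wc -l)" "$d"
    done | sort -rn | head -20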



2) Use tcmalloc's heap analyzer to see where all the memory is being
allocated.

we'll give that a try

i run ceph-mds with HEAPCHECK=normal (via the init script), but how can
we "stop" the mds without killing it? the heap checker only seems to
dump at the end of a run; maybe there's a way to get intermediate dumps
like with valgrind, but the documentation is not very helpful.
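(side note: HEAPCHECK is tcmalloc's leak checker, which only reports at process exit; the heap profiler writes numbered dump files while the process runs, which sounds closer to what we want. env var names are from the tcmalloc docs; the path is an example)

    # in the init script environment, instead of (or in addition to) HEAPCHECK=normal:
    export HEAPPROFILE=/var/log/ceph/mds.heapprof        # prefix for mds.heapprof.0001.heap, ...
    export HEAP_PROFILE_ALLOCATION_INTERVAL=1073741824   # dump roughly every 1GB allocated (the default)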

stijn



3) Look through the logs for when the beacon fails (the first of
"mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and see
if there's anything tell-tale going on at the time.
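Something along these lines will find it and the context around it (log path is an example):

    # locate the first laggy-beacon message and the lines leading up to it
    grep -n 'is_laggy' /var/log/ceph/ceph-mds.*.log | head -1
    grep -B50 'is_laggy' /var/log/ceph/ceph-mds.*.log | less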


anything in particular we should be looking for?

the log goes as follows:
mds starts around 11:43
...

2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server
reconnect_clients -- 1 sessions
2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by
client.4585 10.141.8.199:0/3551 after 17.553854
2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i
am now mds.0.10
2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map
state change up:reconnect --> up:rejoin
2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start


then lots (4667 lines) of

2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1 c=0x6513e8840).connect got RESETSESSION but no longer connecting


with one intermediate

2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xc77d5ee0).fault



then a sudden change

2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1 c=0xd99b11e0).connect got RESETSESSION but no longer connecting
2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER


followed by lots of

2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1 c=0x10e5d280).fault


with sporadic

2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1 c=0x130590dc0).connect got RESETSESSION but no longer connecting



until the dump

2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xa7be300).fault
2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)

  ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
  1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x4d71) [0x77c3c1]
  2: (Context::complete(int)+0x9) [0x56bb79]
  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
  4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
  5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
  6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
  7: (DispatchQueue::entry()+0x582) [0x902072]
  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
  9: /lib64/libpthread.so.0() [0x34c36079d1]
  10: (clone()+0x6d) [0x34c32e8b6d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- begin dump of recent events ---
-10000> 2014-04-01 11:59:27.137779 7ffec89c6700  5 mds.0.10 initiating monitor reconnect; maybe we're not the slow one
 -9999> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting): _reopen_session rank -1 name
 -9998> 2014-04-01 11:59:27.137790 7ffec89c6700  1 -- 10.141.8.180:6837/55117 mark_down 0x1326be460 -- 0x2cae5f00
 -9997> 2014-04-01 11:59:27.137807 7ffebf672700  2 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect error 10.141.8.181:6789/0, 104: Connection reset by peer
 -9996> 2014-04-01 11:59:27.137819 7ffebf672700  3 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect fault, but state = closed != connecting, stopping
 -9995> 2014-04-01 11:59:27.137824 7ffec89c6700 10 monclient(hunting): picked mon.ceph003 con 0x3af54ba0 addr 10.141.8.182:6789/0
 -9994> 2014-04-01 11:59:27.137832 7ffec89c6700 10 monclient(hunting): _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0






-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:


----- Message from Gregory Farnum <g...@inktank.com> ---------
     Date: Tue, 1 Apr 2014 09:03:17 -0700
     From: Gregory Farnum <g...@inktank.com>

Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
       To: "Yan, Zheng" <uker...@gmail.com>
       Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users
<ceph-users@lists.ceph.com>



On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <uker...@gmail.com>
wrote:


On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:


After some more searching, I've found that the source of the problem is
with the mds and not the mon. The mds crashes, generates a core dump
that eats the local space, and in turn the monitor (because of leveldb)
crashes.

The error in the mds log of one host:

2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1 c=0x147ac780).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1 c=0x1fd44360).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1 c=0x1fd43de0).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700 time 2014-04-01 15:46:34.448320
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)


could you use gdb to check what the value of the variable 'r' is?
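Roughly, assuming the matching debuginfo package is installed (the core path and frame number are examples; pick the CDir::_omap_fetched frame from 'bt'):

    gdb /usr/bin/ceph-mds /path/to/core.<pid>
    (gdb) bt            # find the CDir::_omap_fetched frame
    (gdb) frame 1       # example frame number
    (gdb) print r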



If you look at the crash dump log you can see the return value in
the
osd_op_reply message:
-1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 -- 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ==== osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0) 0x2cff7aa80 con 0x37ea3c0

-108, which is ESHUTDOWN, but we also use it (via the 108
constant, I
think because ESHUTDOWN varies across platforms) as EBLACKLISTED.
So it looks like this is itself actually a symptom of another
problem
that is causing the MDS to get timed out on the monitor. If a core
dump is "eating the local space", maybe the MDS is stuck in an
infinite allocation loop of some kind? How big are your disks,
Kenneth? Do you have any information on how much CPU/memory the MDS
was using before this?
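If the mds log has the messenger traffic (debug ms = 1, or just the crash dump section), something like this will show whether other ops were failing with the same code (log path is an example):

    # summarize the non-zero return codes seen in osd_op_reply messages
    grep 'osd_op_reply' /var/log/ceph/ceph-mds.*.log | grep -o 'ack = -[0-9]* ([^)]*)' | sort | uniq -c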



I monitored the mds process after restart:

   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 19215 root      20   0 6070m 5.7g 5236 S 778.6 18.1   1:27.54 ceph-mds
 19215 root      20   0 7926m 7.5g 5236 S 179.2 23.8   2:44.39 ceph-mds
 19215 root      20   0 12.4g  12g 5236 S 157.2 38.8   3:43.47 ceph-mds
 19215 root      20   0 16.6g  16g 5236 S 144.4 52.0   4:15.01 ceph-mds
 19215 root      20   0 19.9g  19g 5236 S 137.2 62.5   4:35.83 ceph-mds
 19215 root      20   0 24.5g  24g 5224 S 136.5 77.0   5:04.66 ceph-mds
 19215 root      20   0 25.8g  25g 2944 S  33.7 81.2   5:13.74 ceph-mds
 19215 root      20   0 26.0g  25g 2916 S  24.6 81.7   5:19.07 ceph-mds
 19215 root      20   0 26.1g  25g 2916 S  13.0 82.1   5:22.16 ceph-mds
 19215 root      20   0 27.7g  26g 1856 S 100.0 85.8   5:36.46 ceph-mds

Then it crashes. I changed the core dump location to a filesystem outside
the root fs; the core dump is indeed about 26G.
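(roughly what that change looks like; the target path is just an example)

    # send core dumps to a filesystem with more space than /
    sysctl -w kernel.core_pattern=/var/lib/ceph/log/sda4/core.%e.%p
    # make it persistent across reboots
    echo 'kernel.core_pattern=/var/lib/ceph/log/sda4/core.%e.%p' >> /etc/sysctl.conf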

My disks:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       9.9G  2.9G  6.5G  31% /
tmpfs            16G     0   16G   0% /dev/shm
/dev/sda1       248M   53M  183M  23% /boot
/dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
/dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
/dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
/dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
/dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
/dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
/dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
/dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
/dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
/dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
/dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
/dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
/dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
/dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn




-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com




----- End message from Gregory Farnum <g...@inktank.com> -----



--

Kind regards,
Kenneth Waegeman









Attachment: pprof53690.0.ps.gz

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
