Hello, just to report: changing the message type to simple looks like it helps avoid the memory leak. About a day later, the memory is still OK:

 1264 ceph      20   0 12,547g 1,247g  16652 S   3,3  8,2 110:16.93 ceph-mds
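(In ceph.conf terms, the change being tested here reduces to two settings. The full ceph.conf is never shown in this thread, so this is only a minimal sketch of the relevant sections, not the actual file:

    [global]
    # Use the simple messenger instead of the default async messenger,
    # as suggested by Yan, Zheng further down this thread.
    ms type = simple

    [mds]
    # MDS cache limit of 512 MB, expressed in bytes.
    mds cache memory limit = 536870912
)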
The memory usage is more than 2x the MDS limit (512MB), but maybe that's daemon overhead plus memory fragmentation. At least it's not 13-15GB like before.

Greetings!!

2018-07-25 23:16 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>:

> I've changed the configuration, adding your line and changing the MDS
> memory limit to 512MB, and for now it looks stable (it's at about 3-6% and
> sometimes even below 3%). I got very high usage on boot:
>
>  1264 ceph      20   0 12,543g 6,251g  16184 S   2,0 41,1%   0:19.34 ceph-mds
>
> but now it looks acceptable:
>
>  1264 ceph      20   0 12,543g 737952  16188 S   1,0  4,6%   0:41.05 ceph-mds
>
> Anyway, I need time to test it, because 15 minutes is too little.
>
> Greetings!!
>
> 2018-07-25 17:16 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>:
>
>> Hello,
>>
>> Thanks for all your help.
>>
>> Is "dd" an option of some command? Because on Debian/Ubuntu at least it
>> is an application for copying blocks, so that command fails.
>> For now I cannot change the configuration, but I'll try later.
>> About the logs, I've seen nothing matching "warning", "error", "failed",
>> "message" or anything similar, so it looks like there are no messages of
>> that kind.
>>
>> Greetings!!
>>
>> 2018-07-25 14:48 GMT+02:00 Yan, Zheng <uker...@gmail.com>:
>>
>>> On Wed, Jul 25, 2018 at 8:12 PM Yan, Zheng <uker...@gmail.com> wrote:
>>> >
>>> > On Wed, Jul 25, 2018 at 5:04 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> > >
>>> > > Hello,
>>> > >
>>> > > I've attached the PDF.
>>> > >
>>> > > I don't know if it's important, but I made changes to the configuration
>>> > > and restarted the servers after dumping that heap file. I changed the
>>> > > memory_limit to 25MB to test whether it still stays at acceptable RAM values.
>>> > >
>>> >
>>> > Looks like there is a memory leak in the async messenger. What's the output
>>> > of "dd /usr/bin/ceph-mds"? Could you try the simple messenger (add "ms type =
>>> > simple" to the 'global' section of ceph.conf)?
>>> >
>>>
>>> Besides, are there any suspicious messages in the mds log? Such as "failed
>>> to decode message of type"
>>>
>>> > Regards
>>> > Yan, Zheng
>>> >
>>> > > Greetings!
>>> > >
>>> > > 2018-07-25 2:53 GMT+02:00 Yan, Zheng <uker...@gmail.com>:
>>> > >>
>>> > >> On Wed, Jul 25, 2018 at 4:52 AM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> > >> >
>>> > >> > Hello,
>>> > >> >
>>> > >> > I've run the profiler for about 5-6 minutes and this is what I've got:
>>> > >> >
>>> > >>
>>> > >> please run pprof --pdf /usr/bin/ceph-mds
>>> > >> /var/log/ceph/ceph-mds.x.profile.<largest number>.heap > /tmp/profile.pdf
>>> > >> and send me the pdf
>>> > >>
>>> > >> > ----------------------------------------------------------------------------------------------
>>> > >> > ----------------------------------------------------------------------------------------------
>>> > >> > ----------------------------------------------------------------------------------------------
>>> > >> > Using local file /usr/bin/ceph-mds.
>>> > >> > Using local file /var/log/ceph/mds.kavehome-mgto-pro-fs01.profile.0009.heap.
>>> > >> > Total: 400.0 MB
>>> > >> >    362.5  90.6%  90.6%    362.5  90.6% ceph::buffer::create_aligned_in_mempool
>>> > >> >     20.4   5.1%  95.7%     29.8   7.5% CDir::_load_dentry
>>> > >> >      5.9   1.5%  97.2%      6.9   1.7% CDir::add_primary_dentry
>>> > >> >      4.7   1.2%  98.4%      4.7   1.2% ceph::logging::Log::create_entry
>>> > >> >      1.8   0.5%  98.8%      1.8   0.5% std::_Rb_tree::_M_emplace_hint_unique
>>> > >> >      1.8   0.5%  99.3%      2.2   0.5% compact_map_base::decode
>>> > >> >      0.6   0.1%  99.4%      0.7   0.2% CInode::add_client_cap
>>> > >> >      0.5   0.1%  99.5%      0.5   0.1% std::__cxx11::basic_string::_M_mutate
>>> > >> >      0.4   0.1%  99.6%      0.4   0.1% SimpleLock::more
>>> > >> >      0.4   0.1%  99.7%      0.4   0.1% MDCache::add_inode
>>> > >> >      0.3   0.1%  99.8%      0.3   0.1% CDir::add_to_bloom
>>> > >> >      0.2   0.1%  99.9%      0.2   0.1% CDir::steal_dentry
>>> > >> >      0.2   0.0%  99.9%      0.2   0.0% CInode::get_or_open_dirfrag
>>> > >> >      0.1   0.0%  99.9%      0.8   0.2% std::enable_if::type decode
>>> > >> >      0.1   0.0% 100.0%      0.1   0.0% ceph::buffer::list::crc32c
>>> > >> >      0.1   0.0% 100.0%      0.1   0.0% decode_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% OpTracker::create_request
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% TrackedOp::TrackedOp
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::vector::_M_emplace_back_aux
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::_Rb_tree::_M_insert_unique
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::add_dirfrag
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDLog::_prepare_new_segment
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% DispatchQueue::enqueue
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% ceph::buffer::list::push_back
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::prepare_new_inode
>>> > >> >      0.0   0.0% 100.0%    365.6  91.4% EventCenter::process_events
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::_Rb_tree::_M_copy
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::add_null_dentry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::check_inode_max_size
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDentry::add_client_lease
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::project_inode
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::__cxx11::list::_M_insert
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDBalancer::handle_heartbeat
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDBalancer::send_heartbeat
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_GatherBase::C_GatherSub::complete
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% EventCenter::create_time_event
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::_omap_fetch
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::handle_inode_file_caps
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::_Rb_tree::_M_insert_equal
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::issue_caps
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% MDLog::_submit_thread
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::_wait_for_flush
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::wrap_finisher
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSCacheObject::add_waiter
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::__cxx11::list::insert
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::__detail::_Map_base::operator[]
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::mark_updated_scatterlock
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::_Rb_tree::_M_insert_
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% alloc_ptr::operator->
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% ceph::buffer::list::append@5c1560
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% ceph::buffer::malformed_input::~malformed_input
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% compact_set_base::insert
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::add_waiter
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% InoTable::apply_release_ids
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% InoTable::project_release_ids
>>> > >> >      0.0   0.0% 100.0%      2.2   0.5% InodeStoreBase::decode_bare
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% interval_set::erase
>>> > >> >      0.0   0.0% 100.0%      1.1   0.3% std::map::operator[]
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Beacon::_send
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSDaemon::reset_tick
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MgrClient::send_report
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::_do_flush
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% Locker::rdlock_start
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::_get_waiter
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDentry::~CDentry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MonClient::schedule_tick
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% AsyncConnection::handle_write
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% AsyncConnection::prepare_send_message
>>> > >> >      0.0   0.0% 100.0%    365.5  91.4% AsyncConnection::process
>>> > >> >      0.0   0.0% 100.0%      0.3   0.1% AsyncConnection::send_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% AsyncConnection::tick
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% AsyncMessenger::_send_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% AsyncMessenger::send_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% AsyncMessenger::submit_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Beacon::notify_health
>>> > >> >      0.0   0.0% 100.0%      0.2   0.1% CDentry::CDentry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDentry::_mark_dirty
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDentry::auth_pin
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDentry::mark_dirty
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDentry::pop_projected_linkage
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::_mark_dirty
>>> > >> >      0.0   0.0% 100.0%     29.8   7.5% CDir::_omap_fetched
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::auth_pin
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::fetch
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::link_inode_work
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CDir::link_primary_inode
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::_mark_dirty
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::add_waiter
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::auth_pin
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% CInode::choose_ideal_loner
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% CInode::encode_inodestat
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::mark_dirty
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::mark_dirty_parent
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::mark_dirty_rstat
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% CInode::pop_and_dirty_projected_inode
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% CInode::set_loner_cap
>>> > >> >      0.0   0.0% 100.0%     29.8   7.5% C_IO_Dir_OMAP_Fetched::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_Locker_FileUpdate_finish::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_MDL_CheckMaxSize::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_MDS_RetryRequest::~C_MDS_RetryRequest
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_MDS_inode_update_finish::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_MDS_openc_finish::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_MDS_session_finish::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% C_OnFinisher::finish
>>> > >> >      0.0   0.0% 100.0%      1.2   0.3% Context::complete
>>> > >> >      0.0   0.0% 100.0%      3.7   0.9% DispatchQueue::DispatchThread::entry
>>> > >> >      0.0   0.0% 100.0%      3.7   0.9% DispatchQueue::entry
>>> > >> >      0.0   0.0% 100.0%      0.9   0.2% DispatchQueue::fast_dispatch
>>> > >> >      0.0   0.0% 100.0%      2.1   0.5% DispatchQueue::pre_dispatch
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% EMetaBlob::print
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% EventCenter::process_time_events
>>> > >> >      0.0   0.0% 100.0%     29.9   7.5% Finisher::finisher_thread_entry
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% FunctionContext::finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::_flush
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::_write_head
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::flush
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Journaler::wait_for_flush
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::_do_cap_release
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::_do_cap_update
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::_drop_non_rdlocks
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::_rdlock_kick
>>> > >> >      0.0   0.0% 100.0%      0.2   0.0% Locker::acquire_locks
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::adjust_cap_wanted
>>> > >> >      0.0   0.0% 100.0%      0.2   0.0% Locker::dispatch
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::drop_locks
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% Locker::eval
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::eval_gather
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::file_update_finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::handle_client_cap_release
>>> > >> >      0.0   0.0% 100.0%      0.2   0.0% Locker::handle_client_caps
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::handle_client_lease
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::handle_file_lock
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::handle_lock
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::issue_caps_set
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::issue_client_lease
>>> > >> >      0.0   0.0% 100.0%      0.6   0.2% Locker::issue_new_caps
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::local_wrlock_start
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::nudge_log
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::scatter_eval
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::scatter_mix
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::scatter_nudge
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::scatter_tick
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::scatter_writebehind
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::scatter_writebehind_finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::send_lock_message@42d5b0
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::send_lock_message@42f2b0
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::share_inode_max_size
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::simple_lock
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::simple_sync
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::tick
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::try_eval@43da60
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::try_eval@441fb0
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::wrlock_finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::wrlock_force
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::wrlock_start
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Locker::xlock_start
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MClientCaps::print
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MClientRequest::decode_payload
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MClientRequest::print
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDBalancer::prep_rebalance
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDBalancer::proc_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::check_memory_usage
>>> > >> >      0.0   0.0% 100.0%      0.2   0.1% MDCache::path_traverse
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::predirty_journal_parents
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::request_cleanup
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::request_finish
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::request_start
>>> > >> >      0.0   0.0% 100.0%      0.3   0.1% MDCache::trim
>>> > >> >      0.0   0.0% 100.0%      0.3   0.1% MDCache::trim_dentry
>>> > >> >      0.0   0.0% 100.0%      0.3   0.1% MDCache::trim_lru
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDCache::truncate_inode
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% MDLog::SubmitThread::entry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDLog::_start_new_segment
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDLog::_submit_entry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDLog::submit_entry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSCacheObject::finish_waiting
>>> > >> >      0.0   0.0% 100.0%      0.2   0.1% MDSCacheObject::get
>>> > >> >      0.0   0.0% 100.0%      1.7   0.4% MDSDaemon::ms_dispatch
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSDaemon::tick
>>> > >> >      0.0   0.0% 100.0%     29.9   7.5% MDSIOContextBase::complete
>>> > >> >      0.0   0.0% 100.0%      0.7   0.2% MDSInternalContextBase::complete
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSLogContextBase::complete
>>> > >> >      0.0   0.0% 100.0%      0.3   0.1% MDSRank::ProgressThread::entry
>>> > >> >      0.0   0.0% 100.0%      0.7   0.2% MDSRank::_advance_queues
>>> > >> >      0.0   0.0% 100.0%      1.7   0.4% MDSRank::_dispatch
>>> > >> >      0.0   0.0% 100.0%      1.3   0.3% MDSRank::handle_deferrable_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSRank::send_message_client
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSRank::send_message_client_counted@2a9260
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSRank::send_message_client_counted@2a94f0
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSRank::send_message_client_counted@2b1920
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MDSRank::send_message_mds
>>> > >> >      0.0   0.0% 100.0%      1.7   0.4% MDSRankDispatcher::ms_dispatch
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% MDSRankDispatcher::tick
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MOSDOp::print
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% Message::encode
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MonClient::_check_auth_rotating
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MonClient::_check_auth_tickets
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MonClient::_send_mon_message
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MonClient::tick
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MutationImpl::MutationImpl
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% MutationImpl::auth_pin
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% MutationImpl::pin
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% MutationImpl::start_locking
>>> > >> >      0.0   0.0% 100.0%    365.6  91.4% NetworkStack::get_worker
>>> > >> >      0.0   0.0% 100.0%      0.8   0.2% ObjectOperation::C_ObjectOperation_decodevals::finish
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% Objecter::_op_submit
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% Objecter::_op_submit_with_budget
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% Objecter::_send_op
>>> > >> >      0.0   0.0% 100.0%      0.8   0.2% Objecter::handle_osd_op_reply
>>> > >> >      0.0   0.0% 100.0%      0.8   0.2% Objecter::ms_dispatch
>>> > >> >      0.0   0.0% 100.0%      0.8   0.2% Objecter::ms_fast_dispatch
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% Objecter::op_submit
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Objecter::sg_write_trunc
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% OpHistory::insert
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% OpTracker::unregister_inflight_op
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% PrebufferedStreambuf::overflow
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% SafeTimer::add_event_after
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% SafeTimer::add_event_at
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% SafeTimer::timer_thread
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% SafeTimerThread::entry
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::_session_logged
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::apply_allocated_inos
>>> > >> >      0.0   0.0% 100.0%      1.1   0.3% Server::dispatch
>>> > >> >      0.0   0.0% 100.0%      1.7   0.4% Server::dispatch_client_request
>>> > >> >      0.0   0.0% 100.0%      0.7   0.2% Server::handle_client_getattr
>>> > >> >      0.0   0.0% 100.0%      0.9   0.2% Server::handle_client_open
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::handle_client_openc
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::handle_client_readdir
>>> > >> >      0.0   0.0% 100.0%      1.1   0.3% Server::handle_client_request
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::handle_client_session
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::handle_client_setattr
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::journal_and_reply
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::journal_close_session
>>> > >> >      0.0   0.0% 100.0%      0.3   0.1% Server::rdlock_path_pin_ref
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% Server::recall_client_state
>>> > >> >      0.0   0.0% 100.0%      0.5   0.1% Server::reply_client_request
>>> > >> >      0.0   0.0% 100.0%      0.5   0.1% Server::respond_to_request
>>> > >> >      0.0   0.0% 100.0%      0.4   0.1% Server::set_trace_dist
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% SessionMap::_mark_dirty
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% SessionMap::mark_dirty
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% SessionMap::remove_session
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% ceph::buffer::list::iterator_impl::copy
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% ceph::buffer::list::iterator_impl::copy_shallow
>>> > >> >      0.0   0.0% 100.0%    400.0 100.0% clone
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% filepath::parse_bits
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% inode_t::operator=
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% operator<<@2a2890
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% operator<<@2c9760
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% operator<<@3eadf0
>>> > >> >      0.0   0.0% 100.0%    400.0 100.0% start_thread
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% std::__cxx11::basic_string::_M_append
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::__cxx11::basic_string::_M_replace_aux
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::__cxx11::list::operator=
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::__ostream_insert
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::basic_streambuf::xsputn
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::num_put::_M_insert_int
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::num_put::do_put
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::operator<<
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% std::ostream::_M_insert
>>> > >> >      0.0   0.0% 100.0%    365.6  91.4% std::this_thread::__sleep_for
>>> > >> >      0.0   0.0% 100.0%      0.0   0.0% utime_t::localtime
>>> > >> >      0.0   0.0% 100.0%      0.1   0.0% void finish_contexts@2a30f0
>>> > >> > ----------------------------------------------------------------------------------------------
>>> > >> > ----------------------------------------------------------------------------------------------
>>> > >> > ----------------------------------------------------------------------------------------------
>>> > >> >
>>> > >> > Greetings!!
>>> > >> >
>>> > >> > 2018-07-24 12:07 GMT+02:00 Yan, Zheng <uker...@gmail.com>:
>>> > >> >>
>>> > >> >> On Tue, Jul 24, 2018 at 4:59 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> > >> >> >
>>> > >> >> > Hello,
>>> > >> >> >
>>> > >> >> > How much time is necessary? This is a production environment, and the
>>> > >> >> > memory profiler plus the low cache size (set because of the problem) cause
>>> > >> >> > a lot of CPU usage on the OSDs and MDS, which makes them fail while the
>>> > >> >> > profiler is running. Is there any problem if it's done at a low-traffic
>>> > >> >> > time? (Less usage, so maybe it doesn't fail, but maybe also less info
>>> > >> >> > about usage.)
>>> > >> >> >
>>> > >> >>
>>> > >> >> just one time, wait a few minutes between start_profiler and stop_profiler
>>> > >> >>
>>> > >> >> > Greetings!
>>> > >> >> >
>>> > >> >> > 2018-07-24 10:21 GMT+02:00 Yan, Zheng <uker...@gmail.com>:
>>> > >> >> >>
>>> > >> >> >> I mean:
>>> > >> >> >>
>>> > >> >> >> ceph tell mds.x heap start_profiler
>>> > >> >> >>
>>> > >> >> >> ... wait for some time
>>> > >> >> >>
>>> > >> >> >> ceph tell mds.x heap stop_profiler
>>> > >> >> >>
>>> > >> >> >> pprof --text /usr/bin/ceph-mds
>>> > >> >> >> /var/log/ceph/ceph-mds.x.profile.<largest number>.heap
>>> > >> >> >>
>>> > >> >> >> On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> > >> >> >> >
>>> > >> >> >> > This is what I get:
>>> > >> >> >> >
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
>>> > >> >> >> > 2018-07-24 09:05:19.350720 7fc562ffd700  0 client.1452545 ms_handle_reset on 10.22.0.168:6800/1685786126
>>> > >> >> >> > 2018-07-24 09:05:29.103903 7fc563fff700  0 client.1452548 ms_handle_reset on 10.22.0.168:6800/1685786126
>>> > >> >> >> > mds.kavehome-mgto-pro-fs01 dumping heap profile now.
>>> > >> >> >> > ------------------------------------------------
>>> > >> >> >> > MALLOC:      760199640 (  725.0 MiB) Bytes in use by application
>>> > >> >> >> > MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
>>> > >> >> >> > MALLOC: +    246962320 (  235.5 MiB) Bytes in central cache freelist
>>> > >> >> >> > MALLOC: +     43933664 (   41.9 MiB) Bytes in transfer cache freelist
>>> > >> >> >> > MALLOC: +     41012664 (   39.1 MiB) Bytes in thread cache freelists
>>> > >> >> >> > MALLOC: +     10186912 (    9.7 MiB) Bytes in malloc metadata
>>> > >> >> >> > MALLOC:   ------------
>>> > >> >> >> > MALLOC: =   1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
>>> > >> >> >> > MALLOC: +   4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
>>> > >> >> >> > MALLOC:   ------------
>>> > >> >> >> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
>>> > >> >> >> > MALLOC:
>>> > >> >> >> > MALLOC:          33027 Spans in use
>>> > >> >> >> > MALLOC:             19 Thread heaps in use
>>> > >> >> >> > MALLOC:           8192 Tcmalloc page size
>>> > >> >> >> > ------------------------------------------------
>>> > >> >> >> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> > >> >> >> > Bytes released to the OS take up virtual address space but no physical memory.
>>> > >> >> >> >
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>>> > >> >> >> > 2018-07-24 09:14:25.747706 7f94fffff700  0 client.1452578 ms_handle_reset on 10.22.0.168:6800/1685786126
>>> > >> >> >> > 2018-07-24 09:14:25.754034 7f95057fa700  0 client.1452581 ms_handle_reset on 10.22.0.168:6800/1685786126
>>> > >> >> >> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:------------------------------------------------
>>> > >> >> >> > MALLOC:      960649328 (  916.1 MiB) Bytes in use by application
>>> > >> >> >> > MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
>>> > >> >> >> > MALLOC: +    108867288 (  103.8 MiB) Bytes in central cache freelist
>>> > >> >> >> > MALLOC: +     37179424 (   35.5 MiB) Bytes in transfer cache freelist
>>> > >> >> >> > MALLOC: +     40143000 (   38.3 MiB) Bytes in thread cache freelists
>>> > >> >> >> > MALLOC: +     10186912 (    9.7 MiB) Bytes in malloc metadata
>>> > >> >> >> > MALLOC:   ------------
>>> > >> >> >> > MALLOC: =   1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
>>> > >> >> >> > MALLOC: +   4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
>>> > >> >> >> > MALLOC:   ------------
>>> > >> >> >> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
>>> > >> >> >> > MALLOC:
>>> > >> >> >> > MALLOC:          33028 Spans in use
>>> > >> >> >> > MALLOC:             19 Thread heaps in use
>>> > >> >> >> > MALLOC:           8192 Tcmalloc page size
>>> > >> >> >> > ------------------------------------------------
>>> > >> >> >> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> > >> >> >> > Bytes released to the OS take up virtual address space but no physical memory.
>>> > >> >> >> >
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > --------------------------------------------------------
>>> > >> >> >> > After heap release:
>>> > >> >> >> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>>> > >> >> >> > 2018-07-24 09:15:28.540203 7f2f7affd700  0 client.1443339 ms_handle_reset on 10.22.0.168:6800/1685786126
>>> > >> >> >> > 2018-07-24 09:15:28.547153 7f2f7bfff700  0 client.1443342 ms_handle_reset on 10.22.0.168:6800/1685786126
>>> > >> >> >> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:------------------------------------------------
>>> > >> >> >> > MALLOC:      710315776 (  677.4 MiB) Bytes in use by application
>>> > >> >> >> > MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
>>> > >> >> >> > MALLOC: +    246471880 (  235.1 MiB) Bytes in central cache freelist
>>> > >> >> >> > MALLOC: +     40802848 (   38.9 MiB) Bytes in transfer cache freelist
>>> > >> >> >> > MALLOC: +     38689304 (   36.9 MiB) Bytes in thread cache freelists
>>> > >> >> >> > MALLOC: +     10186912 (    9.7 MiB) Bytes in malloc metadata
>>> > >> >> >> > MALLOC:   ------------
>>> > >> >> >> > MALLOC: =   1046466720 (  998.0 MiB) Actual memory used (physical + swap)
>>> > >> >> >> > MALLOC: +   4324163584 ( 4123.8 MiB) Bytes released to OS (aka unmapped)
>>> > >> >> >> > MALLOC:   ------------
>>> > >> >> >> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
>>> > >> >> >> > MALLOC:
>>> > >> >> >> > MALLOC:          33177 Spans in use
>>> > >> >> >> > MALLOC:             19 Thread heaps in use
>>> > >> >> >> > MALLOC:           8192 Tcmalloc page size
>>> > >> >> >> > ------------------------------------------------
>>> > >> >> >> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> > >> >> >> > Bytes released to the OS take up virtual address space but no physical memory.
>>> > >> >> >> >
>>> > >> >> >> > The other commands fail with a curl error:
>>> > >> >> >> > Failed to get profile: curl 'http:///pprof/profile?seconds=30' > /root/pprof/.tmp.ceph-mds.1532416424.:
>>> > >> >> >> >
>>> > >> >> >> > Greetings!!
>>> > >> >> >> >
>>> > >> >> >> > 2018-07-24 5:35 GMT+02:00 Yan, Zheng <uker...@gmail.com>:
>>> > >> >> >> >>
>>> > >> >> >> >> could you profile memory allocation of mds
>>> > >> >> >> >>
>>> > >> >> >> >> http://docs.ceph.com/docs/mimic/rados/troubleshooting/memory-profiling/
>>> > >> >> >> >>
>>> > >> >> >> >> On Tue, Jul 24, 2018 at 7:54 AM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> > >> >> >> >> >
>>> > >> >> >> >> > Yeah, it's also my thread. That thread was created before I lowered the
>>> > >> >> >> >> > cache size from 512MB to 8MB. I thought it was maybe my fault and that I
>>> > >> >> >> >> > had made a misconfiguration, so I ignored the problem until now.
>>> > >> >> >> >> >
>>> > >> >> >> >> > Greetings!
>>> > >> >> >> >> >
>>> > >> >> >> >> > On Tue, Jul 24, 2018 at 1:00 AM, Gregory Farnum <gfar...@redhat.com> wrote:
>>> > >> >> >> >> >>
>>> > >> >> >> >> >> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly <pdonn...@redhat.com> wrote:
>>> > >> >> >> >> >>>
>>> > >> >> >> >> >>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> > >> >> >> >> >>> > Hi, thanks for your response.
>>> > >> >> >> >> >>> >
>>> > >> >> >> >> >>> > There are about 6 clients, and 4 of them are on standby most of the time.
>>> > >> >> >> >> >>> > Only two are active servers serving the webpage. Also, we have a Varnish
>>> > >> >> >> >> >>> > in front, so they are not getting all the load (below 30% in PHP is not much).
>>> > >> >> >> >> >>> > About the MDS cache, I now have mds_cache_memory_limit at 8MB.
>>> > >> >> >> >> >>>
>>> > >> >> >> >> >>> What! Please post `ceph daemon mds.<name> config diff`, `... perf
>>> > >> >> >> >> >>> dump`, and `... dump_mempools` from the server the active MDS is on.
>>> > >> >> >> >> >>>
>>> > >> >> >> >> >>> > I've also tested 512MB, but the CPU usage is the same and the MDS RAM
>>> > >> >> >> >> >>> > usage grows to 15GB (on a 16GB server it starts to swap and everything
>>> > >> >> >> >> >>> > fails). With 8MB, at least the memory usage is stable at less than 6GB
>>> > >> >> >> >> >>> > (now it's using about 1GB of RAM).
>>> > >> >> >> >> >>>
>>> > >> >> >> >> >>> We've seen reports of possible memory leaks before and the potential
>>> > >> >> >> >> >>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
>>> > >> >> >> >> >>> Your MDS cache size should be configured to 1-8GB (depending on your
>>> > >> >> >> >> >>> preference) so it's disturbing to see you set it so low.
>>> > >> >> >> >> >>
>>> > >> >> >> >> >> See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
>>> > >> >> >> >> >> which had more discussion of that. The MDS daemon seemingly had 9.5GB of
>>> > >> >> >> >> >> allocated RSS but only believed 489MB was in use for the cache...
>>> > >> >> >> >> >> -Greg
>>> > >> >> >> >> >>
>>> > >> >> >> >> >>>
>>> > >> >> >> >> >>> --
>>> > >> >> >> >> >>> Patrick Donnelly
>>> > >> >> >> >> >>> _______________________________________________
>>> > >> >> >> >> >>> ceph-users mailing list
>>> > >> >> >> >> >>> ceph-users@lists.ceph.com
>>> > >> >> >> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> > >> >> >> >> >
>>> > >> >> >> >> > _______________________________________________
>>> > >> >> >> >> > ceph-users mailing list
>>> > >> >> >> >> > ceph-users@lists.ceph.com
>>> > >> >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> > >> >> >> >
>>> > >> >> >> >
>>> > >> >> >> > --
>>> > >> >> >> > _________________________________________
>>> > >> >> >> >
>>> > >> >> >> > Daniel Carrasco Marín
>>> > >> >> >> > Ingeniería para la Innovación i2TIC, S.L.
>>> > >> >> >> > Tlf: +34 911 12 32 84 Ext: 223
>>> > >> >> >> > www.i2tic.com
>>> > >> >> >> > _________________________________________
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > --
>>> > >> >> > _________________________________________
>>> > >> >> >
>>> > >> >> > Daniel Carrasco Marín
>>> > >> >> > Ingeniería para la Innovación i2TIC, S.L.
>>> > >> >> > Tlf: +34 911 12 32 84 Ext: 223
>>> > >> >> > www.i2tic.com
>>> > >> >> > _________________________________________
>>> > >> >
>>> > >> >
>>> > >> > --
>>> > >> > _________________________________________
>>> > >> >
>>> > >> > Daniel Carrasco Marín
>>> > >> > Ingeniería para la Innovación i2TIC, S.L.
>>> > >> > Tlf: +34 911 12 32 84 Ext: 223
>>> > >> > www.i2tic.com
>>> > >> > _________________________________________
>>> > >
>>> > >
>>> > > --
>>> > > _________________________________________
>>> > >
>>> > > Daniel Carrasco Marín
>>> > > Ingeniería para la Innovación i2TIC, S.L.
>>> > > Tlf: +34 911 12 32 84 Ext: 223
>>> > > www.i2tic.com
>>> > > _________________________________________
>>>
>>
>>
>> --
>> _________________________________________
>>
>> Daniel Carrasco Marín
>> Ingeniería para la Innovación i2TIC, S.L.
>> Tlf: +34 911 12 32 84 Ext: 223
>> www.i2tic.com
>> _________________________________________
>>
>
>
> --
> _________________________________________
>
> Daniel Carrasco Marín
> Ingeniería para la Innovación i2TIC, S.L.
> Tlf: +34 911 12 32 84 Ext: 223
> www.i2tic.com
> _________________________________________
>

--
_________________________________________

Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_________________________________________
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
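For reference, the heap-profiling workflow used throughout this thread condenses to the commands below. This is only a sketch assembled from Yan, Zheng's instructions above, not an official procedure: mds.x stands for the name of the active MDS, and the exact .heap file name under /var/log/ceph depends on how many profiles the profiler has dumped.

    # Start the tcmalloc heap profiler on the MDS, let it run for a few
    # minutes under normal load, then stop it.
    ceph tell mds.x heap start_profiler
    ceph tell mds.x heap stop_profiler

    # Inspect the largest dumped profile, as text or as a call-graph PDF.
    pprof --text /usr/bin/ceph-mds /var/log/ceph/ceph-mds.x.profile.<largest number>.heap
    pprof --pdf /usr/bin/ceph-mds /var/log/ceph/ceph-mds.x.profile.<largest number>.heap > /tmp/profile.pdf

    # Heap statistics, an on-demand dump, and releasing freed memory back
    # to the OS, as shown in the outputs quoted above.
    ceph tell mds.x heap stats
    ceph tell mds.x heap dump
    ceph tell mds.x heap release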