When a BE crashes, you can find a stack trace like the one below in be.out. This is the kind of information that is useful:

PC: @ 0x25fd94b tcmalloc::CentralFreeList::FetchFromOneSpans()
*** SIGSEGV (@0x0) received by PID 71566 (TID 0x7f46785f8700) from PID 0; stack trace: ***
    @ 0x7f46e6dea5d0 (unknown)
    @ 0x25fd94b tcmalloc::CentralFreeList::FetchFromOneSpans()
    @ 0x25fdc1c tcmalloc::CentralFreeList::FetchFromOneSpansSafe()
    @ 0x25fdd17 tcmalloc::CentralFreeList::RemoveRange()
    @ 0x260a1e3 tcmalloc::ThreadCache::FetchFromCentralCache()
    @ 0x23c4818 google::protobuf::internal::RepeatedPtrFieldBase::InternalExtend()
    @ 0xdaac98 doris::ColumnWriter::finalize()
    @ 0xdb5038 doris::DoubleColumnWriterBase<>::finalize()
    @ 0xd91adb doris::SegmentWriter::_make_file_header()
    @ 0xd9254b doris::SegmentWriter::finalize()
    @ 0xd67e76 doris::ColumnDataWriter::_finalize_segment()
    @ 0xd694de doris::ColumnDataWriter::finalize()
    @ 0xd2db3c doris::SchemaChangeDirectly::process()
    @ 0xd3043c doris::SchemaChangeHandler::_alter_table()
    @ 0xd340f9 doris::SchemaChangeHandler::_do_alter_table()
    @ 0xd35283 doris::SchemaChangeHandler::process_alter_table()
    @ 0xcaa28b doris::OLAPEngine::schema_change()
    @ 0x11bf8be doris::TaskWorkerPool::_alter_table()
    @ 0x11c9a55 doris::TaskWorkerPool::_alter_table_worker_thread_callback()
    @ 0x7f46e6de2dd5 start_thread
    @ 0x7f46e61e802d __clone

Alternatively, you can use gdb to get a more detailed stack trace.
I do not see any useful information in the stack you have provided so far.

On Wed, Mar 9, 2022 at 14:59, 刘波 <270309...@qq.com.invalid> wrote:

> Dear developers,
> The BE crashed again today. The be.out output is as follows:
>
>
> The gdb output is as follows:
>
>
>
> ------------------ Original Message ------------------
> *From:* "dev" <wangbo13...@gmail.com>;
> *Sent:* Friday, March 4, 2022, 1:31 PM
> *To:* "dev"<dev@doris.apache.org>;
> *Subject:* Re: BE exits abnormally
>
> The current stack trace does not look sufficient to make a diagnosis.
> If core dumps are enabled in production, you can run gdb palo_be core_dump_file
> to inspect the stack.
> Or check whether be.out contains the stack trace from when the process crashed.
>
> On Fri, Mar 4, 2022 at 12:25, 刘波 <270309...@qq.com.invalid> wrote:
>
> > Dear developers,
> > Several of our Doris BE nodes crash frequently. Resource usage looks
> > normal on analysis (see the attachment for details). The logs show no
> > sign of OOM, and we have not been able to locate the fault. We would
> > appreciate your help. Details follow:
> > *Time: 2022-03-04 10:54:43*
> > Doris error: *detailMessage = tablet 45664199 has few replicas: 1,
> > alive backends: [10004]*
> > Environment: Huawei Cloud, CentOS 8 64-bit, 16 cores, 64G
> >
> > System log: less /var/log/message (*not OOM*)
> >
> > Mar 4 10:54:38 narwal-doris-be-0004 systemd[1]: Started Process Core Dump (PID 1881819/UID 0).
> >
> > Mar 4 10:54:41 narwal-doris-be-0004 systemd-coredump[1881820]: Core file was truncated to 2147483648 bytes.
> >
> > Mar 4 10:55:00 narwal-doris-be-0004 systemd-coredump[1881820]: Process 2800630 (palo_be) of user 0 dumped core.#012#012Stack trace of thread 1877601:#012#0 0x00000000039480a2 memcpy (/mnt/be/lib/palo_be)#012#012Stack trace of thread 2800630:#012#0 0x00007fccb330efc8 n/a (n/a)
> >
> > Mar 4 10:55:01 narwal-doris-be-0004 systemd[1]: systemd-coredump@1-1881819-0.service: Succeeded.
> >
> > Mar 4 10:55:03 narwal-doris-be-0004 systemd[1]: session-12.scope: Succeeded.
> >
> >
> > Dump info: *coredumpctl info 2800630*
> >
> >            PID: 2800630 (palo_be)
> >            UID: 0 (root)
> >            GID: 0 (root)
> >         Signal: 11 (SEGV)
> >      Timestamp: Fri 2022-03-04 10:54:38 CST (1h 10min ago)
> >   Command Line: /mnt/be/lib/palo_be
> >     Executable: /mnt/be/lib/palo_be
> >  Control Group: /
> >          Slice: -.slice
> >        Boot ID: ca3ef395d7c547d2aecb1c251097066f
> >     Machine ID: 501f93b5c19d4ca38db845c29176e3c5
> >       Hostname: narwal-doris-be-0004
> >        Storage: /var/lib/systemd/coredump/core.palo_be.0.ca3ef395d7c547d2aecb1c251097066f.2800630.1646362478000000.lz4 (truncated)
> >        Message: Process 2800630 (palo_be) of user 0 dumped core.
> >
> >                 Stack trace of thread 1877601:
> >                 #0 0x00000000039480a2 memcpy (/mnt/be/lib/palo_be)
> >
> >                 Stack trace of thread 2800630:
> >                 #0 0x00007fccb330efc8 n/a (n/a)
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
> > For additional commands, e-mail: dev-h...@doris.apache.org
> >
>
> --
> 王博 Wang Bo

--
王博 Wang Bo
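
For anyone triaging a similar crash, here is a minimal sketch of how the frames could be pulled out of a be.out stack trace like the one at the top of this thread. It assumes one frame per line in the form "@ 0xADDR symbol"; the function names parse_frames and first_frame_with_prefix are hypothetical helpers written for this illustration, not part of Doris or any library.

```python
import re

# Assumed frame layout: optional indentation, "@", a hex address, then the symbol.
# The "PC:" line and the "*** SIGSEGV ... ***" banner do not match and are skipped.
FRAME_RE = re.compile(r"^\s*@\s+(0x[0-9a-fA-F]+)\s+(.*\S)\s*$")


def parse_frames(text):
    """Return a list of (address, symbol) tuples, in the order they appear."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1), 16), match.group(2)))
    return frames


def first_frame_with_prefix(frames, prefix):
    """Return the first frame whose symbol starts with prefix, or None."""
    for address, symbol in frames:
        if symbol.startswith(prefix):
            return (address, symbol)
    return None


# A shortened excerpt of the trace quoted in this thread.
SAMPLE = """\
PC: @ 0x25fd94b tcmalloc::CentralFreeList::FetchFromOneSpans()
*** SIGSEGV (@0x0) received by PID 71566 (TID 0x7f46785f8700) from PID 0; stack trace: ***
    @ 0x7f46e6dea5d0 (unknown)
    @ 0x25fd94b tcmalloc::CentralFreeList::FetchFromOneSpans()
    @ 0x23c4818 google::protobuf::internal::RepeatedPtrFieldBase::InternalExtend()
    @ 0xdaac98 doris::ColumnWriter::finalize()
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for address, symbol in frames:
        print(f"{address:#x}  {symbol}")
    print("first doris frame:", first_frame_with_prefix(frames, "doris::"))
```

The first doris:: frame is often the quickest pointer to the code path involved (here the schema change writer), although a crash inside the allocator may equally be a symptom of earlier memory corruption elsewhere.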