acelyc111 opened a new issue #4996:
URL: https://github.com/apache/incubator-doris/issues/4996


   **Describe the bug**
   I found a core dump. The backtrace looks like this:
   ```
   Program terminated with signal 6, Aborted.
   #0  0x00007fca7abcb1d7 in raise () from /lib64/libc.so.6
   Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) bt
   #0  0x00007fca7abcb1d7 in raise () from /lib64/libc.so.6
   #1  0x00007fca7abcc8c8 in abort () from /lib64/libc.so.6
   #2  0x0000000001b13376 in google::DumpStackTraceAndExit () at src/utilities.cc:147
   #3  0x0000000001b0a67d in google::LogMessage::Fail () at src/logging.cc:1599
   #4  0x0000000001b0c504 in google::LogMessage::SendToLog (this=0x7fca74df2770) at src/logging.cc:1553
   #5  0x0000000001b0a1a4 in google::LogMessage::Flush (this=0x7fca74df2770) at src/logging.cc:1422
   #6  0x0000000001b0cf39 in google::LogMessageFatal::~LogMessageFatal (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:2125
   #7  0x0000000000e26694 in doris::DataDir::load (this=0x4d74f00) at /builds/olap/doris/be/src/olap/data_dir.cpp:705
   #8  0x0000000000e09dd9 in operator() (__closure=0x5349558) at /builds/olap/doris/be/src/olap/storage_engine.cpp:149
   #9  __invoke_impl<void, doris::StorageEngine::load_data_dirs(const std::vector<doris::DataDir*>&)::<lambda()> > (__f=...) at /usr/include/c++/7.3.0/bits/invoke.h:60
   #10 __invoke<doris::StorageEngine::load_data_dirs(const std::vector<doris::DataDir*>&)::<lambda()> > (__fn=...) at /usr/include/c++/7.3.0/bits/invoke.h:95
   #11 _M_invoke<0> (this=0x5349558) at /usr/include/c++/7.3.0/thread:234
   #12 operator() (this=0x5349558) at /usr/include/c++/7.3.0/thread:243
   #13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<doris::StorageEngine::load_data_dirs(const std::vector<doris::DataDir*>&)::<lambda()> > > >::_M_run(void) (this=0x5349550) at /usr/include/c++/7.3.0/thread:186
   #14 0x00000000026b642f in std::execute_native_thread_routine (__p=0x5349550) at ../../../.././libstdc++-v3/src/c++11/thread.cc:83
   #15 0x00007fca7a981dc5 in start_thread () from /lib64/libpthread.so.0
   #16 0x00007fca7ac8d73d in clone () from /lib64/libc.so.6
   ```
   I checked the related log:
   ```
   W1201 14:52:13.408074 183882 tablet_manager.cpp:155] add duplicated tablet. force=0, res=-500, tablet_id=5164922, schema_hash=502924845, old_version=2, new_version=2, old_time=1606138765, new_time=1599296476, old_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd1/data/325/5164922/502924845, new_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd2/data/64/5164922/502924845
   W1201 14:52:13.408120 183882 tablet_manager.cpp:843] fail to add tablet. tablet=5164922.502924845.1848811be2b4e08b-4abe001b0545fcb3[res=-500]
   W1201 14:52:13.408583 183882 data_dir.cpp:690] load tablet from header failed. status:-500, tablet=5164922.502924845            // !!!critical log
   W1201 14:52:13.409047 183882 alpha_rowset.cpp:327] tablet: 5164930 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.409586 183882 alpha_rowset.cpp:327] tablet: 5164990 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.410159 183882 alpha_rowset.cpp:327] tablet: 5165054 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.410725 183882 alpha_rowset.cpp:327] tablet: 5165078 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   I1201 14:52:13.410773 183882 tablet_manager.cpp:461] begin drop tablet. tablet_id=5165078, schema_hash=502924845
   I1201 14:52:13.410786 183882 tablet_manager.cpp:1387] set tablet to shutdown state and remove it from memory. tablet_id=5165078, schema_hash=502924845, tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd1/data/162/5165078/502924845
   I1201 14:52:13.411496 183882 tablet_meta_manager.cpp:115] save tablet meta , key:tabletmeta_5165078_502924845 meta_size=93382
   W1201 14:52:13.411962 183882 tablet_manager.cpp:155] add duplicated tablet. force=0, res=0, tablet_id=5165078, schema_hash=502924845, old_version=2, new_version=2, old_time=1599296540, new_time=1606161418, old_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd1/data/162/5165078/502924845, new_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd2/data/506/5165078/502924845
   W1201 14:52:13.412612 183882 alpha_rowset.cpp:327] tablet: 5165122 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.413225 183882 alpha_rowset.cpp:327] tablet: 5165158 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.413820 183882 alpha_rowset.cpp:327] tablet: 5165170 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:15.418694 183882 data_dir.cpp:700] load tablets from header failed, loaded tablet: 45330, error tablet: 1, path: /home/work/app/doris/c3prc-hadoop-test/be/ssd2
   F1201 14:52:15.418807 183882 data_dir.cpp:705] load tablets encounter failure. stop BE process. path: /home/work/app/doris/c3prc-hadoop-test/be/ssd2
   ```
   The log shows that loading a tablet from one data dir fails when a tablet 
with the same tablet id has already been loaded from another data dir. That 
single failure is then treated as fatal, and the BE exits.
   After reading the code:
   
https://github.com/apache/incubator-doris/blob/df1f06e60b1339ef6e2756d0c4cb492cb64986c7/be/src/olap/tablet_manager.cpp#L130-L151
   
   I suspect there is a bug here. Data dirs are loaded in parallel by multiple 
threads, so a tablet that is loaded later may be older than one that was loaded 
earlier. We should not assume that a later-loaded tablet is always newer (as 
judged by version and creation time).
   
   **Expected behavior**
   When an older copy of an already-loaded tablet is found, just skip it instead of failing.
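
A minimal sketch of the comparison this expects, assuming the decision depends only on the two fields printed in the "add duplicated tablet" log line (version and creation time). `TabletInfo`, `DuplicateAction` and `resolve_duplicate` are hypothetical names used for illustration; they are not the actual code in `tablet_manager.cpp`.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the metadata shown in the log
// (old_version/new_version, old_time/new_time). Not the real Doris Tablet class.
struct TabletInfo {
    int64_t version;        // highest version of the tablet
    int64_t creation_time;  // creation timestamp in seconds
};

enum class DuplicateAction {
    kKeepExisting,    // incoming copy is not newer: skip it, do not report an error
    kReplaceExisting  // incoming copy is newer: drop the old copy and use the new one
};

// Decide what to do when a tablet with the same tablet_id and schema_hash has
// already been loaded. Load order is ignored on purpose, because data dirs are
// loaded by several threads and the newer copy may arrive first.
inline DuplicateAction resolve_duplicate(const TabletInfo& existing,
                                         const TabletInfo& incoming) {
    if (incoming.version > existing.version) {
        return DuplicateAction::kReplaceExisting;
    }
    if (incoming.version == existing.version &&
        incoming.creation_time > existing.creation_time) {
        return DuplicateAction::kReplaceExisting;
    }
    return DuplicateAction::kKeepExisting;
}
```

With the values from the log, tablet 5164922 (old_time=1606138765, new_time=1599296476, equal versions) would be skipped rather than counted as an error tablet, while tablet 5165078 (old_time=1599296540, new_time=1606161418) would still replace the older copy, as it does today.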




