WingsGo opened a new issue #4349:
URL: https://github.com/apache/incubator-doris/issues/4349


   **Describe the bug**
   I hit the same situation as in #3840 and found the stack trace below. The 
BE crashes because we do not catch the exception `orc::TimezoneError`.
   
   ```
   terminate called after throwing an instance of 'orc::TimezoneError'
   rc::TimezoneErro 
     what():  Can't open /usr/share/zoneinfo/GMT+08:00
   *** Aborted at 1597225209 (unix time) try "date -d @1597225209" if you are using GNU date ***
   PC: @     0x7f9cd9e1a1f7 __GI_raise
   *** SIGABRT (@0x1ac58) received by PID 109656 (TID 0x7f9c0202a700) from PID 109656; stack trace: ***
       @     0x7f9cd9e1a270 (unknown)
       @     0x7f9cd9e1a1f7 __GI_raise
       @     0x7f9cd9e1b8e8 __GI_abort
       @          0x2f21645 __gnu_cxx::__verbose_terminate_handler()
       @          0x2e8d706 __cxxabiv1::__terminate()
       @          0x2e8d751 std::terminate()
       @          0x2ed1c6e execute_native_thread_routine
       @     0x7f9cd9bd0e25 start_thread
       @     0x7f9cd9edd34d __clone 
   ```
   
   Looking at ORC's source code, I found this note: `When writing timestamps, 
the ORC library now records the time zone in the stripe footer`. To get data 
from ORC we use `RowReaderImpl::next` (in ORC's `Reader.hh`), and we call that 
function here:
   
   
https://github.com/apache/incubator-doris/blob/d6028863f3e9d8f401f1dea34a119e48fd21c7fe/be/src/exec/orc_scanner.cpp#L163
   
   However, `RowReaderImpl::next` calls `startNextStripe()`, and 
`startNextStripe()` checks whether the stripe footer of the ORC file contains 
a writer timezone. The relevant code is in Reader.cc, line 829: 
`const Timezone& writerTimezone = currentStripeFooter.has_writertimezone() ? 
getTimezoneByName(currentStripeFooter.writertimezone()) : localTimezone;`. So 
if `has_writertimezone()` is true for the stripe footer, the function calls 
`getTimezoneByName` internally.
   
   `getTimezoneByName` calls `getTimezoneByFilename`, which opens a file under 
`/usr/share/zoneinfo` to load the specified timezone. If the file is not 
found, the constructor of `FileInputStream` (OrcFile.cc, line 51) throws an 
`orc::ParseError`. After catching the `orc::ParseError`, ORC throws another 
error; the relevant code is in Timezone.cc, line 689:
   
   ```
   try {
   } catch (ParseError& err) {
       throw TimezoneError(err.what());
   }
   ```
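
To make the rethrow pattern above concrete, here is a minimal standalone sketch. It does not use the ORC library: `ParseError`, `TimezoneError`, `open_zone_file` and `load_timezone` are hypothetical stand-ins that I wrote only for illustration, assuming both error types behave like `std::runtime_error`.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for orc::ParseError and orc::TimezoneError.
class ParseError : public std::runtime_error {
public:
    explicit ParseError(const std::string& what) : std::runtime_error(what) {}
};

class TimezoneError : public std::runtime_error {
public:
    explicit TimezoneError(const std::string& what) : std::runtime_error(what) {}
};

// Stand-in for the file-opening step: throws ParseError when the file
// cannot be opened.
void open_zone_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in.is_open()) {
        throw ParseError("Can't open " + path);
    }
}

// Stand-in for the timezone loader: converts the low-level ParseError
// into a TimezoneError that carries the same message.
void load_timezone(const std::string& path) {
    try {
        open_zone_file(path);
    } catch (ParseError& err) {
        throw TimezoneError(err.what());
    }
}
```

With this sketch, calling `load_timezone` on a path that does not exist surfaces a `TimezoneError`, not a `ParseError`, which is why `orc::TimezoneError` is the type that shows up in the crash log.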
   
   Now the reason for the BE crash is clear: if the BE machine does not have 
the relevant zoneinfo file, ORC throws an `orc::TimezoneError`, and because we 
do not catch it, the BE crashes. The call chain is:
   
   RowReader::next() --> RowReaderImpl::next() --> 
RowReaderImpl::startNextStripe() --> Timezone::getTimezoneByName() --> 
Timezone::getTimezoneByFilename() --> readFile() --> orc::readLocalFile() --> 
FileInputStream::FileInputStream --> throws an orc::ParseError --> throws an 
orc::TimezoneError
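
Since the chain ends in a failed file open, one quick way to confirm the diagnosis on a BE machine is to test whether the zone file can be read. The helper below is a hypothetical sketch (the name `zoneinfo_readable` and the default directory argument are my own choices, not part of ORC or Doris):

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the zoneinfo file for `zone_name`
// under `zoneinfo_dir` can be opened and contains at least one byte.
// The peek() check also rejects paths that open but cannot be read,
// such as directories.
bool zoneinfo_readable(const std::string& zone_name,
                       const std::string& zoneinfo_dir = "/usr/share/zoneinfo") {
    std::ifstream in(zoneinfo_dir + "/" + zone_name, std::ios::binary);
    return in.is_open() && in.peek() != std::ifstream::traits_type::eof();
}
```

For the file in this report, `zoneinfo_readable("GMT+08:00")` should return false on a machine that reproduces the crash.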
   
   **Expected behavior**
   The BE should not crash.
   
   **Solutions**
   When calling `reader->next()`, we should catch the `orc::TimezoneError` 
exception and return an InternalError to the user so that the BE does not 
crash. I will submit a PR later.
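
A rough sketch of the idea, not the actual patch: everything here (`Status`, `FakeRowReader`, `read_next_batch`, and the local `TimezoneError`) is a simplified, hypothetical stand-in so the example compiles without ORC or Doris. The real `next()` call takes a batch argument, which is omitted here.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Simplified stand-in for a status type with an InternalError variant.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return Status{true, ""}; }
    static Status InternalError(const std::string& m) { return Status{false, m}; }
};

// Hypothetical stand-in for orc::TimezoneError.
class TimezoneError : public std::runtime_error {
public:
    explicit TimezoneError(const std::string& what) : std::runtime_error(what) {}
};

// Fake reader: throws like the real reader would when the zoneinfo
// file is missing, otherwise reports that a batch was read.
struct FakeRowReader {
    bool missing_zoneinfo;
    bool next() {
        if (missing_zoneinfo) {
            throw TimezoneError("Can't open /usr/share/zoneinfo/GMT+08:00");
        }
        return true;
    }
};

// Wraps reader.next() so that exceptions become an error Status
// and never escape the scanner thread.
template <typename Reader>
Status read_next_batch(Reader& reader, bool* eof) {
    try {
        *eof = !reader.next();
        return Status::OK();
    } catch (const TimezoneError& e) {
        return Status::InternalError(std::string("ORC timezone error: ") + e.what());
    } catch (const std::exception& e) {
        // Assumption: also turning other standard exceptions into errors is acceptable.
        return Status::InternalError(std::string("ORC read error: ") + e.what());
    }
}
```

The point of the wrapper is that the exception is converted at the call site, so the query fails with a readable message while the BE process keeps running.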

