walterddr opened a new issue #7910:
URL: https://github.com/apache/pinot/issues/7910


   This issue was discussed partially in 
https://github.com/apache/pinot/pull/7896.
   
   Originally, the issue we discovered was that the DebugAPI returned DebugInfo 
with a null stack trace (see below). We found two places where the stack trace 
was swallowed.
   
   While addressing this by throwing the error explicitly, we realized that 
the control flow was not as expected. There are several concerns:
   @sajjad-moradi : If we throw an exception in `buildSegmentAndReplace` without 
correctly capturing it and changing state, the segment goes to error state in 
the external view upon receiving the Helix transition message consuming -> 
online. See https://github.com/apache/pinot/pull/7896#discussion_r768340557
   @mcvsubbu : status.ERROR should not be used when a segment creation was 
discarded, because another host might have been able to build it correctly. 
See https://github.com/apache/pinot/pull/7896#discussion_r769141107. Also, when 
we detect a recoverable error, we should not include that information in the 
debug info page in the first place. See 
https://github.com/apache/pinot/pull/7896#discussion_r769160523
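
To make the two concerns above concrete, here is a minimal sketch of one way the control flow could look. It is not Pinot's actual code: the class `RecoverableBuildSketch`, its `State` enum, `RecoverableBuildException`, and the boolean switches that simulate failures are all hypothetical names chosen for illustration. The idea is that a recoverable build failure is caught, mapped to a non-error state, and kept out of the debug info, while only an unrecoverable failure sets the error state and is recorded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are taken from Pinot.
class RecoverableBuildSketch {
  // Illustrative local states only.
  enum State { CONSUMING, RETAINED, DISCARDED, ERROR }

  // Marks a failure that another host may not hit when it builds the segment.
  static class RecoverableBuildException extends Exception {
    RecoverableBuildException(String message) {
      super(message);
    }
  }

  private State state = State.CONSUMING;
  // Stand-in for what a debug endpoint would report.
  private final List<String> debugErrors = new ArrayList<>();

  // Simulated build step; the flags exist only to trigger each path.
  private void build(boolean failRecoverably, boolean failFatally)
      throws RecoverableBuildException {
    if (failRecoverably) {
      throw new RecoverableBuildException("local build failed");
    }
    if (failFatally) {
      throw new IllegalStateException("unrecoverable failure");
    }
  }

  // The exception never escapes, so the caller always sees a defined state.
  State buildAndReplace(boolean failRecoverably, boolean failFatally) {
    try {
      build(failRecoverably, failFatally);
      state = State.RETAINED;
    } catch (RecoverableBuildException e) {
      // Recoverable: discard quietly, do not record in debug info.
      state = State.DISCARDED;
    } catch (RuntimeException e) {
      // Unrecoverable: set the error state and record it.
      state = State.ERROR;
      debugErrors.add(e.getMessage());
    }
    return state;
  }

  List<String> getDebugErrors() {
    return debugErrors;
  }
}
```

Whether a given failure counts as recoverable is exactly the open question in the linked review threads; the sketch only shows where that decision would be made.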
   
   Another question, raised by @Jackie-Jiang and me: segment errors are 
captured and put in the debug API in almost all methods 
(public/protected/private). Can we clean up the logic by adopting a rule such 
that:
   only external-facing methods capture and log debug info, and all internal 
methods only bubble up the exception, so that the external-facing methods can 
capture and log it with more contextual information?
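
The proposed rule could be sketched roughly as follows. All names here (`SegmentManagerSketch`, `SegmentErrorInfoSketch`, `SegmentBuildException`, `commitSegment`, `writeSegmentFile`) are hypothetical and are not Pinot's API; the failure is simulated. Internal methods wrap and rethrow without logging, and the single external-facing method captures the error, including the stack trace, so the debug info never ends up with a null stack trace.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Optional;

// Hypothetical error record; the stack trace is always populated.
final class SegmentErrorInfoSketch {
  final String errorMessage;
  final String stackTrace;

  SegmentErrorInfoSketch(String errorMessage, Throwable cause) {
    this.errorMessage = errorMessage;
    StringWriter buffer = new StringWriter();
    cause.printStackTrace(new PrintWriter(buffer, true));
    this.stackTrace = buffer.toString();
  }
}

// Hypothetical checked exception used to bubble failures up with context.
class SegmentBuildException extends Exception {
  SegmentBuildException(String message, Throwable cause) {
    super(message, cause);
  }
}

class SegmentManagerSketch {
  private SegmentErrorInfoSketch lastError;

  // Internal: adds context and rethrows, never records debug info.
  private void writeSegmentFile(String segmentName) throws SegmentBuildException {
    try {
      throw new IllegalStateException("disk full"); // simulated failure
    } catch (RuntimeException e) {
      throw new SegmentBuildException("Could not build segment " + segmentName, e);
    }
  }

  // Internal: simply lets the exception propagate.
  private void buildSegment(String segmentName) throws SegmentBuildException {
    writeSegmentFile(segmentName);
  }

  // External-facing: the only place that captures the error for debug info.
  public boolean commitSegment(String segmentName) {
    try {
      buildSegment(segmentName);
      return true;
    } catch (SegmentBuildException e) {
      lastError = new SegmentErrorInfoSketch(e.getMessage(), e);
      return false;
    }
  }

  public Optional<SegmentErrorInfoSketch> getLastError() {
    return Optional.ofNullable(lastError);
  }
}
```

Because the original exception is kept as the cause, the recorded stack trace includes the root failure as a "Caused by" section rather than only the top-level message.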
   
   Please discuss. 
   
   
   
   <details>
     <summary>Click to expand code block!</summary>
   
   ```
   [
     {
       "tableName": "testTable_REALTIME",
       "numSegments": 1,
       "numServers": 1,
       "numBrokers": 1,
       "segmentDebugInfos": [
         {
           "segmentName": "testTable__0__0__20211210T0838Z",
           "serverState": {
             "Server_localhost_8098": {
               "idealState": "CONSUMING",
               "externalView": "CONSUMING",
               "segmentSize": "0 bytes",
               "consumerInfo": {
                 "segmentName": "testTable__0__0__20211210T0838Z",
                 "consumerState": "NOT_CONSUMING",
                 "lastConsumedTimestamp": 1639125751331,
                 "partitionToOffsetMap": {
                   "0": "325618872"
                 }
               },
               "errorInfo": {
                 "timestamp": "2021-12-10 08:42:35 UTC",
                 "errorMessage": "Could not build segment",
                 "stackTrace": null
               }
             }
           }
         }
       ],
       "serverDebugInfos": [],
       "brokerDebugInfos": [
         {
           "brokerName": "Broker_localhost_8099",
           "idealState": "ONLINE",
           "externalView": "ONLINE"
         }
       ],
       "tableSize": {
         "reportedSize": "0 bytes",
         "estimatedSize": "0 bytes"
       },
       "ingestionStatus": {
         "ingestionState": "UNHEALTHY",
         "errorMessage": "Segment: testTable__0__0__20211210T0838Z is not being consumed on server: Server_localhost_8098"
       }
     }
   ]
   ```
   </details>
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


