walterddr opened a new issue #7910: URL: https://github.com/apache/pinot/issues/7910
This issue was partially discussed in https://github.com/apache/pinot/pull/7896. The original problem was that the debug API returned DebugInfo entries with a null stack trace (see the collapsed output at the end of this issue). We found two places where the stack trace was swallowed. While addressing this by throwing the errors explicitly, we realized that the control flow was not what we expected. There are several concerns:

- @sajjad-moradi: If we throw an exception in `buildSegmentAndReplace` without catching it and updating the state correctly, the segment goes to the error state in the external view upon receiving the Helix transition message consuming -> online. See https://github.com/apache/pinot/pull/7896#discussion_r768340557
- @mcvsubbu: status.ERROR should not be used when a segment build was discarded, because another host might have been able to build it correctly. See https://github.com/apache/pinot/pull/7896#discussion_r769141107
- @mcvsubbu: When we detect a recoverable error, we should not include that information in the debug info page in the first place. See https://github.com/apache/pinot/pull/7896#discussion_r769160523

Another question, raised by @Jackie-Jiang and me: segment errors are currently captured and put in the debug API in almost all methods (public/protected/private). Can we clean up the logic with a rule such that:

- only external-facing methods capture and log debug info; and
- all internal methods only bubble up the exception, so that the external-facing methods can capture and log it with more contextual information?

Please discuss.
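To make the proposed rule concrete, here is a minimal sketch of what "capture at the boundary, bubble up internally" could look like. It is not Pinot's actual code: `SegmentManagerSketch`, `SegmentErrorInfo`, `buildSegment`, `convertToImmutable`, and `getLastError` are hypothetical names chosen for illustration, and only `buildSegmentAndReplace` is borrowed from the discussion above. It also does not decide which errors are recoverable or which state the segment should move to; those questions are still open.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical stand-in for the error record exposed through the debug API. */
final class SegmentErrorInfo {
  final String errorMessage;
  final String stackTrace;

  SegmentErrorInfo(String errorMessage, Throwable cause) {
    this.errorMessage = errorMessage;
    // Render the full stack trace so the debug API never reports null.
    StringWriter buffer = new StringWriter();
    cause.printStackTrace(new PrintWriter(buffer, true));
    this.stackTrace = buffer.toString();
  }
}

/** Hypothetical segment manager; an illustration, not Pinot's real class. */
final class SegmentManagerSketch {
  private final boolean failBuild;    // test hook to simulate a build failure
  private SegmentErrorInfo lastError; // what a debug endpoint would return

  SegmentManagerSketch(boolean failBuild) {
    this.failBuild = failBuild;
  }

  /** External-facing method: the only place that captures and records errors. */
  public boolean buildSegmentAndReplace(String segmentName) {
    try {
      buildSegment(segmentName);
      return true;
    } catch (Exception e) {
      // Context (segment name) is added here, once, at the boundary.
      lastError = new SegmentErrorInfo("Could not build segment " + segmentName, e);
      return false; // the caller decides what state change, if any, follows
    }
  }

  /** Internal method: no catch and no debug-info bookkeeping. */
  private void buildSegment(String segmentName) {
    convertToImmutable(segmentName);
  }

  /** Internal method: throws and lets the exception bubble up. */
  private void convertToImmutable(String segmentName) {
    if (failBuild) {
      throw new IllegalStateException("index conversion failed for " + segmentName);
    }
  }

  SegmentErrorInfo getLastError() {
    return lastError;
  }
}
```

With this shape, a failed call to `buildSegmentAndReplace` returns `false` and leaves one error record whose stack trace still points at the internal frame that threw, instead of several partial records written at different depths.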
<details>
<summary>Click to expand code block!</summary>

```
[
  {
    "tableName": "testTable_REALTIME",
    "numSegments": 1,
    "numServers": 1,
    "numBrokers": 1,
    "segmentDebugInfos": [
      {
        "segmentName": "testTable__0__0__20211210T0838Z",
        "serverState": {
          "Server_localhost_8098": {
            "idealState": "CONSUMING",
            "externalView": "CONSUMING",
            "segmentSize": "0 bytes",
            "consumerInfo": {
              "segmentName": "testTable__0__0__20211210T0838Z",
              "consumerState": "NOT_CONSUMING",
              "lastConsumedTimestamp": 1639125751331,
              "partitionToOffsetMap": {
                "0": "325618872"
              }
            },
            "errorInfo": {
              "timestamp": "2021-12-10 08:42:35 UTC",
              "errorMessage": "Could not build segment",
              "stackTrace": null
            }
          }
        }
      }
    ],
    "serverDebugInfos": [],
    "brokerDebugInfos": [
      {
        "brokerName": "Broker_localhost_8099",
        "idealState": "ONLINE",
        "externalView": "ONLINE"
      }
    ],
    "tableSize": {
      "reportedSize": "0 bytes",
      "estimatedSize": "0 bytes"
    },
    "ingestionStatus": {
      "ingestionState": "UNHEALTHY",
      "errorMessage": "Segment: testTable__0__0__20211210T0838Z is not being consumed on server: Server_localhost_8098"
    }
  }
]
```

</details>
