gortiz commented on issue #15057:
URL: https://github.com/apache/pinot/issues/15057#issuecomment-3150085110

   Hi there! We have improved observability (o11y) in the multi-stage engine (MSE) over the last few months:
   
   > Query failures [without metrics]
   
   We have centralized where MSE errors are reported. This now happens in `MultiStageBrokerRequestHandler`: https://github.com/apache/pinot/blob/a09d01faae93e5156884d9429decdd43aa8f5582/pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/MultiStageBrokerRequestHandler.java#L252
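   To illustrate why a single reporting point helps, here is a minimal, hypothetical sketch (not Pinot code): every query runs through one wrapper, so each failure is counted and logged exactly once. `CentralErrorReporter`, `execute`, and `failureCount` are invented names, and the in-memory counters only stand in for real metrics.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.LongAdder;
   import java.util.function.Supplier;

   // Hypothetical sketch: all query failures funnel through one catch block.
   final class CentralErrorReporter {
     // Failure counters keyed by exception type (a stand-in for real metrics).
     private final Map<String, LongAdder> failuresByType = new ConcurrentHashMap<>();

     <T> T execute(long requestId, Supplier<T> query) {
       try {
         return query.get();
       } catch (RuntimeException e) {
         report(requestId, e); // the only place failures are recorded
         throw e;
       }
     }

     private void report(long requestId, RuntimeException e) {
       failuresByType
           .computeIfAbsent(e.getClass().getSimpleName(), k -> new LongAdder())
           .increment();
       System.err.println("requestId=" + requestId + " failed: " + e.getMessage());
     }

     long failureCount(String type) {
       LongAdder counter = failuresByType.get(type);
       return counter == null ? 0 : counter.sum();
     }
   }
   ```

   Because failures are recorded in one place, no code path can throw without updating the metric.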
   
   > No logs or stats in response metadata that can be used to identify slow instances in any stage of executing a query. Or no way of correlating broker request IDs to the logs or stats.
   
   Some good news and some bad news here. Originally, MSE stats were collected only when a query finished successfully; now they are also collected when a query fails. The bad news:
   1. A failing query aborts its execution, so the returned stats only cover the work executed up to that point.
   2. The stats won't help you find slow instances, because they are aggregated per stage, not per instance.
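   The second limitation can be shown with a small, hypothetical sketch (the record and method names are invented, not Pinot's stats classes). It folds per-worker timings into one total per stage, which discards the instance that produced each value.

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   // Hypothetical sketch of stage-level aggregation of worker stats.
   final class StageStatsAggregation {
     record WorkerStat(int stageId, String instance, long executionTimeMs) {}

     // Sums execution time per stage; the instance field is intentionally dropped,
     // mirroring stats that are aggregated by stage rather than by instance.
     static Map<Integer, Long> aggregateByStage(List<WorkerStat> stats) {
       Map<Integer, Long> byStage = new TreeMap<>();
       for (WorkerStat s : stats) {
         byStage.merge(s.stageId(), s.executionTimeMs(), Long::sum);
       }
       return byStage;
     }
   }
   ```

   If stage 1 runs on two hypothetical servers taking 50 ms each, or on two servers taking 5 ms and 95 ms, both cases aggregate to `{1=100}`. The skewed case, where one slow instance dominates, cannot be told apart from the balanced one.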
   
   > Timeouts are difficult to diagnose without taking an approach such as increasing the timeout, rerunning, and then profiling the query but once again that does not enable retrospective debugging. All that is available retrospectively are a high volume of logs across many instances such as:
   
   This is fixed. Because the stats are now returned along with the error, you should be able to see where the time was spent.
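   As a rough illustration, here is a hypothetical sketch of how a client could use stats that arrive with an error. The response shape (`QueryResult` holding an error string and a per-stage execution-time map) is an assumption for the example, not Pinot's actual response format. Remember that the stats are partial, since execution stops when the query fails.

   ```java
   import java.util.Map;
   import java.util.Optional;

   // Hypothetical sketch: a failed response that still carries per-stage stats.
   final class FailedQueryDiagnosis {
     record QueryResult(String error, Map<Integer, Long> stageExecutionTimeMs) {}

     // Returns the stage that accumulated the most execution time before the failure, if any.
     static Optional<Integer> slowestStage(QueryResult result) {
       return result.stageExecutionTimeMs().entrySet().stream()
           .max(Map.Entry.comparingByValue())
           .map(Map.Entry::getKey);
     }
   }
   ```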
   
   The logs are now enriched with the query's correlation id (called `cid`) and the stage. These properties are added to the slf4j context (MDC) and are printed by the default log4j2.xml shipped with Pinot. If you use a custom log4j2.xml, you will need to modify your pattern to include the cid and the stage. You can take inspiration from https://github.com/apache/pinot/blob/a09d01faae93e5156884d9429decdd43aa8f5582/pinot-tools/src/main/resources/log4j2.xml#L31
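   To show what the context does, here is a minimal stand-in for an MDC built on a plain `ThreadLocal`. It does not use slf4j. In log4j2, a pattern token such as `%X{cid}` prints the value stored under that key. The `stage` key name and the bracketed layout below are assumptions; check the linked log4j2.xml for the exact keys and pattern Pinot uses.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical stand-in for an MDC: per-thread key/value pairs added to each log line.
   final class LogContext {
     private static final ThreadLocal<Map<String, String>> CONTEXT =
         ThreadLocal.withInitial(HashMap::new);

     static void put(String key, String value) { CONTEXT.get().put(key, value); }

     static void clear() { CONTEXT.remove(); }

     // Roughly what a pattern like "[%X{cid}] [%X{stage}] %m" would render (assumed layout).
     static String format(String message) {
       Map<String, String> ctx = CONTEXT.get();
       return "[" + ctx.getOrDefault("cid", "") + "] [" + ctx.getOrDefault("stage", "") + "] " + message;
     }

     // Sets cid/stage for the duration of the work and always clears them,
     // so pooled threads do not leak one query's context into the next.
     static String runStage(String cid, int stage, String message) {
       put("cid", cid);
       put("stage", Integer.toString(stage));
       try {
         return format(message);
       } finally {
         clear();
       }
     }
   }
   ```

   With a hypothetical correlation id, `LogContext.runStage("cid-1", 2, "opened mailbox")` yields `[cid-1] [2] opened mailbox`. Every line logged by that stage can then be grepped by its cid.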


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

