tibrewalpratik17 commented on code in PR #12602:
URL: https://github.com/apache/pinot/pull/12602#discussion_r1520498218


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -305,6 +305,7 @@ public void deleteSegmentFile() {
 
   private final StreamPartitionMsgOffset _latestStreamOffsetAtStartupTime;
   private final CompletionMode _segmentCompletionMode;
+  private final List<String> _filteredMessageOffsets = new ArrayList<>();

Review Comment:
   > I wouldn't really track the individual offsets as it can easily flood the 
log. 
   
   We are not tracking individual offsets; we log once a minute, just like the 
consumed events log. 
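   
   For illustration, a minimal sketch of the once-a-minute pattern, not the 
actual code in this PR. The class `FilteredOffsetLogSketch`, its constructor 
parameters, the buffer cap, and the log line format are all hypothetical; the 
clock and the log sink are injected only so the sketch can run on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of Pinot): buffers the offsets of filtered
// messages and emits at most one summary line per interval.
public class FilteredOffsetLogSketch {
  private final long _intervalMs;
  private final int _maxBuffered;
  private final LongSupplier _clockMs;
  private final Consumer<String> _sink;
  private final List<String> _offsets = new ArrayList<>();
  private long _notShown = 0;
  private long _lastFlushMs;

  public FilteredOffsetLogSketch(long intervalMs, int maxBuffered, LongSupplier clockMs, Consumer<String> sink) {
    _intervalMs = intervalMs;
    _maxBuffered = maxBuffered;
    _clockMs = clockMs;
    _sink = sink;
    _lastFlushMs = clockMs.getAsLong();
  }

  // Called for every filtered message; the cap bounds both memory and log size.
  public void recordFiltered(String offset) {
    if (_offsets.size() < _maxBuffered) {
      _offsets.add(offset);
    } else {
      _notShown++;
    }
    maybeFlush();
  }

  // Emits one line if the interval has elapsed and something was filtered.
  public boolean maybeFlush() {
    long now = _clockMs.getAsLong();
    if (now - _lastFlushMs < _intervalMs || (_offsets.isEmpty() && _notShown == 0)) {
      return false;
    }
    _sink.accept("Filtered message offsets: " + _offsets + " (not shown: " + _notShown + ")");
    _offsets.clear();
    _notShown = 0;
    _lastFlushMs = now;
    return true;
  }
}
```

   With a 60,000 ms interval, any number of filtered rows produces at most one 
log line per minute, and the cap keeps that line short.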
   
   >  Do you really see the offset info useful?
   
   We have a use case where we don't want to ingest invalid JSON into our 
table. For that, we plan to use the isJson UDF as a filterConfig in #12603. 
We will have an alert on the metric `NUMBER_ROWS_FILTERED`, and a log that 
tracks the offsets would really help us debug faster.
   
   But I'm open to suggestions if there is a better way to do this. Maybe add 
a config in FilterConfig to log / track offsets, and enable it only for the 
tables where we really need it for debugging? 
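   
   A rough sketch of what that opt-in could look like, under the assumption 
of a new boolean option. `FilterOptions` and `trackFilteredOffsets` are 
hypothetical stand-ins; no such field exists on FilterConfig today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: offsets are tracked only when a table opts in,
// while the filtered-row count is always maintained.
public class OptInOffsetTrackingSketch {
  // Stand-in for a possible new flag on FilterConfig (assumption, not an existing API).
  public static final class FilterOptions {
    private final boolean _trackFilteredOffsets;

    public FilterOptions(boolean trackFilteredOffsets) {
      _trackFilteredOffsets = trackFilteredOffsets;
    }
  }

  private final FilterOptions _options;
  private final List<String> _filteredOffsets = new ArrayList<>();
  private long _numRowsFiltered = 0;

  public OptInOffsetTrackingSketch(FilterOptions options) {
    _options = options;
  }

  public void onRowFiltered(String offset) {
    _numRowsFiltered++; // the count behind the metric is updated regardless of the flag
    if (_options._trackFilteredOffsets) {
      _filteredOffsets.add(offset);
    }
  }

  public long getNumRowsFiltered() {
    return _numRowsFiltered;
  }

  public List<String> getFilteredOffsets() {
    return Collections.unmodifiableList(_filteredOffsets);
  }
}
```

   Tables that leave the flag off pay no memory or logging cost, and the alert 
on the filtered-row count keeps working either way.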



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

