tibrewalpratik17 commented on code in PR #14668:
URL: https://github.com/apache/pinot/pull/14668#discussion_r1887991819


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskExecutor.java:
##########
@@ -58,11 +59,13 @@ protected SegmentConversionResult convert(PinotTaskConfig 
pinotTaskConfig, File
     TableConfig tableConfig = getTableConfig(tableNameWithType);
 
     String validDocIdsTypeStr =
-        
configs.getOrDefault(MinionConstants.UpsertCompactionTask.VALID_DOC_IDS_TYPE, 
ValidDocIdsType.SNAPSHOT.name());
+        configs.getOrDefault(UpsertCompactionTask.VALID_DOC_IDS_TYPE, 
ValidDocIdsType.SNAPSHOT.name());
     SegmentMetadataImpl segmentMetadata = new SegmentMetadataImpl(indexDir);
     String originalSegmentCrcFromTaskGenerator = 
configs.get(MinionConstants.ORIGINAL_SEGMENT_CRC_KEY);
     String crcFromDeepStorageSegment = segmentMetadata.getCrc();
-    if 
(!originalSegmentCrcFromTaskGenerator.equals(crcFromDeepStorageSegment)) {
+    boolean ignoreCrcMismatch = 
Boolean.parseBoolean(configs.getOrDefault(UpsertCompactionTask.IGNORE_CRC_MISMATCH_KEY,
+        String.valueOf(UpsertCompactionTask.DEFAULT_IGNORE_CRC_MISMATCH)));
+    if (ignoreCrcMismatch || 
!originalSegmentCrcFromTaskGenerator.equals(crcFromDeepStorageSegment)) {

Review Comment:
   > guess you meant !ignoreCrcMismatch here, otherwise if ignore=true, we'll 
throw exception directly
   
   My bad updated! 
   
   > is there correctness concern if ignoring crc check? IIUC, as generator 
mainly uses validDocIds to decide how many segments for a task to work on, so 
upsert data would still be correct even if we ignore crc checks.
   
   Yes, during task execution, we download the validDocId snapshot for the 
compacting segment from the server that matches the CRC in Zookeeper. Ignoring 
the CRC mismatch should be acceptable in this scenario, but I believe it's 
important to retain this check for long-term correctness.
   
   The ignoreCrcMismatch configuration should only be used when users 
understand the reason for the CRC mismatch and need to unblock the compaction 
task temporarily.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to