ankitsultana opened a new issue, #12399:
URL: https://github.com/apache/pinot/issues/12399

   ### Issue Description
   
   Partial Upsert tables merge the previous version of a record with the latest 
version. We can end up in a scenario where the replicas diverge but still get 
committed anyway (both servers keep their local builds).
   
   Once the replicas diverge, the segments of that Kafka partition will stay 
different until some event forces a reconciliation (e.g. restarting all 
servers, since a CRC mismatch triggers a download from deep store).
   
   If no reconciliation happens for a while, the situation can get messier, 
because the other server may be the one to commit and upload to deep store for 
a subsequent segment.
   
   Moreover, after a reconciliation, it will look as if the data had been 
consistent all along.
   
   ### Discussion
   
   Given the criticality of ensuring consistency across replicas for Partial 
Upsert, should we consider enforcing some checks during commit time itself?
   
   In case of different CRCs across replicas, we could emit a metric, and 
always pick the committer's segment. The controller could ask the replicas to 
discard their local copy.
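
The commit-time check proposed above could look roughly like the sketch below. It is an illustration only, not Pinot code: `ReplicaBuild`, `CommitDecision`, and `reconcile_at_commit` are hypothetical names, and the `mismatch_count` field stands in for whatever metric would actually be emitted.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaBuild:
    """A segment as built locally by one server (hypothetical model)."""
    server: str
    crc: int


@dataclass
class CommitDecision:
    """Outcome of the check: which CRC wins and who must drop their copy."""
    winning_crc: int
    servers_to_discard: list[str] = field(default_factory=list)
    mismatch_count: int = 0  # stand-in for the value of an emitted metric


def reconcile_at_commit(committer: ReplicaBuild,
                        others: list[ReplicaBuild]) -> CommitDecision:
    """Always keep the committer's build and flag replicas whose CRC differs."""
    diverged = [r.server for r in others if r.crc != committer.crc]
    return CommitDecision(
        winning_crc=committer.crc,
        servers_to_discard=diverged,
        mismatch_count=len(diverged),
    )


decision = reconcile_at_commit(
    ReplicaBuild("server-1", 0xA1B2),
    [ReplicaBuild("server-2", 0xA1B2), ReplicaBuild("server-3", 0xFFFF)],
)
print(decision.servers_to_discard)  # ['server-3']
```

In this sketch the committer's segment is chosen unconditionally, as suggested above; whether the committer's build is the "correct" one is a separate question that the check does not try to answer.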


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

