ankitsultana opened a new issue, #12399: URL: https://github.com/apache/pinot/issues/12399
### Issue Description

Partial Upsert tables merge the previous version of a record with the latest version. We can end up in a scenario where the replicas diverge but their segments get committed anyway (both servers keep their local builds). Once the replicas diverge, that Kafka partition's segments will remain different until some event forces a reconciliation (e.g. restarting all servers, since a CRC mismatch triggers a download from deep store). If there is no reconciliation for a while, the situation can become messier, because the other server may get to commit and upload to deep store for a subsequent segment. Moreover, after a reconciliation, it gives the illusion that the data has always been consistent.

### Discussion

Given the criticality of ensuring consistency across replicas for Partial Upsert, should we consider enforcing some checks at commit time itself? In case of different CRCs across replicas, we could emit a metric and always pick the committer's segment. The controller could ask the replicas to discard their local copy.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
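To make the commit-time check proposed in the Discussion concrete, here is a minimal sketch of the decision logic, assuming the controller can see the CRC each replica computed for the segment. Every name in it (`CommitDecision`, `reconcile_at_commit`, the server IDs) is hypothetical and chosen for illustration; none of it is existing Pinot code or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CommitDecision:
    """Outcome of the hypothetical commit-time consistency check."""
    winning_crc: int                # CRC of the committer's segment, which always wins
    mismatch_count: int             # value a controller could emit as a metric
    replicas_to_discard: List[str]  # replicas that should drop their local build


def reconcile_at_commit(committer: str, replica_crcs: Dict[str, int]) -> CommitDecision:
    """Compare each replica's CRC against the committer's CRC.

    replica_crcs maps a server ID to the CRC of its locally built segment
    and must include an entry for the committer itself.
    """
    winning_crc = replica_crcs[committer]
    # Any non-committer replica whose CRC differs has diverged and should
    # discard its local copy in favor of the committer's segment.
    diverged = sorted(
        server
        for server, crc in replica_crcs.items()
        if server != committer and crc != winning_crc
    )
    return CommitDecision(winning_crc, len(diverged), diverged)


# Hypothetical example: server-2 built a segment that differs from the committer's.
decision = reconcile_at_commit(
    "server-1",
    {"server-1": 111, "server-2": 222, "server-3": 111},
)
```

In this sketch `mismatch_count` stands in for the metric mentioned above, and `replicas_to_discard` is the list of servers the controller would ask to replace their local copy with the committed segment.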
