>> 102. When there is a correlated hard failure (e.g., power outage), it's
>> possible that an existing commit/abort marker is lost in all replicas.
>> This may not be fixed by the transaction coordinator automatically and the
>> consumer may get stuck on that incomplete transaction forever. Not sure
>> what's the best way to address this. Perhaps, one way is to run a tool to
>> add an abort marker for all pids in all affected partitions.
>
> There can be two types of tools, one for diagnosing the issue and another
> for fixing the issue. I think having at least a diagnostic tool in the
> first version could be helpful. For example, the tool can report things
> like which producer id is preventing the LSO from being advanced. That way,
> at least the users can try to fix this themselves.
>

That sounds reasonable. Will add a work item to track this so that such a tool is available in the first version.
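
To make the diagnostic idea a bit more concrete, here is a rough sketch of the check such a tool might perform for a single partition. It is not a proposed implementation: `PartitionState`, `last_stable_offset` and `blocking_producer` are hypothetical names, and the sketch assumes the tool can already obtain, per partition, the high watermark and the first offset of each still-open transaction keyed by producer id. It also assumes the LSO is the first offset of the earliest open transaction, capped at the high watermark.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PartitionState:
    """Hypothetical snapshot of one partition, as a diagnostic tool might see it."""
    high_watermark: int
    # Assumption: producer id -> first offset of that producer's open transaction.
    open_txn_first_offsets: Dict[int, int] = field(default_factory=dict)


def last_stable_offset(state: PartitionState) -> int:
    """LSO under the stated assumption: earliest open transaction, capped at the HW."""
    if not state.open_txn_first_offsets:
        return state.high_watermark
    earliest = min(state.open_txn_first_offsets.values())
    return min(earliest, state.high_watermark)


def blocking_producer(state: PartitionState) -> Optional[int]:
    """Return the producer id holding the LSO back, or None if nothing is blocking."""
    if not state.open_txn_first_offsets:
        return None
    pid, first_offset = min(
        state.open_txn_first_offsets.items(), key=lambda item: item[1]
    )
    # Only report a producer if its open transaction actually sits below the HW.
    return pid if first_offset < state.high_watermark else None
```

A report built on this would list, for each affected partition, the producer id returned by `blocking_producer` together with the current LSO, which is the information a user would need before deciding whether to write an abort marker by hand.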