[ https://issues.apache.org/jira/browse/KUDU-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao Hao updated KUDU-2196:
--------------------------
    External issue URL:   (was: https://docs.google.com/document/d/1mYPhs_dvklDJdemlvDlUD1XY-zoyP_4sw4TXxDtlDxo/edit?usp=sharing)

> Failures of concurrent block transactions can corrupt on-disk consistency
> -------------------------------------------------------------------------
>
>                 Key: KUDU-2196
>                 URL: https://issues.apache.org/jira/browse/KUDU-2196
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 1.5.0
>            Reporter: Hao Hao
>
> Failures of multiple concurrent block transactions can potentially corrupt
> the underlying log block container.
> Currently, a log block container can be made available to any uncommitted
> writers (block transactions) once the written block is finalized, thus
> allowing concurrent writers (block transactions) to share the same log block
> container. While block transactions are being committed, the container is
> marked as read-only if any failure is encountered, in order to maintain
> on-disk consistency.
> However, this prevention mechanism cannot help when concurrent writers enter
> the commit state at the same time. If one transaction fails, the other
> transactions are still in the process of committing without knowing that the
> container should now be read-only. This could let a partial metadata record
> persist on disk, followed by full records, especially if the failure is
> transient (e.g., ENOSPC), thus leaving the container in an unrecoverable
> state.
> More detail and a proposed solution can be found in the attached doc.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
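
The interleaving described in the issue can be sketched as a small, deterministic simulation. This is only an illustrative model, not Kudu's code: the `Container` class, its `check_writable` and `append_record` methods, and the record names are all hypothetical, and the "metadata file" is just a Python list. It assumes both transactions pass the read-only check before either one writes, which is the window the issue describes.

```python
import errno


class Container:
    """Hypothetical stand-in for a log block container's metadata file."""

    def __init__(self):
        self.read_only = False
        self.metadata = []  # records as persisted on "disk", in order

    def check_writable(self):
        # The guard each transaction consults before it starts committing.
        if self.read_only:
            raise RuntimeError("container is read-only")

    def append_record(self, record, fail_with=None):
        if fail_with is not None:
            # Simulate a write that persists only part of the record,
            # then mark the container read-only, as the issue describes.
            self.metadata.append(("PARTIAL", record))
            self.read_only = True
            raise OSError(fail_with, "simulated transient failure")
        self.metadata.append(("FULL", record))


def simulate_race():
    container = Container()
    # Both transactions enter the commit state at the same time:
    # each passes the check before the other has written anything.
    container.check_writable()  # transaction A
    container.check_writable()  # transaction B
    try:
        # A hits a transient failure (ENOSPC) partway through its record.
        container.append_record("block-A", fail_with=errno.ENOSPC)
    except OSError:
        pass  # A aborts; the container is now read-only
    # B is already committing and does not re-check the flag, so its
    # full record lands after A's partial one.
    container.append_record("block-B")
    return container


container = simulate_race()
print(container.read_only, container.metadata)
```

In this model the container ends up flagged read-only, yet its metadata holds a partial record followed by a full one, which is the unrecoverable layout the report warns about.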