Thank you for the review, Lokesh and Bharat. I understand that the transaction id would be better than a timestamp, especially because of the computational cost of getting a timestamp. In this case, the sort order of deletion keys does not have to be strictly monotonic; mild monotonicity is enough, where clock skews in the range of hours or days would be acceptable. I'll update the doc.
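To make the ordering point concrete, here is a minimal Java sketch of a deletion-table key prefixed with the transaction index. The key layout (a fixed-width hexadecimal index followed by the object key) and the names DeletionKeySketch and deletionKey are assumptions made up for illustration; this is neither the format in the design doc nor existing Ozone code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: not Ozone's actual deleted-table key format.
final class DeletionKeySketch {

  // Zero-pads the transaction index to 16 hex digits so that the
  // lexicographic order of the resulting strings matches the numeric
  // order of the (non-negative) indices.
  static String deletionKey(long transactionIndex, String objectKey) {
    if (transactionIndex < 0) {
      throw new IllegalArgumentException("transaction index must be >= 0");
    }
    return String.format("%016x", transactionIndex) + objectKey;
  }

  public static void main(String[] args) {
    List<String> keys = new ArrayList<>();
    keys.add(deletionKey(10L, "/vol/bucket/a"));
    keys.add(deletionKey(256L, "/vol/bucket/a"));
    keys.add(deletionKey(9L, "/vol/bucket/a"));
    // Sorting the strings puts the keys back in transaction order.
    Collections.sort(keys);
    keys.forEach(System.out::println);
  }
}
```

With a unique index per transaction, the same object key deleted twice gets two distinct entries, so the sketch uses no random suffix. A timestamp prefix would instead need one whenever two deletions could share a timestamp.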
My question is: is the transaction index always available in a non-HA cluster? For example, our 1.1.0 cluster uses HA for neither OM nor SCM, and we are not planning to upgrade even to single-node Ratis (we are still using org.apache.hadoop.hdds.scm.pipeline.leader.choose.algorithms.DefaultLeaderChoosePolicy for ozone.scm.pipeline.leader-choose.policy).

Bharat, on RepeatedKeyInfo: yes, in my plan RepeatedKeyInfo is still needed for data format compatibility, and I'm not planning to change the proto. In particular, changing the proto format would make upgrade and downgrade extremely difficult, IMO. I know it no longer has to be a list, but that is only in theory.

On Sat, Oct 30, 2021 at 4:45 AM Bharat Viswanadham
<bviswanad...@cloudera.com.invalid> wrote:
>
> Hi Kota,
> Thanks for taking up HDDS-5905 and quickly coming up with a design.
>
> I liked the overall approach, but one thing instead of timestamps, I agree
> with Lokesh, we can use transaction index, and also this will make
> implementation easy. (As with timestamp, we need to propagate this from the
> leader, handle clock skews, and need to handle leader changes.)
>
> And one question, so do we plan to use RepeatedKeyInfo, now with this
> change it will be no more list. You are not planning to change proto?
>
>
> Thanks,
> Bharat
>
>
> On Thu, Oct 28, 2021 at 11:12 PM Lokesh Jain <lj...@apache.org> wrote:
>
> > Hey Kota
> >
> > I really like the proposed approach because it makes sure that blocks are
> > deleted in order of key deletion. I would suggest using Ratis transaction
> > id as the prefix. I don’t think we will need a random suffix with that
> > approach as transaction id would avoid any collisions. Further it avoids the
> > cost of generating timestamps.
> >
> > Thanks
> > Lokesh
> >
> > > On 29-Oct-2021, at 7:52 AM, Kota Uenishi <k...@preferred.jp> wrote:
> > >
> > > Hi Bharat & devs,
> > >
> > > I've written up some of my idea to fix HDDS-5905, which is a
> > > block-leak issue mentioned by Bharat. It involves some data format
> > > change in deletion table, so I want to get broader range of feedback
> > > from committers in addition to Bharat. If it looks good to you, I want
> > > to start writing up a patch. Please take a look!
> > >
> > > The proposal: https://docs.google.com/document/d/1KeyhiE1i5SqRSgLy-pIOGW9X6mUYb8iYEkEoDAEQD9Q/edit#
> > > HDDS-5905: https://issues.apache.org/jira/browse/HDDS-5905
> > >
> > > --
> > > --
> > > Kota UENISHI, Engineer
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> > > For additional commands, e-mail: dev-h...@ozone.apache.org

--
--
Kota UENISHI, Engineer