On 8/25/21, 5:40 PM, "Kyotaro Horiguchi" <horikyota....@gmail.com> wrote:
> At Wed, 25 Aug 2021 18:18:59 +0000, "Bossart, Nathan" <bossa...@amazon.com> wrote in
>> Let's say we have the following situation (F = flush, E = earliest
>> registered boundary, and L = latest registered boundary), and let's
>> assume that each segment has a cross-segment record that ends in the
>> next segment.
>>
>> F E L
>> |-----|-----|-----|-----|-----|-----|-----|-----|
>> 1 2 3 4 5 6 7 8
>>
>> Then, we write out WAL to disk and create .ready files as needed. If
>> we didn't flush beyond the latest registered boundary, the latest
>> registered boundary now becomes the earliest boundary.
>>
>> F E
>> |-----|-----|-----|-----|-----|-----|-----|-----|
>> 1 2 3 4 5 6 7 8
>>
>> At this point, the earliest segment boundary past the flush point is
>> before the "earliest" boundary we are tracking.
>
> We know we have some cross-segment records in the region [E L] so we
> cannot add a .ready file if flush is in the region. I haven't looked
> the latest patch (or I may misunderstand the discussion here) but I
> think we shouldn't move E before F exceeds previous (or in the first
> picture above) L. Things are done that way in my ancient proposal in
> [1].
The strategy in place ensures that we track both a boundary that
doesn't change until the flush position passes it and the latest
registered boundary. I think it is important that any segment boundary
tracking mechanism does at least those two things. I don't see how we
could do that if we didn't update E until F passed both E and L.

Nathan
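
To make the E/L bookkeeping discussed above concrete, here is a minimal standalone sketch in C. It is only a toy model under stated assumptions, not the patch itself: the names BoundaryTracker, register_boundary, advance_flush, and INVALID_BOUNDARY are hypothetical, positions are plain integers rather than WAL pointers, and boundaries are assumed to be registered in increasing order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no boundary tracked". */
#define INVALID_BOUNDARY ((uint64_t) 0)

/* Toy model: E and L from the diagrams, as plain integer positions. */
typedef struct BoundaryTracker
{
    uint64_t earliest;  /* E: fixed until the flush position passes it */
    uint64_t latest;    /* L: most recently registered boundary */
} BoundaryTracker;

/* Register a cross-segment record boundary (assumed increasing order). */
void
register_boundary(BoundaryTracker *t, uint64_t pos)
{
    assert(t != NULL && pos != INVALID_BOUNDARY);

    if (t->earliest == INVALID_BOUNDARY)
        t->earliest = pos;      /* first tracked boundary becomes E */
    else if (pos > t->latest)
        t->latest = pos;        /* later boundaries only move L */
}

/*
 * Advance the flush position. Returns true if the flush passed E.
 * When E is passed, L (if any) becomes the new E; E is left alone
 * for as long as the flush position is still before it.
 */
bool
advance_flush(BoundaryTracker *t, uint64_t flush)
{
    bool passed = false;

    assert(t != NULL);

    while (t->earliest != INVALID_BOUNDARY && flush >= t->earliest)
    {
        passed = true;
        t->earliest = t->latest;
        t->latest = INVALID_BOUNDARY;
    }
    return passed;
}
```

With boundaries registered at the hypothetical positions 3 and 7, a flush to 5 passes E, so the old L (7) becomes the new E. That reproduces the second picture: the boundary at 6 lies past the flush point but before the "earliest" boundary being tracked.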