[ot][spam][crazy] log: append-only random-access data

Undiscussed Horrific Abuse, One Victim of Many Thu, 07 Jul 2022 13:27:25 -0700

often i try to implement an append-only random-access data structure.

it's a fun puzzle for my confused mind. i never finish it, but today i
thought about it a little bit and i think over the years i might be
able to put together a simplified form and maybe make it exist some
day.


here is where i wrote right now. i started spinning out when I renamed
Simple.Write to Simple.RangedData .

I was in the middle of implementing a leaf iterator, so as to make a
simple way to count the number of data leaves and associate them with
a depth. the approach, as written in the file, is to simply relink to
leaves with excessive depth each flush, so as to maintain O(log n)
random seek time.

i vaguely recall that approach has a major issue, but i might be
remembering wrongly, and it's still quite exciting to be able to spend
some minutes trying.

# block tree structure

# blocks written to are leaves
# depth is log2 leaves

# when a flush is made, all blocks are written, and also enough nodes such that every leaf can be accessed within depth lookups.


# consider we have an existing tree
# with say m flushes, containing n leaves (or m leaves). we'll likely call it n.


# each flush shows which leaves it has

# additionally, with the final flush, each leaf has an existing depth.

# when we reflush, we need to provide a new index for any leaves that become too deep.

# which leaves are too deep?

# we could basically walk them all to find out. this would be a consistent first approach.

class Simple:
    class RangedData:
        def __init__(self, offset, data):
            self.start = offset
            self.data = data
            self.end = self.start + len(self.data)
    class Flush:
        # flush has a list of new leaves, and a list of indexes to old leaves with ranges
        def __init__(self, *writes, prev_flush=None):
            self.prev_flush = prev_flush
            self.data = writes

            # find extents
            start = min((write.start for write in self.data))
            end = max((write.end for write in self.data))
            
            if prev_flush is None:
                self.start = start
                self.end = end
                self.index = []
                return

            self.start = min(start, prev_flush.start)
            self.end = max(end, prev_flush.end)
            self.index = [(prev_flush.start, prev_flush.end, prev_flush)]

            # find leaf count and leaf depths
            offset = start
            while offset < end:
            
        #def lookup(self, offset
        def leaves(self, start = None, end = None):
            offset = self.start
            data_iter = iter(self.data)
            index_iter = iter(self.index)
            next_write = next(data_iter, None)
            next_index = next(index_iter, None)
            while offset < self.end:
                if next_write is not None:
                    assert offset <= next_write.start
                    if offset == next_write.start:
                        yield (0, next_write)
                        continue
                
            
    def __init__(self, latest = None):
        self.latest = latest
        self.pending = []
    def write(self, offset, data):
        self.pending.append((offset, data))
    def flush(self):
        self.latest = self.Flush(*self.pending, prev_flush=self.latest)
        self.pending = []

[ot][spam][crazy] log: append-only random-access data

Reply via email to